FANTOM4, Functional Annotation of the Mammalian Genome using deepCAGE


The FANTOM consortium is an international collaborative research project initiated and organized by the RIKEN Omics Science Center. In earlier FANTOM efforts we cloned and annotated 103,000 full-length cDNAs from mouse and distributed them to researchers throughout the world. FANTOM1-3 focused on identifying the transcribed components of mammalian cells. This work improved estimates of the total number of genes and their alternative transcript isoforms in both human and mouse, expanded gene families, and revealed that a large fraction of the transcriptome is non-coding. In addition, with the development of Cap Analysis of Gene Expression (CAGE), FANTOM3 was able to map a large fraction of transcription start sites and revise the existing models of promoter structure. FANTOM4 also provides the earlier FANTOM results mapped to current genome builds.

In FANTOM4 the focus shifted to understanding how these components work together in the context of a biological network. Using deepCAGE (deep sequencing combined with CAGE), the dynamics of transcription start site (TSS) usage were monitored during a time course of monocytic differentiation in the acute myeloid leukemia cell line THP-1. This made it possible to identify active promoters and their relative expression levels, and to define the regions relevant for transcription factor binding site prediction. Computational methods were then used to build a network model of gene expression in this leukemia and of the transcription factors key to its regulation. This work gives the first picture of the wiring between genes involved in acute myeloid leukemia and provides a strategy for identifying key factors that determine cell fates.

Beyond the network, FANTOM4 data were used in two additional analyses: the first identified a novel class of short RNAs associated with transcription start sites, and the second examined the role of repetitive element expression in the transcriptome.


CAGE tags, mappings, and promoters for mouse were downloaded from: FANTOM4
Input file format: Tab-delimited TXT


From M. musculus (July 2007 NCBI37/mm9).

#   Filename      Description   Feature   GEO-ID
1   fantom4.sga   CAGE Mouse    CAGE      -
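SGA (Simplified Genome Annotation) is a compact tab-delimited format. The reader below is a minimal sketch that assumes the usual layout of five mandatory columns (sequence identifier, feature, position, strand, tag count) plus an optional sixth name column; please check these assumptions against fantom4.sga before relying on it. The names SgaRecord and read_sga are illustrative and not part of any existing tool, and the two input lines are invented for demonstration.

```python
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class SgaRecord:
    seq_id: str                 # column 1: sequence (chromosome) identifier
    feature: str                # column 2: feature label, e.g. "CAGE"
    position: int               # column 3: genomic coordinate of the tag(s)
    strand: str                 # column 4: '+', '-' or '0' (unoriented)
    count: int                  # column 5: number of tags at this position
    name: Optional[str] = None  # optional column 6


def read_sga(handle: TextIO) -> Iterator[SgaRecord]:
    """Yield one SgaRecord per non-empty line of an SGA stream (assumed layout)."""
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"line {lineno}: expected at least 5 fields, got {len(fields)}")
        seq_id, feature, pos, strand, count = fields[:5]
        if strand not in ("+", "-", "0"):
            raise ValueError(f"line {lineno}: unexpected strand {strand!r}")
        name = fields[5] if len(fields) > 5 else None
        yield SgaRecord(seq_id, feature, int(pos), strand, int(count), name)


# Invented example lines (NOT taken from fantom4.sga)
sample = io.StringIO("chr1\tCAGE\t3001234\t+\t7\nchr1\tCAGE\t3001250\t-\t2\n")
records = list(read_sga(sample))
total = sum(r.count for r in records)
print(len(records), total)  # 2 9
```

Reading the file lazily as a generator keeps memory use flat, which matters for genome-wide CAGE files with many positions.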

Technical Notes

The source format is a non-standard tab-delimited format that was converted to SGA with an ad hoc Perl script. It is recommended to use a tag count cut-off value of 99999 when using the ChIP-Seq analysis tools.
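The Perl conversion script is not reproduced here, and the exact FANTOM4 column order is not documented on this page. As a rough illustration only, the sketch below assumes a hypothetical four-column input (chromosome, strand, position, tag count), writes SGA lines sorted by sequence, position and strand, and shows the effect of a tag count cut-off under the assumption that the tools cap each per-position count at the cut-off value. The function names to_sga_lines and apply_count_cutoff are hypothetical.

```python
from typing import Iterable, List, Tuple


def to_sga_lines(rows: Iterable[str], feature: str = "CAGE") -> List[str]:
    """Convert rows of a HYPOTHETICAL 4-column layout
    (chrom, strand, position, tag_count) into SGA lines."""
    records: List[Tuple[str, int, str, int]] = []
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue
        chrom, strand, pos, count = row.split("\t")[:4]
        records.append((chrom, int(pos), strand, int(count)))
    # Sort by sequence, then position, then strand (assumed SGA ordering)
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return [f"{c}\t{feature}\t{p}\t{s}\t{n}" for c, p, s, n in records]


def apply_count_cutoff(counts: Iterable[int], cutoff: int) -> List[int]:
    """Cap each per-position tag count at `cutoff` (assumed semantics)."""
    return [min(n, cutoff) for n in counts]


# Invented input rows in the hypothetical layout
src = ["chr1\t+\t500\t40\n", "chr1\t-\t120\t1200\n", "chr1\t+\t90\t1\n"]
sga = to_sga_lines(src)
counts = [int(line.split("\t")[4]) for line in sga]
print(apply_count_cutoff(counts, 1))      # [1, 1, 1]
print(apply_count_cutoff(counts, 99999))  # [1, 1200, 40]
```

In this sketch a cut-off of 1 reduces every position to presence/absence, which would flatten the differences in TSS usage that CAGE measures, while a value as high as 99999 leaves the example counts unchanged.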


  1. Ravasi T, Suzuki H, Cannistraci CV, Katayama S et al.
    An atlas of combinatorial transcriptional regulation in mouse and man. Cell 2010 Mar 5;140(5):744-52. PMID: 20211142
