a catalog of biological databases
|Description:||FANTOM1-3 focused on identifying the transcribed components of mammalian cells. FANTOM3 could map a large fraction of transcription start sites and revise models of promoter structure. In FANTOM4 the focus has changed to understanding how these components work together in the context of a biological network.|
|Address:||RIKEN Omics Science Center,RIKEN Yokohama Institute,1-7-22 Suehiro-cho,Tsurumi-ku|
|Contact name (PI/Team):||FANTOM help|
|Contact email (PI/Helpdesk):||email@example.com|
Update of the FANTOM web resource: expansion to provide additional transcriptome atlases. [PMID: 30407557]
The FANTOM web resource (http://fantom.gsc.riken.jp/) was developed to provide easy access to the data produced by the FANTOM project. It contains the most complete and comprehensive sets of actively transcribed enhancers and promoters in the human and mouse genomes. We determined the transcription activities of these regulatory elements by CAGE (Cap Analysis of Gene Expression) for both steady and dynamic cellular states in all major and some rare cell types, consecutive stages of differentiation and responses to stimuli. We have expanded the resource by employing different assays, such as RNA-seq, short RNA-seq and a paired-end protocol for CAGE (CAGEscan), to provide new angles to study the transcriptome. That yielded additional atlases of long noncoding RNAs, miRNAs and their promoters. We have also expanded the CAGE analysis to cover rat, dog, chicken, and macaque species for a limited number of cell types. The CAGE data obtained from human and mouse were reprocessed to make them available on the latest genome assemblies. Here, we report the recent updates of both data and interfaces in the FANTOM web resource.
An integrated expression atlas of miRNAs and their promoters in human and mouse. [PMID: 28829439]
MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
Linking FANTOM5 CAGE peaks to annotations with CAGEscan. [PMID: 28972578]
The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated to adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled in transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhances the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.
The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types. [PMID: 28850107]
The latest project from the FANTOM consortium, an international collaborative effort initiated by RIKEN, generated atlases of transcriptomes, in particular promoters, transcribed enhancers, and long-noncoding RNAs, across a diverse set of mammalian cell types. Here, we introduce the FANTOM5 collection, bringing together data descriptors, articles and analyses of FANTOM5 data published across the Nature Research journals. Associated data are openly available for reuse by all.
The FANTOM5 Computation Ecosystem: Genomic Information Hub for Promoters and Active Enhancers. [PMID: 28451981]
The Functional Annotation of the Mammalian Genome 5 (FANTOM5) project conducted transcriptome analysis of various mammalian cell types and provided a comprehensive resource to understand transcriptome and transcriptional regulation in individual cellular states encoded in the genome.FANTOM5 used cap analysis of gene expression (CAGE) with single-molecule sequencing to map transcription start sites (TSS) and measured their expression in a diverse range of samples. The main results from FANTOM5 were published as a promoter-level mammalian expression atlas and an atlas of active enhancers across human cell types. The FANTOM5 dataset is composed of raw experimental data and the results of bioinformatics analyses. In this chapter, we give a detailed description of the content of the FANTOM5 dataset and elaborate on different computing applications developed to publish the data and enable reproducibility and discovery of new findings. We present use cases in which the FANTOM5 dataset has been reused, leading to new findings.
Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals. [PMID: 27794045]
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to facilitate researchers to explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit for this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki. [PMID: 27402679]
The Functional Annotation of the Mammalian Genome project (FANTOM5) mapped transcription start sites (TSSs) and measured their activities in a diverse range of biological samples. The FANTOM5 project generated a large data set; including detailed information about the profiled samples, the uncovered TSSs at high base-pair resolution on the genome, their transcriptional initiation activities, and further information of transcriptional regulation. Data sets to explore transcriptome in individual cellular states encoded in the mammalian genomes have been enriched by a series of additional analysis, based on the raw experimental data, along with the progress of the research activities. To make the heterogeneous data set accessible and useful for investigators, we developed a web-based database called Semantic catalog of Samples, Transcription initiation And Regulators (SSTAR). SSTAR utilizes the open source wiki software MediaWiki along with the Semantic MediaWiki (SMW) extension, which provides flexibility to model, store, and display a series of data sets produced during the course of the FANTOM5 project. Our use of SMW demonstrates the utility of the framework for dissemination of large-scale analysis results. SSTAR is a case study in handling biological data generated from a large-scale research project in terms of maintenance and growth alongside research activities.Database URL: http://fantom.gsc.riken.jp/5/sstar/.
Paradigm shifts in genomics through the FANTOM projects. [PMID: 26253466]
Big leaps in science happen when scientists from different backgrounds interact. In the past 15 years, the FANTOM Consortium has brought together scientists from different fields to analyze and interpret genomic data produced with novel technologies, including mouse full-length cDNAs and, more recently, expression profiling at single-nucleotide resolution by cap-analysis gene expression. The FANTOM Consortium has provided the most comprehensive mouse cDNA collection for functional studies and extensive maps of the human and mouse transcriptome comprising promoters, enhancers, as well as the network of their regulatory interactions. More importantly, serendipitous observations of the FANTOM dataset led us to realize that the mammalian genome is pervasively transcribed, even from retrotransposon elements, which were previously considered junk DNA. The majority of products from the mammalian genome are long non-coding RNAs (lncRNAs), including sense-antisense, intergenic, and enhancer RNAs. While the biological function has been elucidated for some lncRNAs, more than 98 % of them remain without a known function. We argue that large-scale studies are urgently needed to address the functional role of lncRNAs.
Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. [PMID: 26117540]
Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE. Here, we report on the first large-scale complex tissue transcriptome comparison between full-length versus 5'-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two data sets showed high genome-wide correlation (91% agreement), with differences observed for a small number of individual genes indicating the need to update their gene models. Among the identified single-tissue enriched genes, up to 75% showed consensus of 7-fold enrichment in the same tissue in both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other data set, likely dependent on the cell type proportions included in each tissue sample. Our results show that RNA-Seq and CAGE tissue transcriptome data sets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.
Gateways to the FANTOM5 promoter level mammalian expression atlas. [PMID: 25723102]
The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.
A promoter-level mammalian expression atlas. [PMID: 24670764]
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
An atlas of active enhancers across human cell types and tissues. [PMID: 24670763]
Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.
An atlas of combinatorial transcriptional regulation in mouse and man. [PMID: 20211142]
Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.