a catalog of biological databases
|Description:||We assembled de novo a high-quality reference macaque genome using a combination of second- and third-generation sequencing; reported an atlas of de novo-defined, full-length reference macaque gene models based on RNA-seq and third-generation transcriptome sequencing; profiled the first genome-wide macaque mutational map, based on genome-wide sequencing of 31 macaques, to facilitate monkey population genetics studies; and performed comprehensive functional genomics analyses at multiple regulatory levels to study the mechanisms and evolution of fundamental regulation, which also substantially expands the macaque functional annotations. These efforts have culminated in the development of RhesusBase, which integrates >1,800 standardized functional genomics datasets, holds 7.6 billion functional annotation records, and offers more than 20 sets of user-friendly genomic interfaces, providing an information-rich framework for monkey genomics and translational studies.|
|Address:||Institute of Molecular Medicine, Peking University, Beijing, China|
|Contact name (PI/Team):||Chuan-Yun Li|
|Contact email (PI/Helpdesk):||email@example.com|
Isoform Evolution in Primates through Independent Combination of Alternative RNA Processing Events. [PMID: 28957512]
Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, whereas the reliability of these events, as well as their combination on the isoform level, have not been adequately addressed due to its limited sequencing length. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes, and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic difference among primates.
RhesusBase PopGateway: Genome-Wide Population Genetics Atlas in Rhesus Macaque. [PMID: 26882984]
Although population genetics studies have significantly accelerated the evolutionary and functional interrogations of genes and regulations, limited polymorphism data are available for rhesus macaque, the model animal closely related to human. Here, we report the first genome-wide effort to identify and visualize the population genetics profile in rhesus macaque. On the basis of the whole-genome sequencing of 31 independent macaque animals, we profiled a comprehensive polymorphism map with 46,146,548 sites. The allele frequency for each polymorphism site, the haplotype structure, as well as multiple population genetics parameters were then calculated on a genome-wide scale. We further developed a specific interface, the RhesusBase PopGateway, to facilitate the visualization of these annotations, and highlighted the applications of this highly integrative platform in clarifying the selection signatures of genes and regulations in the context of the primate evolution. Overall, the updated RhesusBase provides a comprehensive monkey population genetics framework for in-depth evolutionary studies of human biology.
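As a rough illustration of the per-site allele frequency mentioned in the abstract above, the following minimal sketch counts alleles across diploid genotype calls. It is not the RhesusBase pipeline: the function name `allele_frequencies`, the tuple-based genotype encoding, and the example site are all assumptions made for illustration only.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one polymorphic site.

    `genotypes` holds one diploid call per animal as a 2-tuple of
    alleles, e.g. ("A", "G"). A call of None marks a missing genotype
    and is skipped. (Hypothetical encoding, chosen for this sketch.)
    """
    counts = Counter()
    for call in genotypes:
        if call is None:
            continue
        counts.update(call)  # adds both alleles of the diploid call
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Made-up site: 31 diploid animals, so at most 62 sampled chromosomes.
site = [("A", "A")] * 20 + [("A", "G")] * 8 + [("G", "G")] * 3
freqs = allele_frequencies(site)
```

With 31 animals, the smallest non-zero frequency such a count can report is 1/62, which bounds how rare a variant this sample size can resolve.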
Evolutionary interrogation of human biology in well-annotated genomic framework of rhesus macaque. [PMID: 24577841]
With genome sequence and composition highly analogous to human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, the RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical link of human-biased regulations to human-specific traits, by using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research in general and serves as an important resource for in-depth evolutionary studies of human biology.
RhesusBase: a knowledgebase for the monkey research community. [PMID: 22965133]
Although the rhesus macaque is a unique model for the translational study of human diseases, currently its use in biomedical research is still in its infancy due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community (http://www.rhesusbase.org). We performed strand-specific RNA-Seq studies in 10 macaque tissues and generated 1.2 billion 90-bp paired-end reads, covering >97.4% of the putative exons in macaque transcripts annotated by Ensembl. We found that at least 28.7% of the macaque transcripts were previously mis-annotated, mainly due to incorrect exon-intron boundaries, incomplete untranslated regions (UTRs) and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the end of UTRs, as well as cleaner patterns of exon-intron distribution for expression tags and cross-species conservation scores. Strikingly, 1,292 exon-intron boundary revisions between coding exons corrected the previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from >60 categories of public and in-house resources and developed an online accessible database. User-friendly interfaces were developed to update, retrieve, visualize and download the RhesusBase meta-data, providing a 'one-stop' resource for the monkey research community.