Abstracts - Conference Management System - National Genomics Data Center


Number of public abstracts: 21 from all 26 submissions
Abstract ID Title & Abstract Presenter Conference
26 RNA 5-methylcytosine facilitates the maternal-to-zygotic transition by preventing maternal mRNA decay
Ying Yang1, Lu Wang2, Xiao Han1,3, Wenlan Yang1,3, Mengmeng Zhang4, Jinbiao Ma5, Feng Liu2, Yun-Gui Yang1
1 CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, China
2 State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, China
3 Sino-Danish College, University of Chinese Academy of Sciences, China
4 State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Multiscale Research Institute for Complex Systems, Department of Biochemistry, CAS Key Laboratory of Genomic and Precision Medicine, China
5 State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Multiscale Research Institute for Complex Systems, Department of Biochemistry, School of Life Sciences, Fudan University, China
The maternal-to-zygotic transition (MZT) is a conserved and fundamental process during which the maternal environment is converted to an environment of embryonic-driven development through dramatic reprogramming. In zebrafish, several factors have been shown to be essential in regulating maternal mRNA decay, including the zygotically transcribed microRNA miR-430, suboptimal codon usage, N6-methyladenosine (m6A), and uridylation. However, these known factors have been shown to accelerate the decay of only several hundred maternal mRNAs, suggesting that additional factors and pathways may exist to coordinately regulate the decay of the majority of maternal mRNAs. 5-Methylcytosine (m5C) is a prevalent mRNA modification. It participates in RNA export from the nucleus to the cytoplasm and plays important roles in mouse embryonic stem cells, the mouse brain, and plant tissues. These findings provide reasonable evidence that m5C is probably involved in the complex regulation of mRNA metabolism and early developmental processes. In this study, we reveal a novel mechanism by which maternal mRNAs maintain stability during the MZT.
[Last update: 2019-09-20 22:44:52]
Presenter: Wenlan Yang; Conference: The 4th Big Data Forum for Life and Health Sciences
24 m6A promotes R-loop formation to facilitate transcription termination
Xin Yang1, Qianlan Liu1, Wei Xu2, Yichang Zhang1, Jie Ren1, Qianwen Sun2, Yun-Gui Yang1
1 Beijing Institute of Genomics, Chinese Academy of Sciences
2 Tsinghua University
R-loops play critical roles in cellular processes and human diseases, hence the importance of controlling their formation and resolution. However, in addition to external factors, whether R-loops can be regulated by intrinsic modification of their RNA component and the function of this proposed modulation are unknown. Here, we showed that depletion of the m6A methyltransferase METTL3 dramatically reduces R-loop accumulation in numerous genes that harbor N6-methyladenosine (m6A) modification sites, specifically around transcription end sites (TESs). Moreover, reduced R-loops at TESs caused by m6A depletion significantly increase the readthrough activity of RNA polymerase II. Restoration of R-loops at affected TESs and suppression of consequent readthrough activities require the methyltransferase activity of METTL3. Therefore, our results revealed a novel role of m6A modification in promoting co-transcriptional R-loops for efficient termination.
[Last update: 2019-09-20 17:57:09]
Presenter: Qianlan Liu; Conference: The 4th Big Data Forum for Life and Health Sciences
23 Transcriptional Heterogeneity of Mouse Megakaryocytes at Single-Cell Resolution
Shu Sun1,2, Chen Jin1,2, Yueying Li1, Jia Si1,2, Yueli Cui3,4, Matthew T. Rondina5, Fuchou Tang3,4,6, Qianfei Wang1,2
1 Key Laboratory of Genomic and Precision Medicine, Collaborative Innovation Center of Genetics and Development, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China
3 Beijing Advanced Innovation Center for Genomics, College of Life Sciences, Peking University, Beijing, China
4 Biomedical Institute for Pioneering Investigation via Convergence, Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, Beijing, China
5 Department of Internal Medicine and the Molecular Medicine Program, University of Utah, Salt Lake City, UT
6 Peking–Tsinghua Center for Life Sciences, Peking University, Beijing, China
Megakaryocytes (MKs) have long been described solely as platelet progenitors. However, recent studies show that MKs also act as an essential component of the bone marrow niche to maintain hematopoietic stem cell function, and combat infection by engulfing and presenting antigens. Yet it is not known whether these diverse programs are executed by a single cell population or by distinct subsets of cells. We performed single-cell RNA sequencing (scRNA-seq) to dissect the heterogeneity of MKs. To overcome the difficulty in obtaining the rare and fragile MKs, we developed an efficient isolation strategy by combining fluorescence-activated cell sorting (FACS), manual selection of highly viable cells, and FISH verification of ploidy. We obtained 920 highly purified, bone marrow-derived, CD41+ murine MKs spanning all ploidy stages (2N-32N) for scRNA-seq. All cells expressed classical MK markers such as Pf4, CD61 and Itga2b (CD41). Four cell clusters were identified using an unsupervised clustering method, and each cluster was characterized by its DNA content level, transcription factor network and cell population-specific markers. Cells in Cluster 1 were enriched for hemostasis and platelet activation expression signatures and consisted of ≥8N cells, suggesting that these MKs may represent platelet-generating MKs. Cells in Cluster 2 were low ploidy (≤8N) and had higher expression of inflammation-related genes, including Ctss and Itgam (“inflammatory response-associated MKs”). Cells in Cluster 3 were enriched for DNA replication and DNA strand elongation (GO terms) and were found in all ploidy stages (“MKs in polyploidization stage”). Cells in Cluster 4, most of which were high-ploidy, expressed high levels of TGF-β and IGF1, factors regulating HSC behavior (“HSC niche cells”). We further investigated the potential intrinsic relationship of these four clusters during megakaryopoiesis.
Developmental time courses were reconstructed using Monocle analysis, demonstrating that polyploidization (Cluster 3) occurs at the early stage of MK development with subsequent differentiation toward three orientations. While MKs with low ploidy appear to have two distinct cell fates (immunomodulation or polyploidization), MKs with high ploidy (≥8N) differentiate towards populations associated with platelet production or stem cell regulation. Our study provides the first in vivo transcriptomic profile of megakaryopoiesis and a potential map of megakaryocyte heterogeneity at single-cell resolution. Defining MK stages by ploidy and the traditional markers CD42 and CD61 alone may result in a genetically and developmentally heterogeneous population of MKs. Rather, MKs at various stages may be more specifically identified by gene signatures. These observations suggest that megakaryopoiesis does not occur merely as a stepwise process, but is dynamic and adaptive to locations in the BM and biological needs.
[Last update: 2019-09-20 17:56:32]
Presenter: Shu Sun; Conference: The 4th Big Data Forum for Life and Health Sciences
22 Multi-omics analysis of esophageal squamous cell carcinoma reveals alcohol drinking-related mutation signature and genomic alterations mediating interactions in tumor ecosystem
Chen Wu1, Yanyi Huang2, Jianbin Wang3
1 Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, China
2 Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, China
3 School of Life Sciences and Tsinghua-Peking Center for Life Sciences, Tsinghua University, China
Esophageal squamous cell carcinoma (ESCC) is a common malignancy among Asian populations with a remarkably poor prognosis. The development of ESCC involves lifestyle risks and gene aberrations present in the heterogeneous tumor ecosystem. Here, we present a comprehensive genomic landscape of ESCC through whole-exome (WES) and whole-genome sequencing (WGS) of DNA and RNA of 94 Chinese individuals with ESCC. We also established a comprehensive cell atlas for ESCC through single-cell RNA sequencing (scRNA-seq) of another cohort of 60 patients. Through WES and WGS analysis, we identify a signature unique to ESCC that is associated with alcohol intake and genetic variants in alcohol-metabolizing enzymes. The genomic analysis also revealed recurrent mutations, including mutations in TP53 and NOTCH1. Integrating single-cell transcriptome profiling, we identified possible interaction mechanisms between lifestyle risks, gene alterations, and varying cellular states that may lead to ESCC development. Our findings will provide potential targets for precision treatment and prevention of the cancer.
[Last update: 2019-09-18 10:47:03]
Presenter: Chen Wu; Conference: The 4th Big Data Forum for Life and Health Sciences
21 NucMap: a database of genome-wide nucleosome positioning map across species
Yongbing Zhao1, Jinyue Wang2
1 Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
2 BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
The dynamics of nucleosome positioning affect chromatin state, transcription and all other biological processes occurring on genomic DNA. As MNase-Seq has been used to map nucleosome positioning in eukaryotes in recent years, the volume of nucleosome positioning data has increased dramatically. To facilitate the use of published data across studies, we developed a database named nucleosome positioning map (NucMap, http://bigd.big.ac.cn/nucmap). NucMap includes 798 experimental datasets from 477 samples across 15 species. With a series of functional modules, users can search nucleosome positioning profiles at the promoter region of each gene across all samples and perform enrichment analysis on nucleosome positioning data in all genomic regions. A nucleosome browser was built to visualize the profiles of nucleosome positioning. Users can also visualize multiple sources of omics data with the nucleosome browser and make side-by-side comparisons. All processed data in the database are freely available. NucMap is the first comprehensive nucleosome positioning platform and will serve as an important resource to facilitate the understanding of chromatin regulation.
[Last update: 2019-09-20 22:44:00]
Presenter: Jinyue Wang; Conference: The 4th Big Data Forum for Life and Health Sciences
20 New genes generated and amplified by transposons in animals
Shengjun Tan1
1 Institute of Zoology
Transposable elements (TEs) are mobile genetic units that are ubiquitous across organisms and a major force shaping genome architecture. A key contribution of TEs to evolution is their capacity to mediate the formation of new genes. According to their transposition mechanisms, TEs can be classified into two groups: retrotransposons (class I) and DNA transposons (class II). Our first work (Genome Research, 2016) shows that LTR retrotransposons can mediate the generation of retrogenes in animals. The recently originated retrocopies have a similar chimeric structure: the internal retrocopies are flanked by discontinuous LTR retrotransposons. At the fusion points we identified shared short similar sequences, suggesting the involvement of microsimilarity-dependent template switches at the RNA level. Our second work (unpublished) shows that DNA transposons can also mediate the generation of functional genes in animals. We identified genes with a similar chimeric structure in various animals. In-depth analysis of one gene in Drosophila melanogaster shows that it was successively amplified to multiple copies in populations. This gene has been preserved by a selective sweep and is highly expressed in the midgut, which suggests that it has acquired a function. Combined with the HiC-seq data, we propose a model involving template switches during DNA replication to explain the formation of these new genes. Both mechanisms, at the RNA and DNA levels, are conserved across a wide range of animal taxa and represent ancient and ongoing transposon-mediated mechanisms that continuously shape gene content evolution in animals.
[Last update: 2019-09-20 22:43:38]
Presenter: Shengjun Tan; Conference: The 4th Big Data Forum for Life and Health Sciences
18 Genome Warehouse: A Public Repository Housing Genome-scale Data
Meili Chen1, Yingke Ma1, Song Wu1, Zhaohua Li1, Zheng Gong1, Yiming Bao1
1 BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China
The Genome Warehouse (GWH; https://bigd.big.ac.cn/gwh) is a public repository housing genome-scale data for a wide range of species and delivering a series of web services for genome data submission, storage, release and sharing. It is one of the core databases in the National Genomics Data Center (NGDC; https://bigd.big.ac.cn/). Genome assemblies, including sequences of whole genomes, chloroplasts, mitochondria and plasmids, are obtained from an on-line submission system and off-line batch submissions. For each genome assembly, GWH contains detailed genome-related information including species metadata, genome assembly, sequence data and the corresponding annotations. To archive high-quality genome sequences and genome annotation information, GWH adopts a uniform standardized procedure for quality control. Accession numbers are assigned to assemblies and sequences once they pass quality control. The released genome assembly and genome annotation data are available through FTP. GWH provides data visualization for genome sequences and genome annotations using JBrowse, and provides multiple statistics and charts for assembly level, genome representation, sequencing platform, assembly method, organization and download. GWH has accepted 655 direct submissions from both domestic and international institutions, covering a broad diversity of species, viz., animals, plants, fungi, bacteria, archaea, viruses and others. Among all direct submissions, 136 genome assemblies have been publicly released. Data submitted to GWH have been reported in research articles in 19 different international journals so far. GWH is growing rapidly in data submissions and thus holds great promise as an important resource for genome-related studies.
[Last update: 2019-09-20 21:56:06]
Presenter: Meili Chen; Conference: The 4th Big Data Forum for Life and Health Sciences
17 Molecular characterization reveals the diagnostic, prognostic and predictive significance of PRKCG in glioma
Lin Liu1, Guangyu Wang2, Liguo Wang3, Chunlei Yu1, Mengwei Li1, Shuhui Song1, Lili Hao1, Lina Ma1, Zhang Zhang1
1 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China
2 The Methodist Hospital Research Institute, United States
3 Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
To explore effective glioma biomarkers that have high specificity, enable non-invasive detection and possess clinical significance for glioma diagnosis, prognosis and prediction, we collect large-scale, multi-omics and multi-cohort datasets from public resources and perform integrative analyses. We identify PRKCG, a brain-specific gene that is also abundant in the cerebrospinal fluid. PRKCG achieves high specificity and feasible detectability in the periphery and bears potential in clinical application for glioma diagnosis, prognosis and treatment prediction. These results have been consistently observed across a comprehensive collection of glioma datasets, consisting of multi-omics molecular profiles and different ethnic populations/countries. Based on characterization of multi-omics molecular profiles, our findings clearly suggest the reliability and potential application of PRKCG as a biomarker for glioma.
[Last update: 2019-09-20 22:43:00]
Presenter: Lin Liu; Conference: The 4th Big Data Forum for Life and Health Sciences
16 Cross-sectional whole-genome sequencing and epidemiological study of multidrug-resistant Mycobacterium tuberculosis in China
Fei Chen1, Hairong Huang2, Nan Ding1, Tingtin Yang1, Cuidan Li1, Xinmiao Jia3
1 CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, China
2 National Clinical Laboratory on Tuberculosis, Beijing Key Laboratory on Drug-resistant Tuberculosis Research, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Institute, China
3 Department of Medical Research Center, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, China
Background. The increase in MDR-TB severely hampers its prevention and control in China, a country with the second highest MDR-TB burden globally. The first nationwide drug-resistant TB surveillance program provides an opportunity to comprehensively investigate the epidemiological/drug-resistance characteristics, potential drug-resistance mutations, and effective population changes of Chinese MDR-TB. Methods. We sequenced 357 MDR strains from 4,600 representative TB-positive sputum samples collected from the survey (70 counties in 31 provinces). Drug-susceptibility testing was performed using 18 anti-tuberculosis drugs, representing the most comprehensive drug-resistance profile to date. We employed three statistical methods and one machine learning method to identify drug-resistance genes/SNPs. Bayesian skyline analysis was used to investigate the changes in effective population size. Results. Epidemiological/drug-resistance characteristics showed different multidrug-resistance profiles, co-resistance patterns, preferred drug combination/use, and recommended regimens among seven Chinese administrative regions. These findings not only reflect serious multidrug co-resistance and drug misuse in some regions but are also potentially significant in facilitating the development of appropriate regimens for MDR-TB treatment in China. Further investigation identified 86 drug-resistance genes/intergenic regions/SNPs (58 new), providing potential targets for diagnosis and treatment of MDR-TB. In addition, the effective population of Chinese MDR-TB displayed a strong expansion during 1993-2000, reflecting socioeconomic transition within the country. The expansion was restrained after 2000, likely attributable to advances in diagnosis/treatment technologies and government support. Conclusions. Our findings provide an important reference and improved understanding of MDR-TB in China, potentially significant in achieving the goal of precision medicine with respect to MDR-TB prevention and treatment.
[Last update: 2019-09-20 17:54:59]
Presenter: Cuidan Li; Conference: The 4th Big Data Forum for Life and Health Sciences
15 Role of RNA N6-methyladenosine methylation in regulating brain development in mouse
Xufei Teng1,2, Shuhui Song1,2
1 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China
2 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, China
N6-methyladenosine (m6A) is the most abundant epitranscriptomic mark and is highly abundant in the mammalian brain. Recently, it has been found to be involved in the regulation of memory formation and mammalian cortical neurogenesis, but its potential functions in the brain remain largely unexplored. We performed a transcriptome-wide methylation analysis of the mouse brain to depict its methylation profile and further uncovered distinct features of continuous and spatiotemporal-specific m6A methylation across the developmental processes in the cerebellum and cortex, including methylation levels and distribution patterns along mRNA transcripts. Notably, nuclear export of the hypermethylated RNAs is enhanced in the cerebellum of Alkbh5-deficient mice exposed to hypobaric hypoxia. m6A also functions temporally in the cerebral cortex and participates in RNA metabolism via RNA decay mediated by RNA-binding proteins. Our results imply that RNA m6A methylation is a newly identified element in the regulation of mouse brain development.
[Last update: 2019-09-20 22:42:44]
Presenter: Xufei Teng; Conference: The 4th Big Data Forum for Life and Health Sciences
14 Comparison and analysis of lncRNA-mediated ceRNA regulation in different molecular subtypes of glioblastoma
Qianpeng Li1,2, Qiuhong Yu3, Jianghuai Ji2, Peng Wang4, Dongguo Li2
1 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences (present), China
2 School of Biomedical Engineering, Capital Medical University, China
3 Department of Hyperbaric Oxygen, Beijing Tiantan Hospital, Capital Medical University, China
4 College of Bioinformatics Science and Technology, Harbin Medical University, China
Glioblastoma multiforme (GBM) is the most malignant brain tumor and has a poor prognosis. A molecular-level classification of GBM can provide insight into accurate patient-specific treatment. Competing endogenous RNAs (ceRNAs) such as long non-coding RNAs (lncRNAs) play an essential role in the development of tumors and are associated with survival. However, the pattern of lncRNA-mediated ceRNA (LMce) crosstalk in different GBM subtypes remains unclear. In this study, we present a computational cascade to construct LMce networks of different GBM subtypes and investigate the lncRNA-mRNA regulations among them. Our results showed that although most lncRNAs and mRNAs in the different GBM subtype networks were the same, the regulatory relationships of these RNAs differed among subtypes. 42.5%, 50.9%, 43.5% and 65.0% of lncRNA-mRNA regulatory pairs were Classical (CL)-, Mesenchymal (MES)-, Proneural (PN)- and Neural (NE)-specific, respectively. In addition, our study identified 61, 132, 24 and 16 modules in which lncRNAs and mRNAs synergistically competed with each other for miRNAs as CL-, MES-, PN- and NE-specific, respectively. CL- and MES-specific modules were mainly involved in biological functions such as cell proliferation, apoptosis and migration, while PN- and NE-specific modules were mainly related to DNA damage and cell cycle dysregulation. Survival analysis demonstrated that some modules could be potential prognostic markers for patients with the CL and MES subtypes. This study uncovered the LMce interaction patterns in different GBM subtypes, identified subtype-specific modules with distinct biological functions, and revealed potential prognostic markers for patients with different GBM subtypes. These results might contribute to the discovery of GBM prognostic biomarkers and the development of more accurate therapies.
[Last update: 2019-09-20 22:42:30]
Presenter: Qianpeng Li; Conference: The 4th Big Data Forum for Life and Health Sciences
13 Plant editosome database: a curated database of RNA editosome in plants
Man Li1,2, Lin Xia1,2, Yuansheng Zhang1,2, Lili Hao1,2, Zhang Zhang1,3
1 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China
2 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, China
3 National Genomics Data Center, China
RNA editing plays an important role in plant development and growth, enlisting a number of editing factors in the editing process and accordingly revealing the diversity of plant editosomes for RNA editing. However, there is no resource available thus far that integrates editosome data for a variety of plants. Here, we present the Plant Editosome Database (PED; http://bigd.big.ac.cn/ped), a curated database of RNA editosomes in plants that is dedicated to the curation, integration and standardization of plant editosome data. Unlike extant relevant databases, PED incorporates high-quality editosome data manually curated from related publications and organelle genome annotations. In the current version, PED integrates a complete collection of 98 RNA editing factors and 20,836 RNA editing events, covering 203 organelle genes and 1,621 associated species. In addition, it contains functional effects of editing factors in regulating plant phenotypes and includes detailed experimental evidence. Together, PED serves as an important resource to help researchers investigate the RNA editing process across a wide range of plants and thus would be of broad utility for the global plant research community.
[Last update: 2019-09-20 22:46:38]
Presenter: Yuansheng Zhang; Conference: The 4th Big Data Forum for Life and Health Sciences
12 Dynamic methylome of internal mRNA N7-methylguanosine and its regulatory role in translation
Lionel MALBEC1, Ting Zhang1, Yu-Sheng Chen1, Ying Zhang2, Bao-Fa Sun1, Bo-Yang Shi1, Yong-Liang Zhao1, Ying Yang1, Yun-Gui Yang1
1 Beijing Institute of Genomics, Chinese Academy of Sciences
2 Institute of Zoology, Chinese Academy of Sciences
Over 150 types of RNA modifications have been identified in RNA molecules. Transcriptome profiling is one of the key steps in decoding the epitranscriptomic panorama of these chemical modifications and their potential functions. N7-methylguanosine (m7G) is one of the most abundant modifications present in tRNA, rRNA and the mRNA 5′ cap, and has critical roles in regulating RNA processing, metabolism and function. Besides its presence at the cap position in mRNAs, m7G is also found in internal mRNA regions. However, its transcriptome-wide distribution and dynamic regulation within internal mRNA regions remain unknown. Here, we have established m7G individual-nucleotide-resolution cross-linking and immunoprecipitation with sequencing (m7G miCLIP-seq) to specifically detect internal mRNA m7G modification. Using this approach, we revealed that m7G is enriched in the 5′ UTR and in AG-rich contexts, a feature that is well conserved across different human/mouse cell lines and mouse tissues. Strikingly, the internal m7G modification is dynamically regulated under both H2O2 and heat shock treatments, with remarkable accumulation in the CDS and 3′ UTR regions, and functions in promoting mRNA translation efficiency. Consistently, a PCNA 3′ UTR minigene reporter harboring the native m7G modification site displays both enriched m7G modification and increased mRNA translation upon H2O2 treatment compared to the m7G site-mutated minigene reporter (G to A). Taken together, our findings unravel the dynamic profiles of the internal mRNA m7G methylome and highlight m7G as a novel epitranscriptomic marker with regulatory roles in translation.
[Last update: 2019-09-20 22:42:12]
Presenter: Lionel MALBEC; Conference: The 4th Big Data Forum for Life and Health Sciences
11 Computational analysis and visualization tools for 3D genomic study
Juntao Gao1, Yisi Li1, HongPeng Ma1, Songyan Hu1, Michael Q. Zhang1,2
1 Tsinghua University, China
2 Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, USA
Murine endogenous retrovirus-like element (MuERV-L/MERVL), a type of retrotransposon whose expression is usually restricted to the 2-cell (2C) stage in the mouse pre-implantation embryo, is closely related to mouse zygotic genome activation (ZGA). Though there are more than 100 different MERVL subfamilies, the sequence diversity of millions of MERVL elements and their roles in ZGA have never been investigated in previous studies. Here, we first identified a subset of MERVL subfamilies, 2C-MERVL elements, from all 113 MERVL subfamilies. 2C-MERVL elements are significantly enriched for ZGA-specific genes, owing to their sequence characteristics and co-evolution with the host. Next, multi-omics data related to 2C-MERVL elements were investigated. Surprisingly, we found that (1) 2C-MERVL elements were enriched around the boundaries of TADs (Topologically Associated Domains), and (2) compared with other MERVL elements, 2C-MERVL elements were more activated and less repressed by bivalent histone modifications during the ZGA transition. Therefore, the unique regulatory ability of 2C-MERVL was linked with bivalent histone modification activation and dynamic 3D genomic interaction. Finally, 2C-MERVL elements were used as gene-level marks to uncover a group of rapidly regulated 2C genes, which are potential precursors of major ZGA gene activation. Last but not least, as computational/visualization method development is one of the important strategies for investigating three-dimensional (3D) genome organization in prokaryotic and eukaryotic cells, we introduce here two further visualization tools developed recently in our group. One is the 3D modeling program Web3DMol (published in NAR), a web application focusing on protein structure visualization in modern web browsers. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure.
Easy-to-use APIs are available for developers to reuse and extend Web3DMol. The other is HiC-3DViewer, a browser-based interactive tool designed to provide an intuitive environment for investigators to facilitate the 3D exploratory analysis of Hi-C data along with many useful annotation functionalities. Among the key features of HiC-3DViewer relevant to chromatin conformation studies, the most important one is the 1D-to-2D-to-3D mapping, which highlights genomic regions of interest interactively. This feature enables investigators to explore their data at different levels/angles. As a user-friendly tool, HiC-3DViewer enables the visualization of inter/intra-chromatin interactions and gives users the flexibility to customize the look-and-feel of the 3D structure with a simple click.
[Last update: 2019-09-20 15:46:36]
Presenter: Juntao Gao; Conference: The 4th Big Data Forum for Life and Health Sciences
9 CGVD: A genomic variation database for Chinese populations
Jingyao Zeng1, Na Yuan1, Zhenglin Du1, Jingfa Xiao1
1 Beijing Institute of Genomics, Chinese Academy of Sciences
Precision medicine calls for deeper coverage in population-based sequencing and thorough gene-content and phenotype-based analysis, which together lead to a population-associated genomic variation map or database. The Chinese Genomic Variation Database (CGVD; https://bigd.big.ac.cn/cgvd/) is such a database, combining 48.30 million (M) SNVs and 5.77 M small indels identified from 991 Chinese individuals of the Chinese Academy of Sciences Precision Medicine Initiative Project (CASPMI) and 301 Chinese individuals of the 1000 Genomes Project (1KGP). The CASPMI project includes whole-genome sequencing data (WGS, 25–30X) from ~1000 healthy individuals of the CASPMI cohort. To facilitate the use of these variations in pharmacogenomics and cancer studies, star-allele frequencies of the drug-related genes in the CASPMI and 1KGP populations are calculated, and a search module for cancer-related genes and cancer types is also built into CGVD. As one of the important database resources in the BIG Data Center, CGVD will continue to collect more genomic variations and to curate structural and functional annotations to support population-based healthcare projects and studies in China and worldwide.
[Last update: 2019-09-20 17:55:39]
Presenter: Jingyao Zeng; Conference: The 4th Big Data Forum for Life and Health Sciences
8 Database Resources of the National Genomics Data Center in 2020
Zhang Zhang1
1 National Genomics Data Center, China
The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies, and the resulting huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, update efforts have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources, along with their services, are publicly accessible at https://bigd.big.ac.cn.
[Last update: 2019-09-20 22:45:54]
NGDC Members & Partners
The 4th Big Data Forum for Life and Health Sciences
7 An expanded landscape of human long noncoding RNA
Shuai Jiang1, Si-jin Cheng1, Ge Gao1
1 Peking University
Long noncoding RNAs (lncRNAs) are defined as non-coding transcripts longer than 200 nt, and are emerging as key regulators of multiple essential biological processes involved in physiology and pathology. A high-quality and comprehensive lncRNA annotation is a cornerstone requirement of subsequent functional investigation. However, while tremendous efforts have been devoted to systematically characterizing lncRNAs in the human genome in recent years, large discrepancies still exist among the current major annotations. By analyzing the largest compendium of 14,166 samples across 30 normal tissues, two cell lines and 18 tumors, we significantly expand the landscape of human long noncoding RNA with a high-quality atlas: RefLnc (Reference catalog of LncRNA). RefLnc annotates 77,900 human lncRNAs, of which 35.3% (27,520/77,900) are novel relative to major reference catalogs. Moreover, 88.5% of the novel lncRNAs are successfully verified in independent datasets. Among the 93 selected cases with unique primer pairs, 91.4% of novel intergenic lncRNAs are successfully validated by quantitative RT-PCR (qRT-PCR) and Sanger sequencing, including 52 multi-exon and 33 single-exon transcripts. Powered by comprehensive annotation across multiple sources, RefLnc helps to pinpoint 275 novel intergenic lncRNAs correlated with sex, age or race as well as 369 novel ones associated with patient survival, clinical stage, tumor metastasis or recurrence. Integrated in a user-friendly online portal (http://reflnc.gao-lab.org/), the expanded catalog of human lncRNAs provides a valuable resource for investigating lncRNA function in both human biology and cancer development.
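The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 52 multi-exon and 33 single-exon transcripts together make up the validated subset of the 93 tested cases; the abstract implies this but does not state it outright. Variable names are illustrative.

```python
# Share of RefLnc lncRNAs that are novel relative to reference catalogs.
total_lncrnas = 77_900
novel_lncrnas = 27_520
novel_pct = round(100 * novel_lncrnas / total_lncrnas, 1)

# Share of tested novel intergenic lncRNAs validated by qRT-PCR and Sanger
# sequencing, assuming validated = multi-exon + single-exon transcripts.
tested = 93
validated = 52 + 33
validated_pct = round(100 * validated / tested, 1)
```

Under that assumption both figures agree with the abstract: 35.3% novel and 91.4% validated.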
[Last update: 2019-09-20 22:29:23]
Shuai Jiang
The 4th Big Data Forum for Life and Health Sciences
6 Genomic Variants Information and Knowledge Database
Shuhui Song1, Dongmei Tian2, Cuiping Li2, Pei Wang2, Xufei Teng2, Bixia Tang2
1 National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences
2 Beijing Institute of Genomics, Chinese Academy of Sciences
With the rapid development of high-throughput sequencing technologies, biological sequence data have been generated exponentially over the past decade. The availability of high-quality reference genome sequences and improvements in genome variation analysis methodology enable large-scale identification of genome variations at unprecedented rates, making it possible to systematically conduct population evolution studies and decipher genotype-to-phenotype (G2P) associations. We developed the Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm), a public database of genome variations, including single nucleotide polymorphisms (SNPs) and small insertions and deletions (Indels). GVM aims to collect, integrate and visualize genome variations for a wide range of species, and accepts submissions of different types of genome variations from all over the world. The current release of GVM has accepted 24 genome variation dataset submissions involving 23,056 samples from 10 species, and houses a total of ~8.4 billion variants (including 7.2 billion SNPs and 1.2 billion Indels) for 32 species, as well as 78,861 significant (P < 10^-3) G2P associations for 13 non-human species obtained through literature curation. We further performed trait annotation by semantic mapping to a suite of ontologies (Plant Trait Ontology, Animal Trait Ontology for Livestock, etc.), and organized the G2P associations in GWAS Atlas (https://bigd.big.ac.cn/gwas/). GWAS Atlas has now integrated 75,467 variant-trait associations for 614 traits across 7 cultivated plants (cotton, Japanese apricot, maize, rapeseed, rice, sorghum and soybean) and 2 domesticated animals (goat and pig), and presents them in terms of variants, genes, traits, studies, and publications.
Taken together, GVM and GWAS Atlas provide user-friendly web interfaces for data browsing and downloading and serve as important resources for archiving genomic variation data and knowledge, helping to better understand population genetic diversity and decipher complex mechanisms associated with different phenotypes.
[Last update: 2019-09-20 15:46:00]
Shuhui Song
The 4th Big Data Forum for Life and Health Sciences
5 Multi-omics data integration for single-cell GRN inference
Ming Shi1
1 Department of Automation, Tsinghua University, China
Gene regulatory network (GRN) inference is of great importance for single-cell type identification. Current single-cell GRN inference methods focus on RNA-seq data analysis and are subject to the dropouts and technical variation induced by transcriptional bursting. In this study, we propose linking multi-omics data, such as single-cell epigenomics data and CRISPR-seq data, to overcome the dropouts and technical variation in single-cell RNA-seq data and thus to optimize the discovery and characterization of cell states. In the proposed method, the epigenomics and CRISPR-seq data are provided as prior information and then integrated with the single-cell RNA-seq data within a sparse regularized regression framework to infer TF-gene interactions. The activity of different modules in the inferred GRN is identified and will be used as features in cell-type identification. Compared with the expression of individual genes, the modules are expected to be robust against dropouts, which is to be verified on simulated and real compendia of single-cell data.
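One way to picture prior information entering a sparse regularized regression is a lasso with per-TF penalty weights, where TFs supported by prior evidence receive a smaller penalty. The sketch below is only an illustration under that assumption: the abstract does not specify the penalty form, and the function names, the weights and the toy data are hypothetical. It minimizes (1/2n)·||y − Xb||² + lam·Σ w_j·|b_j| by coordinate descent.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prior_weighted_lasso(X, y, penalty_weights, lam, n_iter=200):
    """Coordinate descent for a lasso with one penalty weight per feature.

    X: list of rows (cells x TFs), y: target gene expression per cell.
    penalty_weights[j] < 1 means TF j is favoured by prior evidence.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam * penalty_weights[j]) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: two orthogonal TF profiles; the target depends strongly on TF 0
# and weakly on TF 1 (y = 1.0 * tf0 + 0.3 * tf1).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [1.3, -0.7, 0.7, -1.3]

uniform = prior_weighted_lasso(X, y, [1.0, 1.0], lam=0.5)
with_prior = prior_weighted_lasso(X, y, [1.0, 0.2], lam=0.5)
```

In this toy case the uniform penalty drops the weak TF 1 edge entirely, while lowering its penalty weight (as a prior might) keeps it with a shrunken coefficient.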
[Last update: 2019-09-20 22:45:44]
Ming Shi
The 4th Big Data Forum for Life and Health Sciences
4 eRNAs interact with target genes via base pairing in Alu elements
Bai Xue1
1 Beijing Institute of Genomics, Chinese Academy of Sciences
Enhancers are regulatory elements that can form loop structures with spatially distant target genes and thereby regulate their expression. Long non-coding RNA generated by active enhancer transcription is called enhancer RNA (eRNA). eRNA is involved in the interaction of enhancer-promoter (E-P) loops, but its regulatory mechanism is still unclear. Here, we propose that some eRNAs bind to their target genes by complementary base pairing and assist enhancers in regulating gene expression. We found that non-randomly matched sequences, Alu elements, co-exist on some eRNAs and target promoters (Tpromoters), and that the matched sequences are positively correlated with the binding strength of transcription factors and the expression level of target genes. Compared with benign SNPs, GWAS sites that influence body phenotypes were significantly enriched in promoter Alu elements. Moreover, there were significantly more co-evolutionary sites in the matching Alu sequences of orthologous eRNA-Tpromoter pairs between human and chimpanzee than in random eRNA-Tpromoter pairs. Thus, we show that some eRNAs may bind to target genes through non-random Alu sequences, and suggest that this mode of interaction by complementary base pairing is biologically functional.
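To make the notion of complementary base pairing concrete, the sketch below scans an eRNA sequence for k-mers whose reverse complement occurs in a promoter sequence. It is a simplified illustration, not the authors' pipeline: it assumes sequences written in the DNA alphabet (T rather than U), requires perfect matches, and uses short made-up sequences rather than real Alu elements.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_kmers(erna, promoter, k):
    """Return (position, k-mer) pairs of erna that can pair with promoter.

    A k-mer counts as a hit when its reverse complement appears anywhere
    in the promoter sequence (perfect matches only).
    """
    promoter_kmers = {promoter[i:i + k] for i in range(len(promoter) - k + 1)}
    hits = []
    for i in range(len(erna) - k + 1):
        kmer = erna[i:i + k]
        if reverse_complement(kmer) in promoter_kmers:
            hits.append((i, kmer))
    return hits


# Hypothetical sequences; the promoter contains the reverse complement
# of the eRNA fragment.
erna = "GGCTGAGAT"
promoter = "TTATCTCAGCCAA"
hits = complementary_kmers(erna, promoter, k=9)
```

A realistic analysis would tolerate mismatches and use an alignment tool; exact k-mer lookup is used here only to keep the example short.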
[Last update: 2019-09-20 22:41:53]
Bai Xue
The 4th Big Data Forum for Life and Health Sciences
1 Co-expressed gene-set enrichment analysis for drug repositioning with examples of psoriasis and periodontal diseases
Zhilong Jia1, Wenyan Kang2, Qiang Feng2, Kunlun He1, Zhigang Luo3, Michael R. Barnes4
1 Chinese PLA General Hospital
2 Shandong University
3 National University of Defense Technology
4 Queen Mary University of London
Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvements in the sensitivity of computational drug repositioning methods have identified numerous credible repositioning opportunities, few have been progressed. Arguably, the “black box” nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena), for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway-driven disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate the functions of cogena in psoriasis and periodontal diseases, respectively. In the psoriasis example, we computationally recover two widely used psoriasis drugs with distinct modes of action and show other top-ranked candidate compounds with literature support. In the periodontal diseases example, using time-course transcriptomic data, we identified, computationally and by in vitro experiments, several drugs that could protect F. nucleatum-infected gingival fibroblasts via the coexpression-based drug repositioning approach.
In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and repositioning, allowing the grouping and prioritization of drug repositioning candidates on the basis of putative mode of action. All these studies are reproducible, with code available on GitHub.
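Gene set enrichment of a co-expressed cluster is commonly scored with a hypergeometric (over-representation) test. The sketch below shows that test as a generic illustration; the abstract does not state which statistic cogena uses, so this is an assumption, and the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, gene_set_size, cluster_size, overlap):
    """P(X >= overlap) for a cluster drawn at random from the universe.

    universe: number of genes measured; gene_set_size: genes in the pathway
    or drug signature; cluster_size: genes in the co-expressed cluster;
    overlap: genes shared by the cluster and the gene set.
    """
    upper = min(gene_set_size, cluster_size)
    total = comb(universe, cluster_size)
    tail = sum(
        comb(gene_set_size, k) * comb(universe - gene_set_size, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, a 5-gene pathway, a 5-gene
# co-expressed cluster, 3 genes in common.
p_value = hypergeom_enrichment_p(20, 5, 5, 3)
```

A small p-value indicates that the cluster shares more genes with the set than expected by chance; in practice p-values across many gene sets would also need multiple-testing correction.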
[Last update: 2019-09-20 17:48:05]
Zhilong Jia
The 4th Big Data Forum for Life and Health Sciences
Last update: 16 Sep 2019 by zz (version 0.1)