The 3rd Big Data Forum for Life and Health Sciences (October 11-14, 2018)

Biological research has entered the era of big data, spanning a wide variety of omics data and a broad range of health data. Such data are generated at ever-growing rates and distributed throughout the world under heterogeneous standards and with varied, often limited, access. However, the promise of translating big data into big knowledge can be realized only if the data are publicly shared. Providing open access to omics and health big data is therefore essential for expediting that translation, and it is becoming increasingly vital for advancing scientific research, human healthcare, and precision medicine.

Open Biodiversity & Health Big Data

It is our great pleasure to announce that the 2018 Big Data Forum for Life and Health Sciences will be held on October 11-14, 2018. A few renowned biomedical data scientists have agreed to give talks. You are also cordially invited to share your work and participate in this exciting event.

Looking forward to seeing you in Beijing, China! We will work hard to ensure that your stay is not only fruitful but also enjoyable!

Organizing Committee

  • Yiming Bao (BIG, CAS)
  • Zhang Zhang (BIG, CAS)
  • Wenming Zhao (BIG, CAS)
  • Jingfa Xiao (BIG, CAS)
  • Songnian Hu (BIG, CAS)
  • Jun Yu (BIG, CAS)
  • Jingchu Luo (Peking University)

Invited Speakers

Amir Abbasi

National Centre for Bio Informatics
Quaid-i-Azam University

Yiming Bao

BIG Data Center, Beijing Institute of Genomics
Chinese Academy of Sciences

Suhua Chang

Associate Professor
National Clinical Research Center for Mental Disorders
Institute of Mental Health, Peking University

Kaifu Chen

Associate Professor
Center Director for Bioinformatics and Computational Biology
Houston Methodist, Weill Cornell Medical College

Luonan Chen

Shanghai Institute of Biochemistry and Cell Biology
Chinese Academy of Sciences

Frank Eisenhaber

Executive Director
Bioinformatics Institute

Michael Y. Galperin

Lead Scientist
Computational Biology Branch
National Center for Biotechnology Information

Songnian Hu

Director of Key Laboratory of Genome Sciences & Information
Chinese Academy of Sciences

Zhiyuan Hu

National Center for Nanoscience and Technology
Chinese Academy of Sciences

Yuxia Jiao

Editor of Genomics, Proteomics & Bioinformatics
Beijing Institute of Genomics, CAS

Cheng Li

School of Life Sciences
Peking University

Xia Li

College of Bioinformatics Science and Technology
Harbin Medical University

Guoqing Lu

Isaacson Professor, Genomics and Bioinformatics
Department of Biology & School of Interdisciplinary Informatics
University of Nebraska at Omaha

Hui Lv

School of Life Sciences and Biotechnology
Shanghai Jiao Tong University

Vsevolod J. Makeev

Dept. Computational Systems Biology
Vavilov Institute of General Genetics, RAS

Suchinda Malaivijitnond

Director, National Primate Research Center of Thailand
Chulalongkorn University

Daniel Stekhoven

Clinical Bioinformatics Unit
ETH Zurich

Zhixi Su

School of Life Sciences
Fudan University

Jeffrey Townsend

Elihu Professor of Biostatistics and Ecology & Evolutionary Biology
Director of Bioinformatics, Yale Center for Analytical Sciences
Yale University

Qianfei Wang

Associate Director of Key Laboratory of Genomic and Precision Medicine
Beijing Institute of Genomics, CAS

Xiangfeng Wang

Director of Department of Crop Genomics and Bioinformatics
College of Agronomy and Biotechnology, China Agricultural University

Xiyin Wang

College of Life Sciences
North China University of Science and Technology

Changqing Zeng

CAS Key Laboratory of Genomic and Precision Medicine
Beijing Institute of Genomics, CAS

Xiaojun Zhang

Experimental Marine Biological Laboratory
Institute of Oceanology, CAS

Yongzhen Zhang

National Institute for Communicable Disease Control and Prevention
Chinese Center for Disease Control and Prevention

Hongkun Zheng

Biomarker Technologies

Qing Zhou

Life Sciences Institute
Zhejiang University

Agenda (To be updated)

October 11: Pick-up & Registration
October 12: Talks
09:00 - 10:10 Session 1, chaired by Yiming Bao, BIG, CAS
09:00 - 09:10 Welcome and Opening Remarks
Yongbiao Xue, Professor, Director of BIG, CAS
09:10 - 09:40 Phase 1 of CASPMI Project and Data Analysis [Abstract]
Changqing Zeng, BIG, CAS
Launched by the Chinese Academy of Sciences (CAS) in 2016, Phase 1 of the CAS Precision Medicine Initiative Project (CASPMI) aims at (i) next-generation whole-genome sequencing (25-30X) of 1000 samples collected in the CAS cohort; (ii) construction of a reference genome from a northern Han individual (NH1.0) using a hybrid approach that includes PacBio sequencing, 10X Genomics library preparation, and Bionano optical mapping; (iii) construction of electronic health records and genetic reports for CASPMI participants; and (iv) association analyses based on sequencing data and phenotypes obtained from the baseline collection of the project. I will introduce and summarize the current results of the CASPMI project. In brief, nearing the completion of Phase 1, we are able to provide a comprehensive genetic variation map including 24.85M SNPs, 3.85M small indels, and 106,382 structural variations. In total, we identified 55,271 population-specific SNPs and 6,774 population-specific indels in this cohort, among which 42 SNPs in 39 genes show a significant correlation with various metabolism-related traits and diseases based on GWAS Catalog annotation. Geographic differentiation between northern and southern populations was observed, and the mutational signatures of novel variants also differed between these two groups. Variations in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate and vitamin B12 or with lipid metabolism, suggest selection arising from different environmental exposures and lifestyles, especially diet, between northerners and southerners. The high-quality human genome assembly and the comprehensive genetic map will provide population-specific genetic variations for later studies of precision medicine and individualized healthcare.
09:40 - 10:10 Human Brain Evolution In Context of Enhancer Divergence [Abstract]
Amir Abbasi, Quaid-i-Azam University, Pakistan
Humans are usually considered to be by far the most intelligent of all animals. Factors underlying brain properties include the size of the brain, cortex, and prefrontal cortex, and the degree of encephalization. The question is how we can interpret the phenomenal complexity of the human brain. The sophistication of the vertebrate brain is orchestrated through the signalling cascade of cis-regulatory modules, so deciphering this signal coordination is essential to comprehending the prototyping of the brain. In this talk we will focus on acceleration in the non-coding regulatory landscape of the genome and highlight the functional parts within it that have undergone accelerated divergence in the present-day human population. We will then focus on transcription factors that occupy H. sapiens-unique binding sites, such as SOX2 and RUNX1/3, and that play a vital role in gene expression, especially in the context of neural development. The second objective of this talk is to define the forebrain-specific transcriptional code, through which we predicted 25,000 human forebrain-specific enhancers on the basis of heterotypic clustering of the core transcription factors shortlisted through the code. These enhancers are now being validated through different strategies, followed by functional testing in zebrafish.
10:10 - 10:40 Group Photo and Tea & Coffee Break
10:40 - 12:10 Session 2, chaired by Jingfa Xiao, BIG, CAS
10:40 - 11:10 Utilizing Big Epigenomic Data for Cancer Gene Discovery [Abstract]
Kaifu Chen, Weill Cornell Medical College, USA [Personal Profile]
Kaifu Chen, PhD, is an Associate Professor and the Director of the Center for Bioinformatics and Computational Biology at the Methodist Hospital Research Institute and Weill Cornell Medical College, Cornell University. His major research interest is to understand the epigenetic regulation of cancer development through bioinformatics interpretation of epigenome, genome, and transcriptome data.
Genes suspected of increasing the selective growth advantage of tumor cells have been categorized as either Mut-driver genes or Epi-driver genes. Recent genome sequencing efforts have successfully detected millions of cancer mutations. However, it remains a challenge to define the catalogue of cancer driver genes by mutation analysis alone. Only a small fraction of mutations in cancer actually affect driver genes. Meanwhile, many genes that are not mutated are epigenetically altered to drive cancer development. Unlike genetic sequence, epigenetic modifications vary with normal cell type, developmental stage, and biological environment. Criteria have not yet been formulated for distinguishing epigenetic changes that exert a selective growth advantage from those that do not. We approach this challenge by integrating over 10,000 genomes and epigenomes to investigate the epigenetic mechanisms that regulate cancer driver genes, and by developing a novel bioinformatics strategy for cancer gene discovery that uses the epigenetic signatures associated with these mechanisms. Our research addresses the fundamental problem of how to identify cancer driver genes that are not mutated but are epigenetically altered in cancers to increase the selective growth advantage of tumor cells.
11:10 - 11:40 Big data medicine by network biomarkers and dynamic network biomarkers
Luonan Chen, Shanghai Institute of Biochemistry and Cell Biology, CAS
11:40 - 12:10 Title: TBD
Yongzhen Zhang, Chinese Center for Disease Control and Prevention
12:10 - 13:30 Lunch and BIG tour
13:30 - 15:10 Session 3, chaired by Cheng Li, Peking University
13:30 - 14:10 Variant interpretation - how to tackle the bottleneck of comprehensive cancer diagnostics
Daniel Stekhoven, ETH Zurich, Switzerland
14:10 - 14:40 Aberrant tRNA processing causes an autoinflammatory syndrome responsive to TNF inhibitors [Abstract]
Qing Zhou, Zhejiang University
We identified eight mutations in these nine patients, three of which have not been previously associated with SIFD. Three patients died in early childhood. Inflammatory cytokines, mainly interleukin (IL)-6, interferon gamma (IFN-γ) and IFN-induced cytokines were elevated in the serum, whereas tumour necrosis factor (TNF) and IL-1β were present in tissue biopsies of patients with active inflammatory disease. Deep tRNA sequencing of patients' fibroblasts showed significant deficiency of mature cytosolic tRNAs. EM of bone marrow and skin biopsy samples revealed striking abnormalities across all cell types and a mix of necrotic and normal-appearing cells. By immunoprecipitation, we found evidence for dysregulation in protein clearance pathways. In 4/4 patients, treatment with a TNF inhibitor suppressed inflammation, reduced the need for blood transfusions and improved growth. Mutations of TRNT1 lead to a severe and often fatal syndrome, linking protein homeostasis and autoinflammation. Molecular diagnosis in early life will be crucial for initiating anti-TNF therapy, which might prevent some of the severe disease consequences.
14:40 - 15:10 Title: TBD
Xia Li, Harbin Medical University
15:10 - 15:30 Tea & Coffee Break
15:30 - 17:20 Session 4, chaired by Qing Zhou, Zhejiang University
15:30 - 16:00 Asian Carp genomes provide insights into invasions and hybridization
Guoqing Lu, University of Nebraska at Omaha, USA
16:00 - 16:30 Mutational signatures and selection pressures estimation in the cancer genome
Zhixi Su, Fudan University
16:30 - 17:00 Three-Dimensional Genomics and Cancer [Abstract]
Cheng Li, Peking University
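With the rapid development of three-dimensional (3D) genomics technologies and their broad application prospects, the US National Institutes of Health launched the 4D Nucleome program in 2014 to study the organization and function of chromatin in the cell nucleus across three-dimensional space and time. Motivated by our interest in the causes and consequences of the frequent aneuploidy observed in cancer genomes, our group uses Hi-C experiments and analysis pipelines to study how aneuploidy in multiple myeloma cells affects the 3D genome and expression profiles. This talk will introduce the background of 3D genomics and our group's progress on related analysis algorithms, database websites, and cancer research.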
17:00 - 17:20 Genomics, Proteomics & Bioinformatics (GPB) — a rising journal in the field
Yuxia Jiao, Genomics, Proteomics & Bioinformatics
18:00 - 20:00 Welcome Dinner
October 13: Talks
09:00 - 10:10 Session 5, chaired by Zhang Zhang, BIG, CAS
09:00 - 09:40 Effect sizes of somatic mutations in cancer [Abstract]
Jeffrey Townsend, Yale University, USA
A major goal of "big data" cancer biology is determination of the relative importance of the genetic alterations that confer selective advantage to cancer cells. Massive tumor sequence surveys have frequently ranked the importance of substitutions to cancer growth by P value or a false-discovery conversion thereof. However, P values are thresholds for belief, not metrics of effect. Their frequent misuse as metrics of effect has often been vociferously decried, even in cases when the only attributable mistake was omission of effect sizes. I will first discuss the scope of current methods that rank confidence in the overrepresentation of specific mutated genes in cancer genomes. Then, I will present an appropriate ranking: the cancer effect size, which is the selection intensity for somatic variants in cancer cell lineages. The selection intensity is a metric of the survival and reproductive advantage conferred by mutations in somatic tissue, and can be calculated using typical somatic tumor sequence data, complementary to P values and prevalences of somatic mutations. I'll bring to bear recent advances that draw upon an understanding of the development of cancer as an evolutionary process to estimate the effect sizes of somatic variants leading to cancer. I will illustrate how we have estimated the effect sizes of all recurrent single nucleotide variants in 22 cancer types, quantifying relative importance within and between driver genes. The selection intensity associated with each mutation has immediate relevance to ongoing decision-making in precision medicine tumor boards, to the selection and design of clinical trials, to the targeted development of pharmaceuticals, and to basic research prioritization.
09:40 - 10:10 Big data in psychiatric disorders for genetic study
Suhua Chang, Institute of Mental Health, Peking University
10:10 - 10:30 Tea & Coffee Break
10:30 - 12:10 Session 6, chaired by Wenming Zhao, BIG, CAS
10:30 - 11:10 Non-human primate research in Thailand: bridging biodiversity and biomedicine
Suchinda Malaivijitnond, Chulalongkorn University, Thailand
11:10 - 11:40 Title: TBD
Hui Lv, Shanghai Jiao Tong University
11:40 - 12:10 P4 Medicine Journey in China [Abstract]
Zhiyuan Hu, National Center for Nanoscience and Technology, CAS
Systems medicine has united genomics and genetics through family genomics to more readily identify disease genes. It has made blood a window into health and disease. It is leading to the stratification of diseases (division into discrete subtypes) for proper impedance match against drugs and the stratification of patients into subgroups that respond to environmental challenges in a similar manner (e.g. response to drugs, response to toxins, etc.). The convergence of patient-activated social networks, big data and their analytics, and systems medicine has led to a P4 medicine that is predictive, preventive, personalized, and participatory. Medicine will focus on each individual. It will become proactive in nature. It will increasingly focus on wellness rather than disease. A journey has started to use a P4 medicine strategy to prevent chronic diseases in China.
12:10 - 13:30 Lunch
13:30 - 15:10 Session 7, chaired by Xiangfeng Wang, China Agricultural University
13:30 - 14:10 Title: TBD
Vsevolod J. Makeev, Vavilov Institute of General Genetics, RAS, Russia
14:10 - 14:40 The sea cucumber genome provides insights into morphological evolution and visceral regeneration [Abstract]
Xiaojun Zhang, Institute of Oceanology, CAS
Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, Apostichopus japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of ~805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem duplicated PSP94-like gene family, a significantly expanded FREP gene family and a positively selected Wnt signaling pathway. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multi-omics data will be of prime value for commercial sea cucumber breeding programs.
14:40 - 15:10 A Gold Standard to Deconvolute Complicated Structures of Plant Genomes with Recursive Polyploidizations [Abstract]
Xiyin Wang, North China University of Science and Technology
Plants often have complex genomes owing to recursive polyploidizations and genome repatterning. This makes it difficult to deconvolute their genome structures and hinders both the understanding of how they formed and the exploration of gene functional evolution. It would be a great pity to fail to decipher the structure of a newly sequenced genome after an enormous amount of money and time has been invested, yet such failures have occurred quite often in the last several years. Here, we propose a gold-standard pipeline for genome structural analysis, already adopted by several plant genome sequencing efforts, which we suggest be used to analyze any new genome sequence. Using the pipeline, we found an overlooked tetraploidization in the common ancestor of Cucurbitaceae, which might have contributed to the fast divergence and establishment of this important plant family.
15:10 - 15:30 Tea & Coffee Break
15:30 - 17:30 Session 8, chaired by Zhixi Su, Fudan University
15:30 - 16:10 Title: TBD
Frank Eisenhaber, Bioinformatics Institute, Singapore
16:10 - 16:40 Machine learning for genomic breeding design in crops [Abstract]
Xiangfeng Wang, China Agricultural University
Creating an intelligent decision system to assist breeders in designing hybridization schemes is the future of crop breeding. It demands the integration of Big Data and molecular breeding technologies. As the brain of Artificial Intelligence (A.I.), machine learning is a powerful tool for data mining and modeling in the Big Data era. We employed a machine learning strategy to create genomic selection models that predict the phenotypes and heterosis potential of F1 hybrids from their genotypes. The outcome of the study is implemented as a breeding decision-making system to assist breeders in the precise selection of promising parental lines for hybrid breeding. Genotype and phenotype data were generated for a population of 6,210 F1 hybrids created by crossing 30 paternal lines with 207 maternal lines. The 30 paternal lines were elite inbred lines with broad genetic backgrounds that are widely used in the current maize breeding industry in China. Thus, this dataset is not only ideal for theoretical research on maize heterosis, but can also serve as a standard training population for promoting genomic selection technology in China. With the advantage of machine learning, the genomic selection model fully considers the complex genetic structure of the studied population and achieves robust stability. In addition, we have identified heterosis-determinant genomic regions, genes and markers in the maize genome, which can be included as fixed effects when training genomic selection models in order to further increase prediction power.
16:40 - 17:10 Title: TBD
Songnian Hu, BIG, CAS
17:10 - 17:30 Title: TBD
Hongkun Zheng, Biomarker Technologies
October 14: Talks and BHBD
09:00 - 10:30 BHBD Session, chaired by Yiming Bao, BIG, CAS
09:00 - 09:10 BHBD Opening ceremony, Yongbiao Xue, Director of BIG, CAS
09:10 - 09:25 Introduction to BHBD Alliance
09:25 - 10:15 Introduction to BHBD Founding Members
10:15 - 10:30 BIG Data Center, Yiming Bao, BHBD Coordinator
10:30 - 10:50 Tea & Coffee Break
10:50 - 12:00 Session 9, chaired by Zhang Zhang, BIG, CAS
10:50 - 11:30 Small genomes, big data: Improving bacterial genome annotation one COG at a time [Abstract]
Michael Y. Galperin [Personal Profile]
Dr. Michael Y. Galperin received his PhD from the Lomonosov Moscow State University in Russia and postdoctoral training at the University of Louisville and the University of Connecticut. He has been at the NCBI Computational Biology Branch since 1996, first as a GenBank Fellow, then as a Staff Scientist, and currently as a Lead Scientist. He has published more than 200 research papers, reviews and book chapters and is an author of a textbook on comparative genomics. In 2008-2017, he served as the editor of the Nucleic Acids Research annual Database Issue. He currently serves as an editor of the Genomics Updates section in Environmental Microbiology and, since July 1st 2018, as an editor of the Journal of Bacteriology.
Microbial genome sequencing projects all over the world continue to flood public databases with sequences of deduced proteins, only a small fraction of which have ever been studied experimentally or could be studied in detail any time soon. The only feasible way to assign functions to these proteins is to tentatively predict them through computational analysis. The Clusters of Orthologous Genes (COG) database, first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed unequivocal assignment of orthologs and paralogs; (ii) a family-based approach, which used the functions of the characterized members of the protein family (COG) to assign function to the entire family and to describe the range of potential functions when there was more than one; and (iii) careful manual curation of the COG names and protein contents. The latest update of the COG database allowed us to take an unbiased look at the progress in genome annotation and to evaluate the problems and challenges in assigning functions to the remaining uncharacterized and poorly characterized open reading frames. Combining functional data from the UniProt, RefSeq, Pfam, InterPro, and CDD databases with experimental data from original papers kept in PubMed and PubMed Central showed that the original COG assignments had an error rate of < 0.5%. Many tentative COG predictions have now been verified, either by direct experiments or through high-throughput methods. Functional assignments have been made for some widespread conserved proteins, many of which turned out to participate in translation, including rRNA maturation, tRNA modification, and similar processes.
From the practical point of view, COGs can be useful for quality assurance of genomic sequences and for the identification of (i) “holes” in metabolic pathways and functional systems; (ii) unique enzymes that might be used as drug targets, and (iii) conserved genes and operons coding for previously overlooked functional systems.
11:30 - 12:00 Chemoresistance in Patients with Acute Leukemia: from Molecular Targets to Clonal Evolution [Abstract]
Qianfei Wang, BIG, CAS
Major therapeutic progress using cytotoxic agents has been made in leukemia over the last fifty years, yet chemoresistance remains an unmet clinical challenge. Approximately 10-25% of newly diagnosed AML patients and most relapsed patients have a poor drug response and a low rate of long-term survival. Although advances in cancer genomics have greatly increased our understanding of the molecular characteristics of tumor biology, recent studies suggest that Darwinian evolution of intratumor heterogeneity represents a major challenge to developing therapeutic strategies for improved disease control. I will present our recent genomic and functional studies involving the JAK-STAT pathway and the methyltransferase SETD2, as well as the mutational landscape in refractory and relapsed AML under sequential treatment with induction regimens. I will discuss the limitations of molecular targeting and how evolutionary principles can be applied during the treatment of leukemia.
12:00 - 12:10 Closing remarks
12:10 - 14:00 Lunch
14:00 - 16:00 The 1st BHBD Alliance Meeting (by invitation only)