National Genomics Data Center

Yiming Bao


Email: baoym (AT)

Tel: 10-84097858


  • Deputy Director of China National Center for Bioinformation, China, 2020 - Present

  • Director of National Genomics Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), China, 2019 - Present

  • Director of BIG Data Center, BIG, CAS, China, 2017 - 2019

  • Professor, BIG, CAS, China, 2017 - Present

  • Staff Scientist, National Center for Biotechnology Information (NCBI)/NLM/NIH, USA, 2005 - 2017

  • Viral Genome Scientist, Computercraft Corporation (as a government contractor working at NCBI), USA, 2001 - 2005

  • Postdoctoral Associate and Senior Research Associate, Noble Foundation, USA, 1994 - 2001

  • Teaching and Research Assistant, Peking University, China, 1987 - 1991


  • PhD in Genetics, John Innes Centre (through University of East Anglia), UK, 1994

  • BS in Biochemistry, Peking University, China, 1987


  • Bioinformatics

  • Viral Genomics


  • National Key Research and Development Project, Global Omics Data Sharing Initiative, 2017-2020, leader

  • IUBS, Open Biodiversity and Health Big Data Initiative, 2017-2020, leader

  • The 13th Five-year Informatization Plan of Chinese Academy of Sciences, Big Data-Driven Innovation and Demonstration Platform in the Field of Bioinformatics, 2018-2019, leader


  • Journal Reviewer: Archives of Virology; Bioinformatics; BMC Bioinformatics; BMC Microbiology; Computers in Biology; Current Genomics; Database; Infection, Genetics and Evolution; Journal of Computational Biology; Journal of Genetics and Genomics; Journal of Virology; Molecular Phylogenetics and Evolution; Nucleic Acids Research; Plant Molecular Biology; PLoS ONE; PNAS; Vaccine

  • Member: Virus Data Subcommittee, International Committee on Taxonomy of Viruses (ICTV), 2011-2017


Chen M., Ma Y., Li R., Bao Y. (2020). Current Status and Prospects of Genomics Data Analysis Methods. Frontiers of Data and Computing 2, 1-19.

Shah, S., Malik, A. H., Zhang, B., Bao, Y., & Qazi, J. (2020). Metagenomic analysis of relative abundance and diversity of bacterial microbiota in Bemisia tabaci infesting cotton crop in Pakistan. Infection, Genetics and Evolution 84, 104381.

Zhao W.M., Song S.H., Chen M.L., Zou D., Ma L.N., Ma Y.K., Li R.J., Hao L.L., Li C.P., Tian D.M., Tang B.X., Wang Y.Q., Zhu J.W., Chen H.X., Zhang Z., Xue Y.B., Bao Y.M. (2020). The 2019 novel coronavirus resource. Yi Chuan 42, 212-221.

Xiong Z., Li M., Yang F., Ma Y., Sang J., Li R., Li Z., Zhang Z., Bao Y. (2020). EWAS Data Hub: a resource of DNA methylation array data and metadata. Nucleic Acids Research 48, D890-D895.

Bao Y. as co-corresponding author in National Genomics Data Center Members and Partners. (2020). Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Research 48, D24-D33.

Pervaiz N, Shakeel N, Qasim A, Zehra R, Anwar S, Rana N, Xue Y, Zhang Z, Bao Y., Abbasi AA. (2019). Evolutionary history of the human multigene families reveals widespread gene duplications throughout the history of animals. BMC Evol Biol. 19, 128. doi: 10.1186/s12862-019-1441-0. PMID: 31221090.

Ma L.N., Cao J., Liu L., Li Z., Shireen H., Pervaiz N., Batool F., Raza R., Zou D., Bao Y., Abbasi A.A., Zhang Z. (2019). Community curation and expert curation of human long non-coding RNAs. Current Protocols in Bioinformatics 67, e82.

Amarasinghe G.K., Ayllon M.A., Bao Y., Basler C.F., et al. (2019). Taxonomy of the order Mononegavirales: update 2019. Archives of Virology 164, 1967-1980.

Wang G., Yin H., Li B., Yu C., Wang F., Xu X., Cao J., Bao Y., Wang L., Abbasi A.A., Bajic V.B., Ma L., Zhang Z. (2019). Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics 35, 2949-2956.

Seemab S., Pervaiz N., Zehra R., Anwar S., Bao Y., Abbasi A.A. (2019). Molecular evolutionary and structural analysis of familial exudative vitreoretinopathy associated FZD4 gene. BMC Evolutionary Biology 19, 72.

Bao Y. as co-corresponding author among BIG Data Center Members. (2019). Database Resources of the BIG Data Center in 2019. Nucleic Acids Research 47, D8-D14.

Li M., Zou D., Li Z., Gao R., Sang J., Zhang Y., Li R., Xia L., Zhang T., Niu G., Bao Y., Zhang Z. (2019). EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Research 47, D983-D988.

Tang B., Zhou Q., Dong L., Li W., Zhang X., Lan L., Zhai S., Xiao J., Zhang Z., Bao Y., Zhang Y-P., Wang G-D., Zhao W. (2019). iDog: an integrated resource for domestic dogs and wild canids. Nucleic Acids Research 47, D793-D800.

Zhao Y., Wang J., Liang F., Liu Y., Wang Q., Zhang H., Jiang M., Zhang Z.W., Zhao W., Bao Y., Zhang Z., Wu J., Asmann Y.W., Li R., Xiao J. (2019). NucMap: a database of genome-wide nucleosome positioning map across species. Nucleic Acids Research 47, D163-D169.

Ma Y. & Bao Y. (2018). Prospects for national biological big data centers. Hereditas (Beijing) 40, 938-943.

Pavesi A., Vianelli A., Chirico N., Bao Y., et al. (2018). Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. PLoS ONE 13: e0202513.

Bao Y. & Xue Y. (2018). Current Status and Prospect of Life and Health Big Data. Bulletin of the Chinese Academy of Sciences 33, 861-865.

Bao Y. & Kuhn J.H. (2018). Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis. Methods Mol Biol 1604, 43-53.

Maes P., Alkhovsky S.V., Bao Y., et al. (2018). Taxonomy of the family Arenaviridae and the order Bunyavirales: update 2018. Archives of Virology 163, 2295-2310.

Li R.J., Liang F., ..., Bao Y., et al. (2018). MethBank 3.0: a database of DNA methylomes across a variety of species. Nucleic Acids Research 46, D288-D295.

Song S.H., Tian D.M., ..., Bao Y., et al. (2018). Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Research 46, D944-D949.

Bao Y. as co-corresponding author among BIG Data Center Members. (2018). Database Resources of the BIG Data Center in 2018. Nucleic Acids Research 46, D14-D20.

Sang J., Wang Z., …, Bao Y., et al. (2018). ICG: a wiki-driven knowledgebase of internal control genes for RT-qPCR normalization. Nucleic Acids Research 46, D121-D126.

Hatcher E.*, Bao Y.*, et al. (2017). NCBI will no longer make taxonomy identifiers for individual influenza strains on January 15, 2018. PeerJ Preprints 5:e3428v1

Bao Y., Amarasinghe G.K., Basler C.F., et al. (2017). Implementation of Objective PASC-Derived Taxon Demarcation Criteria for Official Classification of Filoviruses. Viruses 9. pii: E106. doi: 10.3390/v9050106.

Wang F., Fang Q., Wang B., Yan Z., Hong J., Bao Y., Kuhn J.H., Werren J.H., Song Q., Ye G. (2017). A novel negative-stranded RNA virus mediates sex ratio in its parasitoid host. PLoS Pathogens 13, e1006201.

Amarasinghe G.K., Bao Y., Basler C.F., Bavari S., et al. (2017). Taxonomy of the order Mononegavirales: update 2017. Archives of Virology 162, 2493-2504.

Hatcher E.L., Zhdanov S.A., Bao Y., Blinkova O., Nawrocki E.P., Ostapchuck Y., Schäffer A.A., Brister J.R. (2017). Virus Variation Resource - improved response to emergent viral outbreaks. Nucleic Acids Research 45, D482-D490.

Kuhn J.H., Wiley M.R., Rodriguez S.E., Bao Y., et al. (2016). Genomic Characterization of the Genus Nairovirus (Family Bunyaviridae). Viruses 8, 164.

Afonso C.L., Amarasinghe G.K., Banyai K., Bao Y., et al. (2016). Taxonomy of the order Mononegavirales: update 2016. Archives of Virology 161, 2351-2360.

Kuhn J.H., Lauck M., Bailey A.L., Shchetinin A.M., Vishnevskaya T.V., Bao Y., et al. (2016). Reorganization and expansion of the nidoviral family Arteriviridae. Archives of Virology 161, 755-768.

O'Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D., Astashyn A., Badretdin A., Bao Y., et al. (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research 44, D733-D745.

Lauck M., Alkhovsky S.V., Bao Y., et al. (2015). Historical Outbreaks of Simian Hemorrhagic Fever in Captive Macaques Were Caused by Distinct Arteriviruses. Journal of Virology 89, 8082-8087.

Radoshitzky S.R., Bao Y., et al. (2015). Past, present, and future of arenavirus taxonomy. Archives of Virology 160, 1851-1874.

Brister J.R., Ako-Adjei D., Bao Y. & Blinkova O. (2015). NCBI Viral Genomes Resource. Nucleic Acids Research 43, D571-D577.

Kuhn J.H., Dürrwald R., Bao Y., et al. (2015). Taxonomic reorganization of the family Bornaviridae. Archives of Virology 160, 621-632.

Kuhn J.H., Andersen K.G., Baize S., Bao Y., et al. (2014). Nomenclature- and Database-Compatible Names for the Two Ebola Virus Variants that Emerged in Guinea and the Democratic Republic of the Congo in 2014. Viruses 6, 4760-4799.

Kuhn J.H., Andersen K.G., Bao Y., et al. (2014). Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names. Viruses 6, 3663-3682.

Bao Y., Chetvernin V. & Tatusova T. (2014). Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification. Archives of Virology 159, 3293-3304.

Du Z., Chen A., Chen W., Liao Q., Zhang H., Bao Y., Roossinck M.J. & Carr J.P. (2014). Nuclear-cytoplasmic partitioning of the Cucumber mosaic virus 2b protein determines the balance between its roles as a virulence determinant and RNA silencing suppressor. Journal of Virology 88, 5228-5241.

Brister J.R., Bao Y., Zhdanov S.A., Ostapchuck Y., Chetvernin V., Kiryutin B., Zaslavsky L., Kimelman M. & Tatusova T.A. (2014). Virus Variation Resource--recent updates and future directions. Nucleic Acids Research 42, D660-D665.

Kuhn J.H., Bao Y., et al. (2014). Virus nomenclature below the species level: a standardized nomenclature for filovirus strains and variants rescued from cDNA. Archives of Virology 159, 1229-1237.

Brister J.R. & Bao Y. (2013). Virus Variation. 2013 Nov 14. In: The NCBI Handbook [Internet]. 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013-. Available from:

Bao Y., Brister J.R., Blinkova O., et al. (2013). About Viral and Phage Genome Processing and Tools. 2013 Mar 30 [Updated 2013 May 10]. In: The NCBI Handbook [Internet]. 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013-. Available from:

Kuhn J.H., Bao Y., et al. (2013). Virus nomenclature below the species level: a standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae. Archives of Virology 158, 1425-1432.

Kuhn J.H., Bao Y., et al. (2013). Virus nomenclature below the species level: a standardized nomenclature for natural variants of viruses assigned to the family Filoviridae. Archives of Virology 158, 301-311.

Bao Y., Chetvernin V. & Tatusova T. (2012). PAirwise Sequence Comparison (PASC) and Its Application in the Classification of Filoviruses. Viruses 4, 1318-1327.

Gilbert, J.A., Bao, Y., et al. (2012). Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4–7, 2012. Standards in Genomic Sciences 6, 276-286.

Brister, J.R., Bao, Y., Kuiken, C., Lefkowitz, E.J., Mercier, P.L., Leplae, R., Madupu, R., Scheuermann, R.H., Schobel, S., Seto, D., Shrivastava, S., Sterk, P., Zeng, Q., Klimke, W. & Tatusova, T. (2010). Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop. Viruses 2, 2258-2268.

Resch, W., Zaslavsky, L., Kiryutin B., Rozanov, M., Bao Y. & Tatusova T.A. (2009). Virus variation resource at the national center for biotechnology information: dengue virus. BMC Microbiology 9, 65.

Bao, Y., Kapustin Y. & Tatusova T. (2008). Virus Classification by Pairwise Sequence Comparison (PASC).  Encyclopedia of Virology, 5 vols. (B.W.J. Mahy and M.H.V. Van Regenmortel, Editors). Oxford: Elsevier. Vol. 5, 342-348.

Zaslavsky, L., Bao Y. & Tatusova T.A. (2008). Visualization of large influenza virus sequence datasets using adaptively aggregated trees with sampling-based subscale representation. BMC Bioinformatics 9, 237.

Bao, Y., Bolotov P., Dernovoy D., Kiryutin B., Zaslavsky L., Tatusova T., Ostell J. & Lipman D. (2008). The influenza virus resource at the national center for biotechnology information. Journal of Virology 82, 596-601.

Bao, Y., Bolotov P., Dernovoy D., Kiryutin B. & Tatusova T. (2007). FLAN: a web server for influenza virus genome annotation. Nucleic Acids Research 35, W280-W284.

Zaslavsky, L., Bao Y. & Tatusova T.A. (2007). An adaptive resolution tree visualization of large influenza virus sequence datasets, p. 192-202. In I. Mandoiu and A. Zelikovsky (ed.), Bioinformatics Research and Applications. Proc. of ISBRA 2007, Lecture Notes in Bioinformatics 4463. Springer-Verlag.

Ghedin E, Sengamalay NA, Shumway M, Zaborsky J, Feldblyum T, Subbu V, Spiro DJ, Sitz J, Koo H, Bolotov P, Dernovoy D, Tatusova T, Bao Y, St George K, Taylor J, Lipman DJ, Fraser CM, Taubenberger JK & Salzberg SL. (2005). Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature 437, 1162-1166.

Holmes E. C., Ghedin E., Miller N., Taylor J., Bao Y., St. George K., Grenfell B.T., Salzberg S.L., Fraser C.M., Lipman D.J. & Taubenberger J.K. (2005). Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses. PLoS Biology 3: e300.

Bao, Y., Federhen, S., Leipe, D., Pham, V., Resenchuk, S., Rozanov, M., Tatusov, R. & Tatusova, T. (2004).  National Center for Biotechnology Information Viral Genomes Project.  Journal of Virology 78, 7291-7298.

Ding, X. S., Liu, J., Cheng, N-H., Folimonov, A., Hou, Y-M., Bao, Y., Katagi, C., Carter, S. A. & Nelson, R. S. (2004).  The Tobacco mosaic virus 126-kDa Protein Associated with Virus Replication and Movement Suppresses RNA Silencing.  Molecular Plant-Microbe Interactions 17, 583-592.

Li, Y., Bao, Y. M., Wei, C. H., Kang, Z. S., Zhong, Y. W., Mao, P., Wu, G., Chen, Z. L., Schiemann, J. & Nelson, R. S. (2004). Rice dwarf phytoreovirus segment s6-encoded nonstructural protein has a cell-to-cell movement function.  Journal of Virology 78, 5382-5389.

Chen Y. J., Gao G., Bao Y. M., Lopez R., Wu J. M., Cai T., Ye Z. Q., Gu X. C. & Luo J. C. (2003).  Initial Analysis of Complete Genome Sequences of SARS Coronavirus.  Acta Genetica Sinica 30, 493-500.

Sivakumaran, K., Bao, Y., Roossinck, M. J. & Kao, C. C. (2000).  Recognition of the core RNA promoter for the minus-strand RNA synthesis by the replicase of Brome mosaic virus and Cucumber mosaic virus.  Journal of Virology 74, 10323-10331.

Itaya, A., Woo, Y-M, Masuta, C., Bao, Y., Nelson, R. S. & Ding, B. (1998).  Developmental regulation of intercellular protein trafficking through plasmodesmata in tobacco leaf epidermis.  Plant Physiology 118, 373-385.

Itaya, A., Hickman, H., Bao, Y., Nelson, R. S. & Ding, B. (1997).  Cell-to-cell trafficking of cucumber mosaic virus movement protein: green fluorescent protein fusion produced by biolistic gene bombardment in tobacco.  The Plant Journal 12, 1223-1230.

Shintaku, M. H., Carter, S. A., Bao, Y. & Nelson, R. S. (1996).  Mapping nucleotides in the 126-kDa protein gene that control the differential symptoms induced by two strains of tobacco mosaic virus.  Virology 221, 218-225.

Futterer, J., Potrykus, I., Bao, Y., Li, L., Burns, T. M., Hull, R. & Hohn, T. (1996).  Position-dependent ATT initiation during plant pararetrovirus rice tungro bacilliform virus translation.  Journal of Virology 70, 2999-3010.

Bao, Y., Carter, S. A. & Nelson, R. S. (1996).  The 126- and 183-kilodalton proteins of tobacco mosaic virus, and not their common nucleotide sequence, control mosaic symptom formation in tobacco. Journal of Virology 70, 6378-6383.

Bao, Y. & Hull, R. (1994).  Replication intermediates of rice tungro bacilliform virus support a replication mechanism involving reverse transcription.  Virology 204, 626-633.

Chu, R-Y, Leng, X-H, Bao, Y-M, Pu, Z-Q, Pan, N-S, & Chen, Z-L. (1993).  Amplification of soybean mosaic virus coat protein gene by polymerase chain reaction and its sequence analysis.  Acta Botanica Sinica 34, 523-528.

Bao, Y-M, Chu, R-Y, Han, J-H, Zhang, H., Pan, N-S, Gu, X-C, & Chen, Z-L. (1993).  Cloning and sequencing of trichosanthin gene and its expression in Escherichia coli and tobacco plant.  Science in China (Series B) 36, 669-676.

Bao, Y. & Hull, R. (1993).  Mapping the 5'-terminus of rice tungro bacilliform virus genomic RNA.  Virology 197, 445-448.

Bao, Y. & Hull, R. (1993).  A strong-stop DNA in rice plants infected with rice tungro bacilliform virus.  Journal of General Virology 74, 1611-1616.

Bao, Y. & Hull, R. (1992).  Characterization of the discontinuities in rice tungro bacilliform virus DNA.  Journal of General Virology 73, 1297-1301.