The GWH Handbook (Version beta, June 2017), which describes each data item in detail, is freely available here.
The GWH Submission Quick Start Guide (Version beta, June 2017), which describes the submission process, is freely available here.
Designed for compatibility, Genome Warehouse (GWH) follows INSDC data standards and structures. All data are organized into three objects, namely BioProject, BioSample and Genome (Figure 1). A "BioProject", which bears an accession number prefixed with "PRJC", provides an overall description of an individual research initiative, including a basic description, organism, data type, submitter, funding information and publication(s), if available.
Figure 1: Data model in GWH
Data relationships in GWH are as follows:
BioProject: an overall description of a single research initiative, typically involving multiple samples.
BioSample: a description of the biological source material; each physically unique specimen should be registered as a single BioSample with a unique set of attributes.
Genome: a description of a detailed genome assembly for a BioSample. One BioSample can have one or more genome assemblies; for example, one plant sample may have both a mitochondrial genome and a whole genome. Each Genome contains a genome sequence file and should also contain a genome annotation file, AGP file(s) and genome assignment file(s).
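The three-level data model above can be pictured as nested records. The following Python sketch is illustrative only: the class and field names are hypothetical and do not reflect GWH's internal schema. The only rules taken from the text are that BioProject accessions are prefixed with "PRJC", that a BioSample can hold one or more Genomes, and that every Genome carries a genome sequence file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Genome:
    # A genome sequence file (FASTA) is always present.
    sequence_file: str
    # Recommended companions: annotation (GFF or TBL), AGP and assignment (CSV) files.
    annotation_file: Optional[str] = None
    agp_files: List[str] = field(default_factory=list)
    assignment_files: List[str] = field(default_factory=list)


@dataclass
class BioSample:
    # One physically unique specimen with its own set of attributes.
    name: str
    attributes: Dict[str, str]
    # A BioSample can have one or more genome assemblies.
    genomes: List[Genome] = field(default_factory=list)


@dataclass
class BioProject:
    accession: str  # prefixed with "PRJC"
    title: str
    samples: List[BioSample] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject accessions that do not carry the BioProject prefix.
        if not self.accession.startswith("PRJC"):
            raise ValueError(f"BioProject accession must start with 'PRJC': {self.accession}")


# Hypothetical example: one plant sample with a mitochondrial and a whole-genome assembly.
# The accession placeholder, names and file names are made up for illustration.
plant = BioSample(
    name="plant-leaf-01",
    attributes={"tissue": "leaf"},
    genomes=[
        Genome(sequence_file="mito.fa"),
        Genome(sequence_file="genome.fa", annotation_file="genome.gff",
               agp_files=["genome.agp"]),
    ],
)
project = BioProject(accession="PRJCXXXXXX", title="Example project", samples=[plant])
```

Modelling Genome as a child of BioSample mirrors the one-to-many relationship in the text, while the prefix check in `__post_init__` shows where simple accession rules could be enforced.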
GWH stands for Genome Warehouse, a data repository for genome assembly data. It archives genome assembly sequences, genome annotations and other associated data. GWH is one of the database resources of the BIG Data Center (BIGD), part of the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), and serves as a primary archive of genome assembly data for institutions and laboratories worldwide.
Only registered users can submit data, using the Genome Sequence Submission (Gsub) System. Briefly, data submission requires the following steps:
a) Create a BIGD account and/or log in to GWH;
b) Enter metadata and specify a release date;
c) Submit data files.
Any user can register for a Gsub account free of charge. After you submit your registration, a confirmation email will be sent to you automatically so that you can activate your account.
♦ If you have forgotten only your password, click “Forgot password”. You will receive an e-mail containing a link; follow it within 30 minutes to reset your password.
♦ If you are already a member and have forgotten both your GWH username and your password, please contact us and we will do our best to help you.
Data submission requires you to log in to the Genome Sequence Submission (Gsub) System, so please create an account if you are not yet a member.
Please note that fields marked are required when submitting metadata.
The current version of GWH (1.0beta) supports submitting files either directly online or via FTP. We highly recommend submitting your files with a dedicated FTP client (e.g., FileZilla). Please transfer your data files to the GWH FTP site using the following credentials:
User: same as your Gsub login
Password: same as your Gsub login
Path: /GWH/WGSXXXXXX (your submission ID).
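For scripted uploads, Python's standard `ftplib` module can perform the same transfer as an FTP client such as FileZilla. This is a minimal sketch under stated assumptions: the host name in `GWH_FTP_HOST` is a placeholder because the GWH FTP address is not given here, and the helper names `remote_dir` and `upload_files` are hypothetical. The credentials and target path follow the list above.

```python
import os
from ftplib import FTP

# Placeholder (assumption): replace with the actual GWH FTP host name.
GWH_FTP_HOST = "ftp.example.org"


def remote_dir(submission_id: str) -> str:
    # Target directory is /GWH/<your submission ID>, e.g. /GWH/WGSXXXXXX.
    return f"/GWH/{submission_id}"


def upload_files(user: str, password: str, submission_id: str, paths: list) -> None:
    # Log in with the same user name and password as for Gsub.
    with FTP(GWH_FTP_HOST) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir(submission_id))
        for path in paths:
            # Binary mode transfers the bytes unchanged (no line-ending conversion).
            with open(path, "rb") as fh:
                ftp.storbinary(f"STOR {os.path.basename(path)}", fh)
```

Calling `upload_files("your_gsub_user", "your_gsub_password", "WGSXXXXXX", ["genome.fa", "genome.gff"])` would upload both files into your submission directory, once the placeholder host has been replaced.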
The current version accepts the following file formats for genome-associated data:
♦ Genome sequence: FASTA (Step 3 Files)
♦ Genome annotation: GFF or TBL (Step 3 Files)
♦ Sequence ordering and orientation information: AGP (Step 3 Files). Note: required if the genome assembly is a complete genome or a chromosome-level draft genome.
♦ Sequence assignment information: CSV (Step 4 Assignment). Note: required if the genome assembly is a scaffold- or chromosome-level draft genome.
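The conditional rules in the list above can be expressed as a small checklist. The sketch below is a hypothetical helper, not part of Gsub: the level names "complete", "chromosome" and "scaffold" are assumed labels for a complete genome and for chromosome- and scaffold-level drafts, and genome annotation (GFF or TBL) is treated as optional because a Genome only *should* include it.

```python
def required_formats(assembly_level: str) -> set:
    """Return the file formats required for a given assembly level.

    Assumed level labels: "complete" (complete genome),
    "chromosome" (chromosome-level draft), "scaffold" (scaffold-level draft).
    """
    if assembly_level not in {"complete", "chromosome", "scaffold"}:
        raise ValueError(f"unknown assembly level: {assembly_level}")
    required = {"FASTA"}  # genome sequence is always required
    if assembly_level in {"complete", "chromosome"}:
        required.add("AGP")  # sequence ordering and orientation information
    if assembly_level in {"chromosome", "scaffold"}:
        required.add("CSV")  # sequence assignment information
    return required


def missing_formats(assembly_level: str, provided: set) -> set:
    # Required formats the submitter has not supplied yet.
    return required_formats(assembly_level) - provided
```

For example, `missing_formats("scaffold", {"FASTA", "GFF"})` returns `{"CSV"}`: the annotation is accepted but does not satisfy the assignment requirement.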
All files submitted via FTP are regularly moved from the FTP site to a staging area for processing, so it is normal for files to “disappear” from FTP. Files that pass validation are either made public or kept under controlled access, according to the release date set by the submitter, and their status changes to 'Released' or 'Successful', respectively.
MD5 checksums are used to verify the integrity of transmitted data. An MD5 checksum is a 32-character hexadecimal string, such as "e3b5dd475c449300dd11f258538ff494".
♦ For Linux users, use: $ md5sum filename
♦ For Mac users, use: $ md5 filename
♦ For Windows users, use: $ certutil -hashfile filename MD5; then remove any spaces from the output to obtain the checksum. Alternatively, use a third-party tool.
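If none of the commands above is convenient, the same checksum can be computed with Python's standard `hashlib` module, which behaves identically on Linux, macOS and Windows. The function names below are illustrative; the file is read in chunks so that large genome files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    # Read the file in chunks (1 MiB by default) and feed each one to the MD5 digest.
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # 32 lowercase hexadecimal characters, the same form md5sum prints.
    return digest.hexdigest()


def normalize_checksum(text: str) -> str:
    # Remove spaces (as described for certutil output) and surrounding whitespace.
    return text.replace(" ", "").strip().lower()


def checksums_match(path: str, expected: str) -> bool:
    # Compare a local file against a checksum recorded elsewhere.
    return md5_of_file(path) == normalize_checksum(expected)
```

Running `md5_of_file` on a file before upload and comparing it with the value you report lets you confirm that the file was not corrupted in transit.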
When you submit data, you will find a button named “Release date” at the bottom of the "Step 2 General info" web page. Once you specify a release date, the data will be released on that date. Note that the release of the BioProject and BioSample is also triggered by the release of WGS-associated data. We suggest setting the release date of the Genome later than that of the BioProject or BioSample. If a paper citing the sequence or accession number is published before the specified date, the sequence will be released upon publication; otherwise, GWH will release the sequence data on the specified date. The release date can be changed through the genome portal.
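The release rule described above amounts to taking whichever comes first: the specified release date or the publication date of a paper citing the data. A minimal sketch, with a hypothetical function name, assuming both dates are known in advance:

```python
from datetime import date
from typing import Optional


def effective_release_date(release_date: date,
                           publication_date: Optional[date] = None) -> date:
    # A citing paper published before the specified date triggers release upon publication.
    if publication_date is not None and publication_date < release_date:
        return publication_date
    # Otherwise, the data are released on the specified date.
    return release_date
```

For instance, with a release date of 1 June 2018 and a citing paper published on 15 March 2018, the data would become public on 15 March 2018.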
A GWH accession number consists of the prefix ‘GWH’ followed by four capital letters and eight digits, e.g., GWHXXXX00000000. Please cite the genome accession number GWHXXXX00000000 in your publication as follows:
The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number GWHXXXX00000000, which is publicly accessible at http://bigd.big.ac.cn/gwh.
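The accession format ('GWH', then four capital letters, then eight digits) can be checked with a regular expression, for example before citing an accession in a manuscript. The pattern and function names are hypothetical:

```python
import re

# 'GWH' + four capital letters (A-Z) + eight digits, 15 characters in total.
GWH_ACCESSION = re.compile(r"GWH[A-Z]{4}[0-9]{8}")


def is_gwh_accession(text: str) -> bool:
    # fullmatch rejects strings with extra leading or trailing characters.
    return GWH_ACCESSION.fullmatch(text) is not None
```

The pattern uses `[0-9]` rather than `\d` because, for `str` patterns in Python 3, `\d` also matches non-ASCII Unicode digits.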
Database Resources of the BIG Data Center in 2018. Nucleic Acids Res 2018, 46(D1):D14-D20. [PMID=29036542]
You are also welcome to visit us to explore possible collaborations or to learn more about GWH.
BIG Data Center
Beijing Institute of Genomics, Chinese Academy of Sciences
No.1 Beichen West Road, Chaoyang District
Beijing 100101, China
Tel: +86 (10) 8409-7858
+86 (10) 8409-7756
Fax: +86 (10) 8409-7720