Documents for GWH

GWH Handbook

The GWH Handbook (Version beta, June 2017), which contains detailed descriptions of the data items, is freely available here.



GWH Submission Quick Start Guide

The GWH Submission Quick Start Guide (Version beta, June 2017), which describes the submission procedure, is freely available here.



Tutorial

GWH Data Model

Designed for compatibility, Genome Warehouse (GWH) follows INSDC data standards and structures. All data are organized into three objects: BioProject, BioSample, and Genome (Figure 1). A BioProject, bearing an accession number prefixed with "PRJC", provides an overall description of an individual research initiative, including a basic description, organism, data type, submitter, funding information, and publication(s) if available.

Figure 1: Data model in GWH



GWH Data Relationships

Data relationships in GWH are as follows.

BioProject: provides an overall description of a single research initiative, typically involving multiple samples.

BioSample: describes biological source material; each physically unique specimen should be registered as a single BioSample with a unique set of attributes.

Genome: describes a detailed genome assembly for a BioSample. One BioSample has one or more genome assemblies. For example, one plant sample may have both a mitochondrion genome and a full genome. Each genome contains a genome sequence file, and should also contain a genome annotation file, AGP file(s), and genome assignment file(s).
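
The relationships above can be pictured as a small set of nested records. The following is only an illustrative sketch in Python: the class and field names are hypothetical and do not reflect the actual GWH schema, and the accession strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Genome:
    """One genome assembly belonging to a BioSample (field names are hypothetical)."""
    accession: str
    sequence_file: str                      # genome sequence file, always present
    annotation_file: Optional[str] = None   # should be provided when available
    agp_files: List[str] = field(default_factory=list)
    assignment_files: List[str] = field(default_factory=list)


@dataclass
class BioSample:
    """One physically unique specimen with its own set of attributes."""
    accession: str
    attributes: Dict[str, str]
    genomes: List[Genome] = field(default_factory=list)  # one or more assemblies


@dataclass
class BioProject:
    """Overall description of a research initiative (accession prefixed with PRJC)."""
    accession: str
    description: str
    samples: List[BioSample] = field(default_factory=list)


# Example: one plant sample with a full genome and a mitochondrion genome.
sample = BioSample(accession="SAMPLE_PLACEHOLDER", attributes={"organism": "example plant"})
sample.genomes.append(Genome(accession="GENOME_PLACEHOLDER_1", sequence_file="full_genome.fasta",
                             annotation_file="full_genome.gff"))
sample.genomes.append(Genome(accession="GENOME_PLACEHOLDER_2", sequence_file="mitochondrion.fasta"))
project = BioProject(accession="PRJC_PLACEHOLDER", description="Example initiative",
                     samples=[sample])
```

Here a single BioProject holds one BioSample, which in turn holds two Genome records, mirroring the plant example in the text.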

Submission pipeline
(Figure: submission pipeline, with links to new GWH submission, register BioProject, and register BioSample)


Frequently Asked Questions

Answers to some of the most frequently asked questions submitted to the GWH are listed as follows.
  1. Introduction
    1. What is GWH?
    2. How can I submit data to GWH?
  2. GWH Accounts
    1. How do I acquire a GWH account?
    2. I’ve forgotten my GWH username and password?
  3. Data Entry and Transmit
    1. How do I get started?
    2. How can I submit genome files to the GWH submission system?
    3. How to prepare submission files?
    4. What is the process for submitted files?
    5. What is an MD5 checksum and how do I compute it?
  4. Data release and cite
    1. How do I set the release date or make data publicly available?
    2. How to cite genome accession NO. in my publication?
  5. Help
    1. Contact information
    2. Collaboration & Visit

  1. Introduction
    1. What is GWH?

      GWH stands for Genome Warehouse, a data repository for genome assembly data. It archives genome assembly sequences, genome annotations, and other associated data. GWH is one of the database resources of the BIG Data Center (BIGD), part of the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), and serves as a primary archive of genome assembly data for institutions and laboratories worldwide.

    2. How can I submit data to GWH?

      Only registered users can submit data, using the Genome Sequence Submission (Gsub) System. Briefly, data submission requires the following steps.

         a)    Create a BIGD account and/or log in to GWH;

         b)    Enter metadata information and specify the release date;

         c)    Submit data files.

  2. GWH Accounts
    1. How do I acquire a GWH account?

      Any user can freely register and create a Gsub account. After your registration data is submitted, a confirmation email will be automatically sent to you for activating your account.

    2. I’ve forgotten my GWH username and password?

      ♦  If you have forgotten only your password, you can reset it by clicking “Forgot password”. You will receive an e-mail; please follow the URL in it to reset your password within 30 minutes.

      ♦  If you are already a member and you’ve forgotten both your GWH username and password, please feel free to contact us. We will do our best to help you.

  3. Data Entry and Transmit
    1. How do I get started?

      Data submission requires that you log in to the Genome Sequence Submission (Gsub) System, so you need to create an account if you are not yet a member.

      Please note that marked fields are required when submitting metadata.

    2. How can I submit genome files to the GWH submission system?

      The current version of GWH (1.0beta) supports two ways of submitting files: direct online upload and FTP. It is highly recommended that you submit your files using a dedicated FTP tool (e.g., FileZilla). Please transfer your data files to the GWH FTP site using the following credentials:

            Address:     ftp://submit.big.ac.cn

            User:          Same as your Gsub login

            Password: Same as your Gsub login

            Path:          /GWH/WGSXXXXXX (your submission ID).
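
      For users who prefer scripting over a graphical FTP client, the upload could be sketched with Python's standard ftplib module. This is a minimal sketch, not an official GWH client: the function names are hypothetical, it assumes the server accepts a plain FTP login with your Gsub credentials, and it assumes the submission directory already exists on the server.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath
from typing import Sequence

GWH_FTP_HOST = "submit.big.ac.cn"  # address given in the credentials above


def remote_dir(submission_id: str) -> str:
    """Build the upload directory for a submission, e.g. /GWH/WGSXXXXXX."""
    return str(PurePosixPath("/GWH") / submission_id)


def upload_files(user: str, password: str, submission_id: str,
                 paths: Sequence[str]) -> None:
    """Upload local files in binary mode to the submission directory.

    Assumption: the directory for this submission ID already exists.
    """
    with FTP(GWH_FTP_HOST) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir(submission_id))
        for path in paths:
            local = Path(path)
            with local.open("rb") as handle:
                # STOR keeps the local file name on the server side.
                ftp.storbinary(f"STOR {local.name}", handle)
```

      A call would pass your Gsub user name, password, submission ID, and a list of local file paths to upload_files. Binary mode is used so that the files arrive byte-for-byte identical, which keeps their MD5 checksums valid.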

    3. How to prepare submission files?

      In the current version, we accept genome-associated data files in the following formats:

      ♦  Genome sequence: FASTA (Step3 Files)

      ♦  Genome annotation: GFF or TBL (Step3 Files)

      ♦  Sequence ordering and orientation information: AGP (Step3 Files)

          Note: required if the genome assembly is a complete genome or a chromosome-level draft genome.

      ♦  Sequence assignment information: CSV (Step4 Assignment)

           Note: required if the genome assembly is a scaffold-level or chromosome-level draft genome.

      For detailed information about the file formats, please see "Genome Data standards".
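
      The two notes above amount to a small rule table, which can be sketched as follows. The assembly-level labels ("complete", "draft-chromosome", "draft-scaffold") and the function name are hypothetical and are not values used by the Gsub system; treating FASTA as always required follows from the statement that every genome contains a genome sequence file.

```python
def required_files(assembly_level: str) -> set:
    """Return the file formats that must accompany an assembly.

    The level labels are illustrative assumptions, not Gsub field values.
    """
    known_levels = {"complete", "draft-chromosome", "draft-scaffold"}
    if assembly_level not in known_levels:
        raise ValueError(f"unknown assembly level: {assembly_level!r}")

    required = {"FASTA"}  # every genome has a sequence file
    if assembly_level in {"complete", "draft-chromosome"}:
        required.add("AGP")  # ordering and orientation information
    if assembly_level in {"draft-scaffold", "draft-chromosome"}:
        required.add("CSV")  # sequence assignment information
    return required
```

      Under these assumptions, a chromosome-level draft genome needs all three of FASTA, AGP, and CSV, while a complete genome needs FASTA and AGP.
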
    4. What is the process for submitted files?

      All files that you submit via FTP are regularly moved from the FTP site to a staging area for processing, so it is quite normal for files to “disappear” from FTP. If the files pass the validation process, they will be made public or placed under controlled access according to the release date you set, and the status will change to 'Released' or 'Successful', respectively.

    5. What is an MD5 checksum and how do I compute it?

      MD5 checksums are used to verify the integrity of transmitted data. An MD5 checksum is a 32-character alphanumeric string like "e3b5dd475c449300dd11f258538ff494".

      ♦  For Linux users, use: $ md5sum filename

      ♦  For Mac users, use: $ md5 filename

      ♦  For Windows users, use: $ certutil -hashfile filename MD5 (then join the output into a single string by removing the spaces), or use a third-party tool.
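
      As a platform-independent alternative to the commands above, the checksum can also be computed with Python's standard hashlib module. This is a minimal sketch; the function names are hypothetical, and the file is read in chunks so that large genome files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_checksum(text: str) -> str:
    """Remove whitespace and lowercase a checksum, e.g. one printed with spaces."""
    return "".join(text.split()).lower()
```

      The second helper covers the case where a tool prints the hash in space-separated groups: after normalization, the result can be compared directly with the output of md5_of_file.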

  4. Data release and cite
    1. How do I set the release date or make data publicly available?

      When you submit data, you will find a button named “Release date” at the bottom of the "Step 2 General info" web page. Once you specify a release date, the data will be released on that date. Note that the release of WGS-associated data also triggers the release of the associated BioProject and BioSample. It is suggested that you set the release date of the Genome later than that of the BioProject or BioSample.

    2. How to cite genome accession NO. in my publication?

      A GWH accession number is prefixed with ‘GWH’, followed by 4 capital letters and 8 digits, for example GWHXXXX00000000. Please cite the genome accession number GWHXXXX00000000 in your publication like this:

      The whole genome sequence data reported in this paper have been deposited in the Genome Warehouse in BIG Data Center [1], Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number GWHXXXX00000000 that is publicly accessible at http://bigd.big.ac.cn/gwh.

      [1] The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res 2017, 45(D1): D18-D24. [PMID=27899658]
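
      Before citing, it can be useful to check that an accession string matches the format described in this answer ('GWH', then 4 capital letters, then 8 digits). The following is a minimal sketch with hypothetical names; it only checks the shape of the string, not whether the accession actually exists in GWH.

```python
import re

# 'GWH' + 4 capital letters + 8 digits, as described in the answer above.
GWH_ACCESSION = re.compile(r"GWH[A-Z]{4}[0-9]{8}")


def is_gwh_accession(text: str) -> bool:
    """Return True if the whole string has the shape of a GWH accession number."""
    return GWH_ACCESSION.fullmatch(text) is not None
```

      For instance, the placeholder GWHXXXX00000000 passes this check, while a string with lowercase letters or only 7 digits does not.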

  5. Help
    1. Contact information

      If you have any questions, suggestions, or comments, or would like to report a bug, please feel free to contact us via email (GWH@big.ac.cn) or instant messaging (QQ Group: 541196594).

    2. Collaboration & Visit

      You are also welcome to visit us to explore possibilities for collaboration or to learn more about GWH.

      Address:

            BIG Data Center

            Beijing Institute of Genomics, Chinese Academy of Sciences

            No.1 Beichen West Road, Chaoyang District

            Beijing 100101, China

            Tel: +86 (10) 8409-7340

            Fax: +86 (10) 8409-7720