Documentation

1. Introduction

Genome-wide association studies (GWAS) have been widely adopted to associate genomic variants with phenotypic differences in a wide range of species. With the rapid development of high-throughput sequencing technologies, the ever-growing availability of high-quality genotypes for a multitude of species has made it possible to study the genetic architecture of many complex traits using GWAS. Over the last decade, more than 6,000 GWAS have revealed substantial genotype-phenotype associations for many traits, not only in humans but also in a wide diversity of plants and animals. Although valuable efforts have been made to integrate GWAS associations for human (GWAS Catalog, GWASdb, etc.) and A. thaliana (AraGWAS Catalog), a comprehensive collection of GWAS knowledge for non-human species is still lacking, yet it is highly needed for molecular breeding and the improvement of agronomic traits.

Here, we present the GWAS Atlas, a manually curated resource of published genome-wide variant-trait associations for plants and animals.

The current release of GWAS Atlas features a comprehensive collection of 75,467 genotype-to-phenotype (G2P) associations for 614 traits across 7 plants and 2 animals, manually curated from 254 publications. All integrated studies and associations can be accessed through a tabular web interface and downloaded as tab-delimited files. More importantly, all associations and traits are annotated and organized according to a suite of reference ontologies (PTO, Plant Trait Ontology; ATOL, Animal Trait Ontology for Livestock), a species-specific ontology (CO, Crop Ontology), and our customized ontologies (PPTO, Plant Phenotype and Trait Ontology; APTO, Animal Phenotype and Trait Ontology), all of which can be explored through an ontology-based browser. In addition, by analyzing all integrated associations for each species, we identified the top associated genes (genes with multiple associations) and the highly associated variants (variants associated with multiple traits). Ongoing and future developments include incorporating association data from more species and comparative analysis among different species.
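
The per-species summaries mentioned above (genes with multiple associations and variants associated with multiple traits) amount to simple counting over the association table. The following Python sketch assumes a hypothetical list of `(variant_id, trait, mapped_gene)` tuples and a default threshold of two; the ranking rules actually applied in GWAS Atlas are not described here.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (variant_id, trait, mapped_gene); mapped_gene may be None.
Association = Tuple[str, str, Optional[str]]


def top_genes(assocs: Iterable[Association], min_count: int = 2) -> List[Tuple[str, int]]:
    """Return genes with at least `min_count` associations, most associated first."""
    counts = Counter(gene for _, _, gene in assocs if gene)
    return [(gene, n) for gene, n in counts.most_common() if n >= min_count]


def multi_trait_variants(
    assocs: Iterable[Association], min_traits: int = 2
) -> List[Tuple[str, List[str]]]:
    """Return variants associated with at least `min_traits` distinct traits."""
    traits_by_variant = defaultdict(set)
    for variant, trait, _ in assocs:
        traits_by_variant[variant].add(trait)
    hits = [(v, sorted(ts)) for v, ts in traits_by_variant.items() if len(ts) >= min_traits]
    # Sort by number of traits (descending), then by variant ID for a stable order.
    return sorted(hits, key=lambda item: (-len(item[1]), item[0]))
```

Both functions take the same records, so they can be run side by side on each species' association table.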

2. How are G2P associations curated in GWAS Atlas?

2.1. Overview of GWAS Atlas Curation Processes

Figure 1 Overview of GWAS Atlas Curation Processes

2.2. Literature retrieval

We performed a literature search in PubMed using the species name and "GWAS" as keywords, which returned a total of 1,850 publications. Of these, the 1,767 publications published after 2009 were retained. A publication is eligible for inclusion in GWAS Atlas if it contains both an adequate description of the biological traits studied and significant GWAS associations.
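
A minimal sketch of the retrieval and date filter, using the NCBI E-utilities `esearch` endpoint. The query string built by the hypothetical helper `pubmed_query` is an assumption (the exact query used by the curators is not given), and because `esearch` returns PubMed IDs only, the publication years consumed by `retain_after` would have to be fetched separately (for example with `esummary`).

```python
import json
from typing import Dict, Iterable, List
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(species: str) -> str:
    # Hypothetical query: quoted species name AND the keyword GWAS.
    return f'"{species}" AND GWAS'


def esearch_url(species: str, retmax: int = 10000) -> str:
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": pubmed_query(species), "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_pmids(species: str) -> List[str]:
    """Run the search (requires network access) and return the list of PMIDs."""
    with urlopen(esearch_url(species)) as response:
        return json.load(response)["esearchresult"]["idlist"]


def retain_after(records: Iterable[Dict], year: int = 2009) -> List[Dict]:
    """Keep records published after `year`; each record is assumed to carry a 'year' key."""
    return [r for r in records if int(r["year"]) > year]
```

Calling `fetch_pmids("Zea mays")` contacts NCBI, which limits request rates, so batch runs over many species should pause between calls.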

2.3. GWAS data extraction

We manually curated the genotype-to-phenotype (G2P) associations and extracted the publication, study, and variant-trait association information from each published paper.

For each publication, we recorded the title, publication year, journal (in Web of Science format), species, total number of associations, PMID, and citation information from Europe PMC.
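
One way to hold these publication fields and fill in the citation information is sketched below. The `Publication` class and its field names are hypothetical, and the lookup assumes the Europe PMC REST `search` endpoint, whose JSON results carry a `citedByCount` value; whether GWAS Atlas uses this endpoint is not stated here.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


@dataclass
class Publication:
    title: str
    year: int
    journal: str  # journal name in Web of Science format
    species: str
    n_associations: int
    pmid: str
    citations: Optional[int] = None  # filled in from Europe PMC


def citation_query_url(pmid: str) -> str:
    """Europe PMC search URL for a single PubMed record (SRC:MED)."""
    params = {"query": f"EXT_ID:{pmid} AND SRC:MED", "format": "json", "resultType": "lite"}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def parse_cited_by_count(payload: str) -> Optional[int]:
    """Extract citedByCount from a Europe PMC JSON response; None if no record was found."""
    results = json.loads(payload).get("resultList", {}).get("result", [])
    if not results:
        return None
    return int(results[0].get("citedByCount", 0))
```

The body returned by fetching `citation_query_url(pub.pmid)` can be passed to `parse_cited_by_count` and stored in `pub.citations`.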

For each study, we recorded the species, sampling location, sampling year, condition, population, sample size, tissue, genotyping technology, association model, number of associations, and PMID. The curation models used for genotyping technology and GWAS association model are shown in Table 1 and Table 2, respectively.

Table 1: The curation model for genotyping technology
Tech_id  Genotyping Technology                         Abbreviation_name
1        Whole Genome Sequencing                       WGS
2        Genotyping by Sequencing                      GBS
3        Genotyping by Array                           Array
4        Specific-Locus Amplified Fragment Sequencing  SLAF-seq
5        Whole Exome Sequencing                        WES
6        RNA Sequencing                                RNA-seq
7        Unclassified                                  Other

Table 2: The curation model for GWAS association model
Model_id  Model_name                                                  Abbreviation_name
1         Mixed Linear Model                                          MLM
2         General Linear Model                                        GLM
3         Logistic Regression Model                                   LRM
4         Compressed Mixed Linear Model                               CMLM
5         Unified Mixed Linear Model                                  UMLM
6         Efficient Mixed-Model Association eXpedited                 EMMAX
7         Multi-Locus Mixed Model                                     MLMM
8         Bayesian Sparse Linear Mixed Model                          BSLMM
9         Factored Spectrally Transformed Linear Mixed Model          FaST-LMM
10        Fixed and Random Model Circulating Probability Unification  FarmCPU
11        Joint-Linkage Model                                         JLM
12        Unclassified                                                Other
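
When curating, a genotyping technology or association model reported in a paper has to be mapped to one of the rows in Tables 1 and 2. A minimal sketch, assuming simple matching on the full name or abbreviation that ignores case and extra spaces (other spelling variants are not handled); anything unrecognized falls back to the 'Unclassified' IDs 7 and 12:

```python
from typing import Dict, List, Tuple

# Rows taken from Table 1 and Table 2: (id, full name, abbreviation).
TECH_ROWS: List[Tuple[int, str, str]] = [
    (1, "Whole Genome Sequencing", "WGS"),
    (2, "Genotyping by Sequencing", "GBS"),
    (3, "Genotyping by Array", "Array"),
    (4, "Specific-Locus Amplified Fragment Sequencing", "SLAF-seq"),
    (5, "Whole Exome Sequencing", "WES"),
    (6, "RNA Sequencing", "RNA-seq"),
]
MODEL_ROWS: List[Tuple[int, str, str]] = [
    (1, "Mixed Linear Model", "MLM"),
    (2, "General Linear Model", "GLM"),
    (3, "Logistic Regression Model", "LRM"),
    (4, "Compressed Mixed Linear Model", "CMLM"),
    (5, "Unified Mixed Linear Model", "UMLM"),
    (6, "Efficient Mixed-Model Association eXpedited", "EMMAX"),
    (7, "Multi-Locus Mixed Model", "MLMM"),
    (8, "Bayesian Sparse Linear Mixed Model", "BSLMM"),
    (9, "Factored Spectrally Transformed Linear Mixed Model", "FaST-LMM"),
    (10, "Fixed and Random Model Circulating Probability Unification", "FarmCPU"),
    (11, "Joint-Linkage Model", "JLM"),
]
UNCLASSIFIED_TECH_ID = 7
UNCLASSIFIED_MODEL_ID = 12


def build_index(rows: List[Tuple[int, str, str]]) -> Dict[str, int]:
    """Index each row by its upper-cased full name and abbreviation."""
    index: Dict[str, int] = {}
    for row_id, name, abbr in rows:
        index[name.upper()] = row_id
        index[abbr.upper()] = row_id
    return index


TECH_INDEX = build_index(TECH_ROWS)
MODEL_INDEX = build_index(MODEL_ROWS)


def classify(label: str, index: Dict[str, int], unclassified_id: int) -> int:
    """Map a label as reported in a paper to a curation ID, falling back to 'Unclassified'."""
    return index.get(" ".join(label.split()).upper(), unclassified_id)
```

For example, `classify("FarmCPU", MODEL_INDEX, UNCLASSIFIED_MODEL_ID)` gives 10, while an unlisted model name gives 12.
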
For each variant-trait association, we recorded the species, genome version, variant ID, genomic position, trait, GWAS association P-value, R2, and mapped genes.
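
The association fields listed above map naturally onto a record type. The sketch below is hypothetical: the field names, the 1-based coordinate convention, and the assumption that R2 is stored as a fraction between 0 and 1 are not specified by GWAS Atlas.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Association:
    species: str
    genome_version: str
    variant_id: str
    chrom: str
    position: int               # assumed 1-based position on `genome_version`
    trait: str
    p_value: float
    r2: Optional[float] = None  # assumed to be a fraction in [0, 1]
    mapped_genes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Basic sanity checks applied when a curated record is created.
        if not 0.0 < self.p_value <= 1.0:
            raise ValueError(f"P-value out of range: {self.p_value}")
        if self.r2 is not None and not 0.0 <= self.r2 <= 1.0:
            raise ValueError(f"R2 out of range: {self.r2}")
        if self.position < 1:
            raise ValueError(f"position must be 1-based: {self.position}")

    @property
    def neg_log10_p(self) -> float:
        """-log10(P); larger values indicate stronger association."""
        return -math.log10(self.p_value)
```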

2.4. Variant position unification and annotation

Because reference genome assemblies are continually updated, we converted the genomic positions of variants collected from different publications to the latest version of the reference genome in GVM (Genome Variation Map) using sequence-based searching. If a variant is already recorded in the GVM database, we use its GVM reference identifier as the VarID and link the user to the corresponding variation view in GVM. All variants were annotated with VEP (Variant Effect Predictor).
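
The idea behind sequence-based position unification can be illustrated with an exact-match search: take the sequence flanking a variant in the old assembly and look for a unique occurrence of it in the new assembly. This is a toy sketch under strong assumptions (exact matches only, forward strand only, a whole chromosome held in a string); the actual procedure used with GVM is not described here, and production pipelines typically rely on alignment tools that tolerate mismatches and indels.

```python
from typing import Optional


def lift_position(old_ref: str, old_pos: int, new_ref: str, flank: int = 50) -> Optional[int]:
    """Map a 1-based position from old_ref to new_ref via an exact, unique flanking-sequence hit.

    Returns None if the probe is absent from new_ref or occurs more than once.
    Only the forward strand is searched; no mismatches or indels are tolerated.
    """
    if not 1 <= old_pos <= len(old_ref):
        raise ValueError("old_pos outside the old reference")
    i = old_pos - 1                  # 0-based index of the variant
    start = max(0, i - flank)
    end = min(len(old_ref), i + flank + 1)
    probe = old_ref[start:end]       # variant base plus up to `flank` bases on each side
    hit = new_ref.find(probe)
    if hit == -1 or new_ref.find(probe, hit + 1) != -1:
        return None                  # missing or ambiguous placement
    return hit + (i - start) + 1     # convert back to 1-based coordinates
```

A longer `flank` makes a unique hit more likely in repetitive regions but also makes the probe more sensitive to differences between assemblies.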

2.5. Trait term annotation

We downloaded the Planteome reference ontology (Plant Trait Ontology, PTO), developed by the Planteome project, and the species-specific Crop Ontology (CO), developed by various plant breeding and research communities, from http://browser.planteome.org/amigo. PTO is a controlled vocabulary for describing phenotypic traits in plants, where each trait is a distinguishable feature, characteristic, quality, or phenotypic feature of a developing or mature plant or of a plant part. CO is a collection of species- or clade-specific application ontologies; the maize (Zea mays), rice (Oryza sativa), soybean (Glycine max), and sorghum (Sorghum bicolor) COs are used in GWAS Atlas. These reference ontologies serve as common standards for semantic comparison. For animals, the livestock ontology (Animal Trait Ontology for Livestock, ATOL) was downloaded from http://www.atol-ontology.com/en/erter-2/.

We searched all curated plant phenotype terms interactively using the 'term search' function of the Planteome API, which accepts a partial term name or synonym. For example, a search for "pollen" returns multiple terms and/or synonyms containing "pollen" in the name (via an HTTP GET request to a URI of the form http://browser.planteome.org/api/search/ontology?q=pollen). Trait terms that exactly matched an 'Ontology Term', 'Synonym', or 'Definition' in the reference ontologies were annotated accordingly.
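
The term-search request and the exact-match rule can be sketched as follows. The URL form is the one given above; the response structure is not documented here, so the candidate keys `id`, `label`, `synonyms`, and `definition` are hypothetical, and treating case and extra whitespace as insignificant is an assumption about what "exactly matched" means.

```python
from typing import Dict, List
from urllib.parse import urlencode

PLANTEOME_SEARCH = "http://browser.planteome.org/api/search/ontology"


def search_url(term: str) -> str:
    """Build the Planteome 'term search' GET URL for a (partial) term name or synonym."""
    return f"{PLANTEOME_SEARCH}?{urlencode({'q': term})}"


def normalize(text: str) -> str:
    # Assumption: comparison ignores case and repeated whitespace.
    return " ".join(text.lower().split())


def is_exact_match(trait: str, candidate: Dict) -> bool:
    """True if the trait equals the candidate's term name, a synonym, or its definition.

    The keys 'label', 'synonyms' and 'definition' are hypothetical; adapt them to the
    actual structure of the API response.
    """
    target = normalize(trait)
    values = [candidate.get("label", ""), candidate.get("definition", "")]
    values.extend(candidate.get("synonyms", []))
    return any(value and normalize(value) == target for value in values)


def annotate(trait: str, candidates: List[Dict]) -> List[str]:
    """Return IDs of all candidate terms that exactly match the curated trait name."""
    return [c["id"] for c in candidates if is_exact_match(trait, c)]
```

Traits for which `annotate` returns an empty list are the ones that remain unannotated and need a customized term.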

For phenotype terms that remained unannotated, we built customized ontologies, the Plant Phenotype and Trait Ontology (PPTO) and the Animal Phenotype and Trait Ontology (APTO), following PTO rules, to provide a controlled vocabulary describing these phenotypic traits. To make the terms easy to reference and use, we assigned each plant or animal trait a unique identifier of the form 'PPTO:XXXXXX' or 'APTO:XXXXXX', respectively. Like PTO, PPTO encompasses nine broad, upper-level categories of plant traits (see Figure 2). For each trait, we recorded the ID, trait name, synonyms, definition, comment, and SubClassOf information (Figure 3).
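
As an illustration of how such a trait record could be serialized, the sketch below renders one term as an OBO `[Term]` stanza (OBO being the format in which PPTO and APTO are offered for download). It assumes that 'XXXXXX' stands for a zero-padded six-digit number and that synonyms are recorded with EXACT scope; neither detail is confirmed by the source, and the class and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraitTerm:
    id: str
    name: str
    definition: str = ""
    synonyms: List[str] = field(default_factory=list)
    comment: str = ""
    is_a: List[str] = field(default_factory=list)  # SubClassOf parents


def make_id(prefix: str, number: int) -> str:
    """e.g. make_id('PPTO', 12) -> 'PPTO:000012' (six-digit padding is an assumption)."""
    return f"{prefix}:{number:06d}"


def _quote(text: str) -> str:
    # OBO quoted strings escape backslashes and double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_obo(term: TraitTerm) -> str:
    """Render one trait as an OBO [Term] stanza."""
    lines = ["[Term]", f"id: {term.id}", f"name: {term.name}"]
    if term.definition:
        lines.append(f"def: {_quote(term.definition)} []")
    if term.comment:
        lines.append(f"comment: {term.comment}")
    for synonym in term.synonyms:
        lines.append(f"synonym: {_quote(synonym)} EXACT []")  # EXACT scope assumed
    for parent in term.is_a:
        lines.append(f"is_a: {parent}")
    return "\n".join(lines)
```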

Figure 2 The nine major categories of plant traits

Figure 3 A snapshot of trait information

PPTO, PTO, and CO are displayed using a tree plugin (jsTree), so users can easily browse them or search them by ontology term name.
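
jsTree can load a flat list of nodes in which each node names its parent and root nodes use `#` as the parent. A minimal sketch that converts hypothetical `(term_id, name, parent_id)` rows into that JSON form; how GWAS Atlas actually builds its trees is not described here.

```python
import json
from typing import Iterable, Optional, Tuple


def to_jstree(terms: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Convert (term_id, name, parent_id) rows into jsTree's flat JSON node list.

    jsTree marks root nodes with parent '#'. Each node has a single parent here,
    so a term with several SubClassOf parents would have to be placed under one
    of them (or duplicated under distinct node IDs).
    """
    nodes = []
    for term_id, name, parent_id in terms:
        nodes.append({
            "id": term_id,
            "parent": parent_id if parent_id else "#",
            "text": f"{name} ({term_id})",
        })
    return json.dumps(nodes)
```

The resulting JSON could be passed to jsTree through its `core.data` option, and jsTree's `search` plugin can provide term-name search on the client.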

3. How do I search the associations in GWAS Atlas?

The 'Search' module allows users to query GWAS Atlas data by trait keyword (e.g., height), gene ID (e.g., Zm00001d021954), or genomic position (e.g., chr1:14702150-37601000). The numbers of related traits, genes, and variants are reported, and all matching results are listed.
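
The three query types can be told apart by their shape. The sketch below is a hypothetical dispatcher: the regular expressions only cover region strings like `chr1:14702150-37601000` and maize gene IDs like `Zm00001d021954`, so other species would need their own gene-ID patterns, and the region filter assumes variants are stored as `(variant_id, chrom, position)` tuples.

```python
import re
from typing import List, Tuple

# Region queries such as chr1:14702150-37601000 (thousands separators are not handled).
POSITION_RE = re.compile(r"^(chr[\w.]+):(\d+)-(\d+)$", re.IGNORECASE)
# Maize gene IDs of the form Zm00001d021954; other species need their own patterns.
MAIZE_GENE_RE = re.compile(r"^Zm\d{5}d\d{6}$")


def classify_query(query: str) -> Tuple:
    """Return ('position', chrom, start, end), ('gene', id) or ('keyword', text)."""
    q = query.strip()
    m = POSITION_RE.match(q)
    if m:
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start > end:
            raise ValueError("region start exceeds end")
        return ("position", chrom, start, end)
    if MAIZE_GENE_RE.match(q):
        return ("gene", q)
    return ("keyword", q.lower())


def variants_in_region(variants: List[Tuple[str, str, int]], chrom: str, start: int, end: int) -> List[str]:
    """Return IDs of (variant_id, chrom, position) records inside [start, end], inclusive."""
    return [vid for vid, c, pos in variants if c.lower() == chrom.lower() and start <= pos <= end]
```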

For example, searching for the keyword 'seed' (Figure 4) returns 24 traits and 30 genes that contain or are related to 'seed'. General information for each retrieved item is shown in the right column (Figure 5), and filter options in the lower left can be used to narrow down the results. For a specific trait, detailed trait information, associations, studies, and publications are provided (Figure 6).

Figure 4 Screenshots for Search

Figure 5 Search result

Figure 6 Detailed information

4. How do I browse the associations?

Users can browse the association data in the 'Browse' view. All associations are mapped and annotated to three plant ontologies (PTO, CO, PPTO) and two animal ontologies (ATOL, APTO) (Figure 7), which are grouped by domain to make relevant ontologies easy to find. The ontology system allows users to browse, search, and visualize the associations for each species. After an ontology and a trait term are selected, the corresponding trait information, associations, studies, and publications are displayed in the right column (Figure 8). Users can also enter a keyword, and matching terms are suggested automatically for selection.

Figure 7 Reference ontology

Figure 8 Trait, association, studies, and publication information for each term
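
The keyword matching in the 'Browse' view can be approximated by a case-insensitive search over term names and synonyms. In this sketch, which is an assumption rather than the site's actual ranking, terms whose name or synonym starts with the keyword are suggested before terms that merely contain it; the `(term_id, name, synonyms)` record layout is hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical term record: (term_id, name, synonyms)
Term = Tuple[str, str, Sequence[str]]


def suggest(keyword: str, terms: List[Term], limit: int = 10) -> List[str]:
    """Return term IDs whose name or synonym contains the keyword (case-insensitive).

    Terms with a label that starts with the keyword come before terms that merely
    contain it; each group is sorted alphabetically by term name.
    """
    k = keyword.strip().lower()
    if not k:
        return []
    prefix_hits, substring_hits = [], []
    for term_id, name, synonyms in terms:
        labels = [label.lower() for label in (name, *synonyms)]
        if any(label.startswith(k) for label in labels):
            prefix_hits.append((name.lower(), term_id))
        elif any(k in label for label in labels):
            substring_hits.append((name.lower(), term_id))
    ranked = sorted(prefix_hits) + sorted(substring_hits)
    return [term_id for _, term_id in ranked[:limit]]
```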

5. How can I get GWAS Atlas data?

To obtain the associations for a species of interest, visit the 'Download' page, which provides the associations for each species in .xlsx and .txt formats. Our customized ontologies, PPTO and APTO, are also available in OBO format.
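
A minimal sketch for loading a downloaded .txt file with Python's standard `csv` module and keeping only strongly associated variants. It assumes the file is tab-delimited with a header row; the column name `P-value` is a placeholder, so check the header of the actual file before use.

```python
import csv
from typing import Dict, Iterable, List


def read_associations(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read tab-delimited association lines whose first line is a header row."""
    return list(csv.DictReader(lines, delimiter="\t"))


def filter_by_p(rows: Iterable[Dict[str, str]], threshold: float, p_column: str = "P-value") -> List[Dict[str, str]]:
    """Keep rows whose P-value is at or below `threshold`; unparsable values are skipped.

    The column name 'P-value' is a placeholder; check the header of the downloaded file.
    """
    kept = []
    for row in rows:
        try:
            p = float(row[p_column])
        except (KeyError, TypeError, ValueError):
            continue
        if p <= threshold:
            kept.append(row)
    return kept
```

With a file on disk, `open(path, newline="", encoding="utf-8")` yields a handle that can be passed directly to `read_associations`.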

6. Support

6.1. Funding Support

  • Strategic Priority Research Program of Chinese Academy of Sciences (XDA08020102)
  • The 13th Five-year Informatization Plan of Chinese Academy of Sciences (XXH13505-05)
  • The Youth Innovation Promotion Association of Chinese Academy of Sciences (2018134)

6.2. Comments & Collaborations

We welcome comments, suggestions, and guidance from colleagues and peers worldwide who share our research interests. We also invite the scientific community to submit their GWAS results to GWAS Atlas and to collaborate on improving its functionality.

6.3. Feedback

We would love to hear from you if you have any questions or comments. Our contact information is as follows.
Telephone: +86 (10) 8409-7620
Fax: +86 (10) 8409-7298
Email: gwas@big.ac.cn
Postal Address:
The GWAS Team, National Genomics Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS)
NO 1 Beichen West Road, Chaoyang District, Beijing 100101, China

6.4. Documentation in Chinese

The documentation in Chinese is available for download in PDF format.