Genome-wide association studies (GWAS) have been widely adopted to associate genomic variants with phenotypic differences in a wide range of species.
With the rapid development of high-throughput sequencing technologies, the ever-growing availability of high-quality genotypes for a multitude of species has made it possible to
study the genetic architecture of many complex traits using GWAS. More than 6,000 GWAS over the last decade have revealed substantial genotype-phenotype associations
for many traits, not only in humans but also in a wide diversity of plants and animals, and many variant-trait associations have been published. Although valuable efforts have
been made to integrate GWAS associations for human (GWAS Catalog, GWASdb, etc.) and A. thaliana (AraGWAS Catalog), a full collection of GWAS knowledge, particularly for non-human
species, is scarce but highly needed for molecular breeding and improvement of agronomic traits. Here, we present GWAS Atlas, a manually curated resource
of published genome-wide variant-trait associations for plants and animals.
The current release of GWAS Atlas features a comprehensive collection of 75,467 curated genotype-to-phenotype (G2P) associations for 614 traits across 7 plants and 2 animals, manually curated from 254 publications. All integrated studies and associations can be accessed via a tabular web interface and as downloadable tab-delimited files. More
importantly, all associations and traits are annotated and organized based on a suite of reference ontologies (PTO, Plant Trait Ontology; ATOL, Animal Trait Ontology for Livestock),
a species-specific ontology (CO, Crop Ontology), and our customized ontologies (PPTO, Plant Phenotype and Trait Ontology; APTO, Animal Phenotype and Trait Ontology), which can be explored
through ontology-based browsing. In addition, by analyzing all integrated associations for each species, we identified the top associated genes with multiple associations and the variants
highly associated with multiple traits. Ongoing and future developments include incorporating association data from more species and comparative analysis among different species.
2. How are G2P associations curated in GWAS Atlas?
2.1. Overview of GWAS Atlas Curation Processes
Figure 1 Overview of GWAS Atlas Curation Processes
2.2. Literature retrieval
We searched PubMed using the species name and "GWAS" as keywords and obtained a total of 1850 publications. Among them, the 1767 publications published after 2009 were
retained. Publications are eligible for inclusion in GWAS Atlas if they contain both the necessary description of biological traits and significant GWAS associations.
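The retrieval step can be sketched as a query URL for the NCBI E-utilities ESearch service. This is only an illustration: the text does not say which PubMed interface was used, and the function name, the default year range, and the `retmax` value below are assumptions.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(species: str, min_year: int = 2010, max_year: int = 2019) -> str:
    """Return an ESearch URL for '<species> AND GWAS' limited by publication year.

    min_year=2010 mirrors "published after 2009"; max_year is a hypothetical
    upper bound (ESearch expects mindate and maxdate to be given together).
    """
    params = {
        "db": "pubmed",
        "term": f'"{species}" AND GWAS',
        "datetype": "pdat",  # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmode": "json",
        "retmax": "200",  # hypothetical page size
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_query("Oryza sativa"))
```

The returned PMIDs would still need the manual eligibility check described above; the query only narrows the candidate list.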
2.3. The GWAS data extraction
We manually curated the genotype-to-phenotype (G2P) associations and extracted the publication, study, and variant-trait association information from each published paper.
For publications, we included the title, publication year, journal (Web of Science format), species, total number of associations, PMID, and citations from Europe PMC.
For studies, we included species, sampling spot, sampling year, condition, population, sample size, tissue, genotyping technology, association model, association number, and PMID. The curation models we used for genotyping technology and GWAS association model are shown in Table 1 and Table 2, respectively.
Table 1: The curation model for genotyping technology
- Whole Genome Sequencing
- Genotyping by Sequencing
- Genotyping by Array
- Specific-Locus Amplified Fragment Sequencing
- Whole Exome Sequencing
Table 2: The curation model for GWAS association model
- Mixed Linear Model
- General Linear Model
- Logistic Regression Model
- Compressed Mixed Linear Model
- Unified Mixed Linear Model
- Efficient Mixed Model
- Multi-Locus Mixed Model
- Bayesian Sparse Linear Mixed Model
- Factored Spectrally Transformed Linear Mixed Model
- Fixed and random model Circulating Probability Unification
For the variant-trait associations, we integrated species, genome version, variant ID, genomic position, trait, GWAS association P-value, R², and mapped genes.
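The curated fields and the controlled vocabulary of Table 1 can be pictured as a small record type. The sketch below is an assumption about how such a record might be modelled, not the schema GWAS Atlas actually uses; all class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class GenotypingTechnology(Enum):
    """Controlled vocabulary taken from Table 1."""
    WHOLE_GENOME_SEQUENCING = "Whole Genome Sequencing"
    GENOTYPING_BY_SEQUENCING = "Genotyping by Sequencing"
    GENOTYPING_BY_ARRAY = "Genotyping by Array"
    SLAF_SEQUENCING = "Specific-Locus Amplified Fragment Sequencing"
    WHOLE_EXOME_SEQUENCING = "Whole Exome Sequencing"


@dataclass
class Association:
    """One variant-trait association (hypothetical field names)."""
    species: str
    genome_version: str
    variant_id: str
    chromosome: str
    position: int
    trait: str
    p_value: float
    r2: Optional[float] = None  # not every paper reports R²
    mapped_genes: List[str] = field(default_factory=list)


def normalize_technology(label: str) -> GenotypingTechnology:
    """Map a free-text label onto the controlled vocabulary (case-insensitive)."""
    wanted = label.strip().lower()
    for tech in GenotypingTechnology:
        if tech.value.lower() == wanted:
            return tech
    raise ValueError(f"Unknown genotyping technology: {label!r}")
```

Rejecting unknown labels with an error, rather than storing them as free text, is one way to keep curated values restricted to the terms in the table.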
2.4. Variation unifying and annotation
Because reference genomes are continuously updated, we unified the genomic positions of variants collected from different publications to the latest version
of the reference genome in GVM using sequence-based searching. If a variant has a record in the GVM
database, we use its reference identifier as the
VarID and link it to the variation view in GVM.
All variants were annotated with VEP.
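The idea behind sequence-based position unification can be illustrated with a toy example: take the sequence flanking a variant in the old assembly and look for it in the new one. The text does not name the search tool, so the exact string matching below is a stand-in (a real pipeline would more likely use a sequence aligner), and the function name and `flank` parameter are assumptions.

```python
from typing import Optional


def remap_variant(old_genome: str, new_genome: str, pos: int, flank: int = 10) -> Optional[int]:
    """Return the 1-based position of a variant in new_genome, or None.

    pos is 1-based in old_genome. The variant is located by searching for its
    flanking sequence; a missing or non-unique match returns None.
    """
    idx = pos - 1
    if idx < 0 or idx >= len(old_genome):
        raise ValueError("position lies outside the old sequence")
    start = max(0, idx - flank)
    end = min(len(old_genome), idx + flank + 1)
    probe = old_genome[start:end]

    hit = new_genome.find(probe)
    if hit == -1:
        return None  # flanking sequence absent from the new assembly
    if new_genome.find(probe, hit + 1) != -1:
        return None  # flanking sequence is not unique
    return hit + (idx - start) + 1


if __name__ == "__main__":
    old = "ACGTTGCAAGGCTTAACCGGTA"
    new = "TTTT" + old  # new assembly with four extra leading bases
    print(remap_variant(old, new, 12))
```

Returning None for ambiguous hits keeps uncertain positions out of the unified set instead of guessing between candidate loci.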
2.5. Trait term annotation
We downloaded the Planteome reference ontology (Plant Trait Ontology, PTO), developed by the Planteome project, and the species-specific ontology (Crop Ontology, CO),
developed by various plant breeding and research communities, from http://browser.planteome.org/amigo.
PTO is a controlled vocabulary for describing phenotypic traits in plants; each trait is a distinguishable feature, characteristic, quality or phenotypic feature of a developing or mature plant,
or of a plant part. CO is a collection of species- or clade-specific application ontologies; the CO for maize (Zea mays), rice (Oryza sativa), soybean (Glycine max) and sorghum (Sorghum bicolor)
were used in GWAS Atlas. These reference ontologies serve as common standards for semantic comparison. For animals, the livestock ontology (Animal Trait Ontology for Livestock, ATOL)
was downloaded from http://www.atol-ontology.com/en/erter-2/.
We interactively searched for all curated plant phenotype terms using the 'term search' in Planteome, which accepts a partial term name or synonym.
For example, a search for "pollen" returns multiple terms and/or synonyms with "pollen" in the name (make an HTTP GET request to a URI of the
following form: http://browser.planteome.org/api/search/ontology?q=pollen).
Trait terms that perfectly matched the 'Ontology Term', 'Synonym', or 'Definition' of a reference ontology were annotated.
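The matching rule can be sketched as follows. Only the endpoint URL comes from the text; the layout of the term records (`id`, `name`, `synonyms`, `definition`) is a hypothetical structure chosen for illustration, not the documented response format of the Planteome API, and treating a case-insensitive match as "perfect" is also an assumption.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Endpoint quoted in the text above.
SEARCH_ENDPOINT = "http://browser.planteome.org/api/search/ontology"


def search_url(query: str) -> str:
    """Build the GET URL for a term search."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


def annotate(trait: str, terms: List[Dict]) -> Optional[str]:
    """Return the ID of the first term whose name, synonym, or definition
    equals the curated trait exactly (ignoring case and outer whitespace)."""
    wanted = trait.strip().lower()
    for term in terms:
        candidates = [term.get("name", ""), term.get("definition", "")]
        candidates.extend(term.get("synonyms", []))
        if any(c.strip().lower() == wanted for c in candidates):
            return term["id"]
    return None  # left for a customized ontology term


if __name__ == "__main__":
    # Placeholder records with made-up IDs, not real ontology entries.
    terms = [
        {"id": "EX:0000001", "name": "pollen fertility", "synonyms": ["pollen viability"]},
        {"id": "EX:0000002", "name": "plant height", "synonyms": []},
    ]
    print(search_url("pollen"))
    print(annotate("Pollen viability", terms))
```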
For the phenotype terms that remained unannotated, we built the customized Plant Phenotype and Trait Ontology (PPTO) and Animal Phenotype and Trait Ontology (APTO) based on PTO rules, constructing a controlled
vocabulary to describe these phenotypic traits. So that terms can be referenced effectively and efficiently, we assign a unique identifier of the form 'PPTO:XXXXXX' or 'APTO:XXXXXX' to each plant or animal trait, respectively. PPTO
encompasses the same nine broad, upper-level categories of plant traits as PTO (see Figure 2). For each trait, we record the ID, trait name, synonyms, definition, comment, and
SubClassOf information (Figure 3).
PPTO, PTO and CO are displayed using a tree plugin (jsTree), so users can easily browse or search by ontology term name.
Figure 2 The major nine categories of plant traits
Figure 3 A snapshot of trait information
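A customized term with the fields listed above can be written as an OBO `[Term]` stanza. The sketch assumes that 'XXXXXX' stands for a zero-padded six-digit number, which the text does not state explicitly; the helper names are hypothetical and quote escaping is omitted for brevity.

```python
from typing import Optional, Sequence


def make_term_id(prefix: str, number: int) -> str:
    """Format an identifier such as 'PPTO:000042' (six digits assumed)."""
    if prefix not in ("PPTO", "APTO"):
        raise ValueError(f"unexpected ontology prefix: {prefix!r}")
    if not 0 <= number <= 999999:
        raise ValueError("number does not fit in six digits")
    return f"{prefix}:{number:06d}"


def obo_stanza(term_id: str, name: str, definition: str,
               synonyms: Sequence[str] = (), comment: Optional[str] = None,
               is_a: Sequence[str] = ()) -> str:
    """Serialize one term as an OBO [Term] stanza."""
    lines = ["[Term]", f"id: {term_id}", f"name: {name}", f'def: "{definition}" []']
    if comment:
        lines.append(f"comment: {comment}")
    lines.extend(f'synonym: "{s}" EXACT []' for s in synonyms)
    lines.extend(f"is_a: {parent}" for parent in is_a)  # SubClassOf links
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative term with made-up content.
    print(obo_stanza(make_term_id("PPTO", 42), "example trait",
                     "An illustrative definition.",
                     synonyms=["example synonym"],
                     is_a=[make_term_id("PPTO", 1)]))
```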
3. How do I search the associations in GWAS Atlas?
In the 'Search' module, users can query GWAS Atlas data by term keyword (e.g. height), gene ID (e.g. Zm00001d021954), or genomic position (e.g. chr1:14702150-37601000).
The numbers of related traits, genes, and variants are reported, and all matching results are listed.
For example, specifying the keyword 'seed' (Figure 4) returns 24 traits and 30 genes that include or are related to 'seed'. General information for each retrieved item is shown in the
right column (Figure 5). Filter options are provided at the bottom left to narrow down the query results. For a specific trait, the detailed trait information, associations,
studies and publications are provided (Figure 6).
Figure 4 Screenshots for Search
Figure 5 Search result
Figure 6 Detailed information
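The genomic-position query format shown in this section (chromosome, colon, start, hyphen, end) can be parsed as in the sketch below. This is an illustration of the input format only, not the code behind the 'Search' module; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches strings such as 'chr1:14702150-37601000'.
REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")

Region = Tuple[str, int, int]


def parse_region(text: str) -> Optional[Region]:
    """Return (chromosome, start, end), or None if text is not a region."""
    match = REGION_RE.match(text.strip())
    if match is None:
        return None  # treat as a keyword or gene ID instead
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start is greater than region end")
    return chrom, start, end


def in_region(region: Region, chrom: str, pos: int) -> bool:
    """True if a variant at chrom:pos falls inside the region (inclusive)."""
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


if __name__ == "__main__":
    region = parse_region("chr1:14702150-37601000")
    print(region, in_region(region, "chr1", 20000000))
```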
4. How do I browse the associations?
Users can browse the association data in the 'Browse' view. All associations are mapped and annotated to three plant ontologies (PTO, CO, PPTO)
and two animal ontologies (ATOL, APTO) (Figure 7), which are grouped by domain to make relevant ontologies easier to find. The ontology system allows
users to browse, search and visualize the associations for each species. When a specific ontology and trait
term are selected, the corresponding trait information, associations, studies and publications are displayed in the right column (Figure 8). Users can also enter
a keyword, and relevant terms are matched automatically for selection.
Figure 7 Reference ontology
Figure 8 Trait, association, studies, and publication information for each term
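An ontology browser of this kind needs the terms arranged as a tree. The sketch below turns term records into the flat node list that jsTree accepts (`id`, `parent`, `text`, with `"#"` marking a root) and filters nodes by keyword. How GWAS Atlas actually feeds its tree is not described, so the input layout and function names are assumptions.

```python
import json
from typing import Dict, List, Optional


def to_jstree_nodes(terms: List[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Convert term records into jsTree's flat JSON node format."""
    return [
        {"id": t["id"], "parent": t.get("parent") or "#", "text": t["name"]}
        for t in terms
    ]


def match_terms(nodes: List[Dict[str, str]], keyword: str) -> List[str]:
    """Return IDs of nodes whose label contains the keyword (case-insensitive)."""
    kw = keyword.strip().lower()
    return [n["id"] for n in nodes if kw in n["text"].lower()]


if __name__ == "__main__":
    # Placeholder terms with made-up IDs.
    terms = [
        {"id": "EX:1", "name": "plant trait", "parent": None},
        {"id": "EX:2", "name": "seed weight", "parent": "EX:1"},
        {"id": "EX:3", "name": "seed length", "parent": "EX:1"},
    ]
    nodes = to_jstree_nodes(terms)
    print(json.dumps(nodes, indent=2))
    print(match_terms(nodes, "seed"))
```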
5. How can I get GWAS Atlas data?
If you are interested in the associations for a particular species, please visit the 'Download' page, which provides the associations for each species in .xlsx and .txt formats. Our customized ontologies, PPTO and APTO, are also available in OBO format.
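Once a tab-delimited association file has been downloaded, it can be summarized with the standard library alone. The sketch below counts associations per gene, in the spirit of the "top associated genes" summary. The column name `mapped_gene` and the threshold are assumptions; check the header of the actual file before using it.

```python
import csv
import io
from collections import Counter
from typing import List, TextIO, Tuple


def top_genes(handle: TextIO, gene_column: str = "mapped_gene",
              min_count: int = 2) -> List[Tuple[str, int]]:
    """Return (gene, association count) pairs with at least min_count
    associations, most frequent first."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts: Counter = Counter()
    for row in reader:
        gene = (row.get(gene_column) or "").strip()
        if gene:
            counts[gene] += 1
    return [(gene, n) for gene, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Made-up rows standing in for a downloaded file.
    demo = io.StringIO(
        "variant_id\ttrait\tmapped_gene\n"
        "v1\ttrait_a\tgeneA\n"
        "v2\ttrait_b\tgeneA\n"
        "v3\ttrait_a\tgeneB\n"
    )
    print(top_genes(demo))
```

For a real file, replace the `io.StringIO` demo with `open(path, newline="", encoding="utf-8")`.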
6.1. Funding Support
- Strategic Priority Research Program of Chinese Academy of Sciences (XDA08020102)
- The 13th Five-year Informatization Plan of Chinese Academy of Sciences (XXH13505-05)
- The Youth Innovation Promotion Association of Chinese Academy of Sciences (2018134)
6.2. Comments & Collaborations
We look forward to comments, suggestions and guidance from colleagues and peers worldwide with common research interests. We also invite the scientific community to submit their GWAS analysis results to GWAS Atlas and to collaborate in improving the functionalities of GWAS Atlas.
We would love to hear from you with any questions or comments. Please find our contact information here.
+86 (10) 8409-7620
+86 (10) 8409-7298
The GWAS Team, National Genomics Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS)
No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
6.4. Documentation in Chinese
The Documentation in Chinese can be downloaded here (pdf format).