FAQ - MethBank

Introduction

MethBank is a comprehensive DNA methylation database. It integrates consensus reference methylomes (CRMs), whole-genome single-base resolution methylomes (SRMs), DNA & RNA methylation tools (MeTools) and knowledge from epigenome-wide association studies (EWAS). It also provides an interactive browser for visualization and offers multiple analysis tools.

About visualization

To visualize high-resolution DNA methylomes, MethBank deploys an interactive, user-friendly methylome browser built on JBrowse (http://jbrowse.org; a fast, embeddable genome browser built entirely with JavaScript and HTML5). For each species, the methylome browser includes a variety of data tracks and allows users to select tracks of interest and to zoom into and scroll to any region of the genome. Users can also switch to another species by clicking the species name in the upper-left corner of the browser.

About analysis tools

MethBank-CRM provides Age Predictor, a tool for predicting human DNA methylation age. Based on the large-scale human methylation datasets integrated in MethBank, age-related CpG sites whose DNA methylation changes linearly during aging are identified by Spearman correlation (|r| > 0.6). As a result, 52 age-related CpG sites (shown below) are selected according to their correlation and used as features in three machine learning models (Random Forest, SVM and Elastic Net) to predict human DNA methylation age. Technically, the random forest algorithm is implemented with the randomForest R package (version 4.6-12), with ntree = 500 and mtry = 17. The SVM algorithm is implemented with the e1071 R package (version 1.6-7) using a radial basis function kernel, with gamma = 0.0192 and cost = 1. For the elastic net, the glmnet function of the glmnet R package (version 2.0-10) is used; its parameters are optimized by tenfold cross-validation with a grid search, and the best performance is obtained with alpha = 0.5 and lambda = 0.08. Age Predictor is integrated into MethBank as an online tool with a straightforward, user-friendly web interface, and it accepts various types of input (raw data, processed data or GEO sample IDs).
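The CpG screening step can be illustrated with a short sketch. It computes Spearman's rank correlation (the Pearson correlation of tie-averaged ranks) between each CpG's methylation values and donor age and keeps the CpGs with |r| > 0.6. The function names and the dictionary input layout are hypothetical; the sketch does not reproduce MethBank's preprocessing, nor the further correlation-based selection that narrows the candidates down to 52 sites.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_age_cpgs(beta_by_cpg, ages, threshold=0.6):
    """Return {cpg_id: rho} for CpGs with |rho| > threshold.

    Hypothetical input layout: beta_by_cpg maps a CpG ID to its methylation
    values, one per sample, in the same sample order as `ages`.
    """
    selected = {}
    for cpg, betas in beta_by_cpg.items():
        rho = spearman(betas, ages)
        if abs(rho) > threshold:
            selected[cpg] = rho
    return selected


ages = [20, 35, 50, 65, 80]
beta = {
    "cg_up": [0.10, 0.22, 0.31, 0.45, 0.60],    # rises with age
    "cg_down": [0.90, 0.70, 0.72, 0.40, 0.30],  # mostly falls with age
    "cg_flat": [0.50, 0.20, 0.80, 0.30, 0.60],  # no monotone trend
}
# cg_up (rho ≈ 1.0) and cg_down (rho ≈ -0.9) pass; cg_flat (rho ≈ 0.3) does not.
print(select_age_cpgs(beta, ages))
```

Ranks are used instead of raw values so that any monotone age trend counts, and tied methylation values share an averaged rank as in the standard definition of Spearman's rho.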

[Figures: Age Predictor input page; Age Predictor output page]
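For reference, the kernel and objective behind the SVM and elastic net settings above can be written out. This assumes the age models are regressions (epsilon-SVR in e1071 and the Gaussian family in glmnet), which the FAQ does not state explicitly. Here y_i is the age of sample i, x_i its vector of CpG methylation values and N the number of training samples; the first line is e1071's radial basis kernel, the second glmnet's elastic net objective, and the third the penalty evaluated at the reported alpha and lambda.

```latex
K(\mathbf{u}, \mathbf{v}) = \exp\left(-\gamma \lVert \mathbf{u} - \mathbf{v} \rVert^2\right), \qquad \gamma = 0.0192

\min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - \beta_0 - \mathbf{x}_i^{\top} \beta\right)^2
  + \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]

\alpha = 0.5,\ \lambda = 0.08 \;\Rightarrow\;
  \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
  = 0.02 \lVert \beta \rVert_2^2 + 0.04 \lVert \beta \rVert_1
```

The reported gamma = 0.0192 is close to 1/52, and mtry = 17 equals floor(52/3). These are the default values of e1071 (one over the number of features) and of randomForest for regression (floor(p/3)) with 52 CpG features, which suggests, but does not confirm, that those defaults were kept.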

MethBank-SRM presents IDMP (Identification of Differentially Methylated Promoter), a tool for identifying differentially methylated promoters (DMPs) between any two samples. The identification procedure is as follows. First, a Fisher's exact test is performed for each promoter whose methylation level difference (delta) between the two samples exceeds a specified threshold. For this test, a contingency table is constructed in which each row corresponds to one sample and the two columns give the numbers of reads supporting methylated and unmethylated cytosines, summed over all cytosines in the promoter. Second, the p-values from Fisher's exact test are corrected with the Benjamini-Hochberg false discovery rate (FDR) procedure. Finally, the promoter methylation levels of the genes associated with the DMPs are reported. Users can download IDMP directly from the MethBank home page and identify DMPs by providing two genome-wide methylation files (BED format) for the samples of interest and a gene annotation file (GFF3 format), and by setting parameters such as the cytosine sequence context (C, CG or CH), the start position of promoters relative to the TSS, the delta methylation level and the p-value.
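The statistical core of this procedure can be sketched in plain Python. The sketch assumes that each promoter has already been summarized as four read counts (methylated and unmethylated reads summed over its cytosines, for each sample) and that the promoter methylation level is the pooled ratio of methylated to total reads; IDMP's own definitions, its BED/GFF3 parsing and its defaults may differ. The function names and default cutoffs (delta 0.2, FDR 0.05) are hypothetical. Fisher's exact test is computed directly from the hypergeometric distribution (two-sided, summing all tables no more probable than the observed one), followed by the Benjamini-Hochberg adjustment.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two samples; columns are methylated and unmethylated reads.
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)

    def log_p(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return log_comb(c1, x) + log_comb(n - c1, r1 - x) - log_comb(n, r1)

    observed = log_p(a)
    total = 0.0
    for x in range(lo, hi + 1):
        lp = log_p(x)
        # Count tables no more likely than the observed one; the small
        # tolerance keeps floating-point ties from being dropped.
        if lp <= observed + 1e-7:
            total += exp(lp)
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def find_dmps(promoters, delta_threshold=0.2, fdr_cutoff=0.05):  # hypothetical defaults
    """Return [(promoter_id, delta, p, q)] for promoters called as DMPs.

    promoters maps a promoter ID to (meth1, unmeth1, meth2, unmeth2) read
    counts summed over all cytosines in that promoter.
    """
    tested = []
    for pid, (m1, u1, m2, u2) in promoters.items():
        if m1 + u1 == 0 or m2 + u2 == 0:
            continue  # no coverage in one of the samples
        # Assumption: promoter level = methylated reads / total reads.
        delta = m1 / (m1 + u1) - m2 / (m2 + u2)
        if abs(delta) > delta_threshold:
            tested.append((pid, delta, fisher_exact_two_sided(m1, u1, m2, u2)))
    qvalues = benjamini_hochberg([p for _, _, p in tested])
    return [(pid, delta, p, q) for (pid, delta, p), q in zip(tested, qvalues)
            if q < fdr_cutoff]


counts = {"P1": (90, 10, 10, 90), "P2": (50, 50, 52, 48)}
for pid, delta, p, q in find_dmps(counts):
    print(pid, round(delta, 2), p, q)
```

Only promoters that pass the delta filter are tested, so the FDR correction is applied to that subset, matching the order of steps described above.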

MethBank-CRM (Consensus Reference Methylome) module

  • 450K methylation array data are downloaded from GEO and TCGA.
  • Datasets from NCBI used in MethBank-CRM include GSE73549 and GSE112047 (prostate); GSE90124 (skin); GSE111223, GSE99029 and GSE92767 (saliva); and GSE32148, GSE40279, GSE50660, GSE51032, GSE51388, GSE52113, GSE53128, GSE53740, GSE59509, GSE61151, GSE61496, GSE64495, GSE65638, GSE67751, GSE72773, GSE72775, GSE72777, GSE73103, GSE79056, GSE80283, GSE80310, GSE83334, GSE87571, GSE89093 (peripheral blood).
  • The reference genome is hg38.
  • Data processing includes correcting probe design bias, removing outlier samples, removing batch effects, and other steps. The pipeline is shown in the figure.
Key steps:
  • Correct probe design bias
  • Remove outlier samples
  • Remove batch effects
  • Construct reference methylomes
  • Annotation and analysis
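The FAQ does not specify how per-CpG consensus values are derived. Purely as an illustration of the "Construct reference methylomes" step, the sketch below assumes that a consensus reference methylome summarizes each CpG by its median beta value across the retained samples of one group (for example, one tissue) and requires a minimum number of observations. Both the median and the `min_samples` cutoff are assumptions, not MethBank's documented method, and the function name is hypothetical.

```python
from statistics import median


def consensus_methylome(samples, min_samples=3):
    """Summarize per-CpG beta values across samples into one consensus profile.

    Assumption: consensus = median beta per CpG (not MethBank's documented rule).
    samples: list of dicts mapping CpG ID -> beta value (None if missing).
    Returns {cpg_id: median beta} for CpGs observed in >= min_samples samples.
    """
    values = {}
    for sample in samples:
        for cpg, beta in sample.items():
            if beta is not None:  # skip missing measurements
                values.setdefault(cpg, []).append(beta)
    return {cpg: median(v) for cpg, v in values.items() if len(v) >= min_samples}


samples = [
    {"cg1": 0.8, "cg2": 0.1},
    {"cg1": 0.7, "cg2": None},
    {"cg1": 0.9, "cg2": 0.2},
]
print(consensus_methylome(samples))  # cg2 has only two observations and is dropped
```

A median is used here only because it is robust to a single aberrant sample; a mean or another summary would be an equally valid assumption.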

MethBank-SRM (Single-base resolution methylome) module

  • Whole-genome bisulfite sequencing (WGBS) data are downloaded from SRA and GSA.
  • The following genome assembly versions are used for the respective species:
    • hg38 (Homo sapiens)
    • mm10 (Mus musculus)
    • Zv9 (Danio rerio)
    • RGSP-1.0 (Oryza sativa)
    • Gmax_275_v2.0 (Glycine max)
    • GCF_000188115.3_SL2.50 (Solanum lycopersicum)
    • Mesculenta_305_v6 (Manihot esculenta)
    • Pvulgaris_218_v1 (Phaseolus vulgaris)
  • The data processing pipeline consists of the following key steps:
    a) Data filtering: remove low-quality data and adapter sequences
    b) Sequencing data conversion
    c) Reference conversion
    d) Align the converted sequencing data to the converted reference
    e) Remove unmapped reads, multiply mapped reads and duplicate reads
    f) Remove samples with low coverage
    g) Remove samples with low bisulfite conversion rates
    h) Identify cytosine methylation levels in different sequence contexts
    i) Annotation and analysis
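Steps b), c) and h) can be illustrated with a toy sketch. It assumes a directional library with forward-strand reads only and ungapped alignments given as start positions; real pipelines also handle the reverse strand, indels and base qualities. In silico conversion turns every C into T (and, for the reverse-strand reference, every G into A) so that bisulfite-converted reads can be aligned. After alignment, a C in the read over a reference C indicates a methylated cytosine, a T indicates an unmethylated one, and the methylation level is methylated / (methylated + unmethylated). All function names are hypothetical.

```python
def convert_read(read):
    """In silico C->T conversion of a forward-strand read before alignment."""
    return read.upper().replace("C", "T")


def convert_reference(ref):
    """Return the C->T (forward) and G->A (reverse-strand) converted reference."""
    ref = ref.upper()
    return ref.replace("C", "T"), ref.replace("G", "A")


def cytosine_context(ref, pos):
    """Context of a forward-strand C: CG, CHG or CHH (H = A, C or T).

    Returns None when the context cannot be determined (sequence end or N).
    """
    ref = ref.upper()
    if ref[pos] != "C":
        raise ValueError("position is not a forward-strand cytosine")
    nxt = ref[pos + 1:pos + 3]
    if nxt.startswith("G"):
        return "CG"
    if len(nxt) < 2 or "N" in nxt:
        return None
    return "CHG" if nxt[1] == "G" else "CHH"


def call_methylation(ref, reads):
    """Per-cytosine (context, methylation level) from ungapped alignments.

    reads: list of (start, original_read) pairs aligned to ref without indels.
    Read C over reference C = methylated; read T over reference C = unmethylated.
    """
    ref = ref.upper()
    counts = {}
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos < len(ref) and ref[pos] == "C" and base in "CT":
                meth, unmeth = counts.get(pos, (0, 0))
                counts[pos] = (meth + (base == "C"), unmeth + (base == "T"))
    return {pos: (cytosine_context(ref, pos), m / (m + u))
            for pos, (m, u) in sorted(counts.items())}


ref = "ACGTTCAGCCT"
reads = [(0, "ACGTTTAG"), (1, "TGTTCAGT")]
print(convert_read(reads[0][1]), convert_reference(ref))
print(call_methylation(ref, reads))
```

Converting both reads and reference removes the C/T mismatches introduced by bisulfite treatment during alignment; the original read bases are then used to call methylation.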

MethBank-EWAS module

Epigenome-wide association studies (EWAS) have become increasingly important for identifying associations between epigenetic variations and diverse biological traits. We developed EWAS Atlas, a curated knowledgebase that provides a comprehensive collection of EWAS knowledge. Unlike existing data-oriented epigenetic resources, EWAS Atlas features manual curation of EWAS knowledge from extensive publications. In its current implementation, EWAS Atlas focuses on DNA methylation, one of the key epigenetic marks, and integrates a large number of high-quality EWAS associations. It is also equipped with a trait enrichment analysis tool capable of profiling trait-trait and trait-epigenome relationships.
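The FAQ does not describe the statistics behind the EWAS Atlas trait enrichment tool. A common, generic way to ask whether a user's CpG list is enriched for the CpGs associated with a trait is a one-sided hypergeometric (over-representation) test; the sketch below shows only that generic technique, not necessarily EWAS Atlas's method, and its function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def trait_enrichment(query, trait_cpgs, background_size):
    """Rank traits by over-representation of their CpGs in `query`.

    Generic hypergeometric test; not necessarily EWAS Atlas's method.
    query: set of CpG IDs supplied by the user.
    trait_cpgs: dict mapping trait name -> set of associated CpG IDs.
    background_size: number of CpGs in the background universe.
    Returns (trait, overlap, p_value) tuples, most significant first.
    """
    results = []
    for trait, cpgs in trait_cpgs.items():
        overlap = len(query & cpgs)
        p = hypergeom_upper_tail(overlap, background_size, len(cpgs), len(query))
        results.append((trait, overlap, p))
    return sorted(results, key=lambda r: r[2])


query = {"a", "b", "c"}
traits = {"T1": {"a", "b", "x", "y"}, "T2": {"z"}}
print(trait_enrichment(query, traits, background_size=10))
```

In practice the p-values across many traits would also need multiple-testing correction, for example with the Benjamini-Hochberg procedure.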

MethBank-MeTool (DNA & RNA Methylation Tools) module

We created MethBank-MeTool to catalogue and curate analysis tools for DNA and RNA methylation. MethBank-MeTool collects a range of information on each tool and categorizes the tools by platform, library, application and function. It supports keyword search and dynamically updates citation information for all tools.
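A catalogue of this kind can be modelled as a list of records with set-valued category fields that are searched by keyword and filtered by category. The sketch below is purely illustrative: the field names, the example tools and their citation counts are hypothetical and do not reflect MethBank-MeTool's actual schema or contents.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """One catalogue entry; field names are illustrative, not MethBank's schema."""
    name: str
    description: str
    platforms: set = field(default_factory=set)
    libraries: set = field(default_factory=set)
    applications: set = field(default_factory=set)
    functions: set = field(default_factory=set)
    citations: int = 0


def search(tools, keyword="", **categories):
    """Keyword search plus category filters, most-cited tools first.

    keyword: case-insensitive substring matched against name and description.
    categories: e.g. platforms="Linux"; the tool's set-valued field of that
    name must contain the given value.
    """
    kw = keyword.lower()
    hits = []
    for tool in tools:
        if kw and kw not in tool.name.lower() and kw not in tool.description.lower():
            continue
        if all(value in getattr(tool, attr) for attr, value in categories.items()):
            hits.append(tool)
    return sorted(hits, key=lambda t: t.citations, reverse=True)


# Hypothetical entries with made-up citation counts, for illustration only.
tools = [
    Tool("ToolA", "aligner for bisulfite reads", {"Linux"}, {"WGBS"},
         {"DNA methylation"}, {"alignment"}, 120),
    Tool("ToolB", "m6A peak caller", {"Linux"}, {"MeRIP-seq"},
         {"RNA methylation"}, {"peak calling"}, 45),
    Tool("ToolC", "bisulfite read simulator", {"Windows"}, {"WGBS"},
         {"DNA methylation"}, {"simulation"}, 300),
]
print([t.name for t in search(tools, "bisulfite", platforms="Linux")])
```

Sorting by citation count is one simple way to surface widely used tools first once citation information is refreshed.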