MethBank is a database that integrates genome-wide DNA methylomes across a variety of species and provides an interactive browser for visualization of high-resolution DNA methylation data. Here we present an updated implementation of MethBank (http://bigd.big.ac.cn/methbank; Version 3.0) by incorporating high-quality DNA methylome maps for humans as well as multiple animals and plants, with significant improvements and advances over the previous version.
Yes. It is available at http://www.dnamethylome.org
Yes. MethBank provides downloads of all annotated data via ftp://download.big.ac.cn/methbank
4,577 samples from 24 Illumina 450K BeadChip array datasets were downloaded from GEO. All data are from the peripheral blood of healthy individuals. The GEO accessions are listed below, and sample IDs can be accessed via ftp://download.big.ac.cn/methbank/Human/SamplesID.csv.
MethBank uses BMIQ (Beta MIxture Quantile dilation), a model-based intra-array normalization strategy for the 450K platform, to adjust the beta-values of type II probes to the statistical distribution characteristic of type I probes.
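The core idea of mapping type II values onto the type I distribution is quantile matching. The sketch below is a deliberately simplified empirical quantile mapping, not BMIQ itself: BMIQ fits a three-state beta mixture model and maps quantiles within each methylation state using fitted beta distributions. The function name `quantile_map` and the toy values are hypothetical.

```python
from bisect import bisect_right


def quantile_map(type2_betas, type1_betas):
    """Map type II beta-values onto the type I distribution by empirical quantile matching.

    Simplified illustration, NOT BMIQ: BMIQ fits a three-state beta mixture and
    maps quantiles within each state using fitted beta distributions.
    """
    ref = sorted(type1_betas)   # target (type I) distribution
    src = sorted(type2_betas)   # source (type II) distribution
    n_ref, n_src = len(ref), len(src)
    mapped = []
    for b in type2_betas:
        # empirical CDF of b among the type II values, in (0, 1]
        q = bisect_right(src, b) / n_src
        # type I order statistic at the same quantile
        idx = min(n_ref - 1, max(0, round(q * n_ref) - 1))
        mapped.append(ref[idx])
    return mapped


type1 = [0.05, 0.10, 0.90, 0.95]   # toy type I beta-values
type2 = [0.20, 0.30, 0.70, 0.80]   # toy type II beta-values (compressed range)
print(quantile_map(type2, type1))  # [0.05, 0.1, 0.9, 0.95]
```

The compressed type II range is stretched to the wider type I range while the ordering of probes is preserved.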
Yes. Assuming that age-related CpG sites should have similar DNA methylation levels in samples of the same age regardless of batch, MethBank groups samples of the same age together and removes batch effects within each group separately using location/scale (L/S) batch adjustment.
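A location/scale adjustment removes per-batch shifts in mean and spread. The following is a minimal sketch for one CpG site within one age group, assuming a plain L/S model: each batch is standardized by its own mean and SD, then mapped back to the group's grand mean and pooled within-batch SD. Unlike ComBat, it applies no empirical Bayes shrinkage; the function name `ls_adjust` is hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def ls_adjust(values, batches):
    """Simple location/scale (L/S) batch adjustment for one CpG within one age group.

    Each batch is centered on its own mean and scaled by its own SD, then mapped
    onto the age group's grand mean and pooled within-batch SD. No empirical
    Bayes shrinkage of the batch parameters is applied (unlike ComBat).
    """
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    batch_sd = {
        b: sqrt(sum((v - batch_mean[b]) ** 2 for v in vs) / len(vs))
        for b, vs in by_batch.items()
    }
    grand_mean = mean(values)
    # pooled within-batch SD: between-batch differences are excluded
    pooled_sd = sqrt(
        sum((v - batch_mean[b]) ** 2 for v, b in zip(values, batches)) / len(values)
    )
    adjusted = []
    for v, b in zip(values, batches):
        z = (v - batch_mean[b]) / batch_sd[b] if batch_sd[b] > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted


# Two batches of same-age samples whose means differ by 0.4
print(ls_adjust([0.2, 0.4, 0.6, 0.8], ["A", "A", "B", "B"]))
```

After adjustment both batches share the same mean (about 0.5) and spread, giving roughly [0.4, 0.6, 0.4, 0.6].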
CpG sites closely associated with age are defined as those whose Spearman correlation coefficient with age is ≥ 0.6.
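The selection rule amounts to computing Spearman's rho (the Pearson correlation of rank vectors, with tied values given average ranks) between each CpG's beta-values and sample age. A minimal sketch follows; the CpG IDs and values are hypothetical, and the `absolute` flag is an added option for the |r| variant used by Age Predictor.

```python
from statistics import mean


def average_ranks(xs):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: treat as no correlation
    return cov / (sx * sy)


def age_related_cpgs(betas_by_cpg, ages, threshold=0.6, absolute=False):
    """Return CpG IDs whose Spearman correlation with age reaches the threshold."""
    selected = []
    for cpg, betas in betas_by_cpg.items():
        rho = spearman(betas, ages)
        if (abs(rho) if absolute else rho) >= threshold:
            selected.append(cpg)
    return selected


ages = [20, 30, 40, 50, 60]
betas = {
    "cg_a": [0.1, 0.2, 0.3, 0.4, 0.5],  # rises with age, rho = 1
    "cg_b": [0.5, 0.4, 0.3, 0.2, 0.1],  # falls with age, rho = -1
    "cg_c": [0.3, 0.1, 0.5, 0.2, 0.4],  # weak trend, rho = 0.3
}
print(age_related_cpgs(betas, ages))                 # ['cg_a']
print(age_related_cpgs(betas, ages, absolute=True))  # ['cg_a', 'cg_b']
```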
R (v3.3.2) with the minfi package (v1.22.1), and Perl (v5.22.1).
To identify age-specific DMCs, two linear models are fitted for each CpG site: one with a fixed effect for age and a random effect for gender, and one without the fixed effect for age. A CpG site is defined as an age-specific DMC if the model with the fixed age effect fits the data significantly better (F-test, Bonferroni-corrected significance threshold p ≤ 1.03E-7, i.e., 0.05/485K CpGs) and the effect size is ≥ 20%. Using both the effect-size and p-value cut-offs, CpG sites are classified as age-specific or non-age-specific DMCs, and multiple comparisons are then performed to determine which age group each age-specific DMC belongs to. In addition, CpG sites with constant methylation levels across ages are defined as those that show no significant difference in mean methylation between any two age groups and, at the same time, have a methylation range of ≤ 0.1 across the 12 age groups. An age-specific DMR is defined as a region containing at least 3 age-specific DMCs with inter-CpG distances ≤ 1,000 bp.
All gene annotations of the array data are based on the “HumanMethylation450 v1.1 Manifest File (CSV Format)” (reference genome version hg19), downloaded from the following official Illumina website:
MethBank-plant focuses on whole-genome methylation maps at single-base resolution. All raw sequencing data used in MethBank were obtained from SRA and GSA (http://gsa.big.ac.cn) and were published before March 2017.
Adapter sequences are removed from the reads, and bases with a quality score lower than 20 are trimmed from the read ends.
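The two preprocessing steps can be sketched as follows. This is an illustration only, not the trimming tool MethBank used: it assumes Phred+33 quality encoding, a single known 3' adapter (the sequence below is just an example prefix), and exact matching for full or partial adapter occurrences. Function names are hypothetical.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Cut the read at the first full adapter match, or at a partial adapter
    (an adapter prefix of at least min_overlap bases) at the 3' end."""
    cut = seq.find(adapter)
    if cut == -1:
        # longest adapter prefix that the read ends with
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    if cut == -1:
        return seq, qual
    return seq[:cut], qual[:cut]


def trim_low_quality_ends(seq, qual, min_q=20, offset=33):
    """Remove bases with Phred quality < min_q from both read ends (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


# '#' encodes Phred 2 and 'I' encodes Phred 40 in Phred+33
seq, qual = trim_adapter("ACGTTTAGATCG", "##IIIIIIIIII", "AGATCGGAAG")
print(seq, qual)  # ACGTTT ##IIII
seq, qual = trim_low_quality_ends(seq, qual)
print(seq, qual)  # GTTT IIII
```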
WBSA-1.0, BWA (bwa-0.7.10), Samtools (v1.0), R (v2.14.2), Circos (circos-0.69-2), Perl (v5.22.1)
mm10 (Mus musculus)
Zv9 (Danio rerio)
Gmax_275_v2.0 (Glycine max)
GCF_000188115.3_SL2.50 (Solanum lycopersicum)
Mesculenta_305_v6 (Manihot esculenta)
Pvulgaris_218_v1 (Phaseolus vulgaris)
A DMP (differentially methylated promoter) is a promoter whose methylation differs significantly between two samples. To identify DMPs, first, Fisher's exact test is performed for promoters whose methylation level difference (delta) between the two samples is greater than 0.1 for the C and CH contexts (H = A, C or T) or greater than 0.2 for the CG context. For this test, a contingency table is constructed in which each row corresponds to a sample and the two columns give the total numbers of reads supporting a methylated or an unmethylated cytosine, summed over all cytosines in the promoter. Second, the p-values of Fisher's exact test are corrected by FDR, and promoters with FDR < 0.01 are retained. Finally, the promoter methylation levels of the genes associated with the DMPs are reported. For plants, DMPs are identified in the C, CG and CH (H = A, C or T) sequence contexts; for animals, only the CG context is considered.
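The per-promoter test can be sketched with an exact two-sided Fisher test built from hypergeometric probabilities. Assumptions: a promoter's methylation level is taken as methylated reads divided by total reads (the text does not define how the level is computed), and the two-sided p-value sums all tables no more probable than the observed one. The names `fisher_exact_two_sided` and `promoter_pvalue` are hypothetical.

```python
from math import comb

# Delta thresholds from the text: > 0.1 for C/CH, > 0.2 for CG.
DELTA_THRESHOLD = {"C": 0.1, "CH": 0.1, "CG": 0.2}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # sum the probabilities of all tables no more likely than the observed one
    p = sum(px for px in map(prob, support) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def promoter_pvalue(meth1, unmeth1, meth2, unmeth2, context):
    """Fisher p-value for one promoter, or None if it is not tested.

    meth*/unmeth*: read counts supporting methylated / unmethylated cytosines,
    summed over all cytosines of the promoter in sample 1 and sample 2.
    """
    if meth1 + unmeth1 == 0 or meth2 + unmeth2 == 0:
        return None  # no coverage in one of the samples
    level1 = meth1 / (meth1 + unmeth1)
    level2 = meth2 / (meth2 + unmeth2)
    if abs(level1 - level2) <= DELTA_THRESHOLD[context]:
        return None  # delta not greater than the threshold
    # rows = samples, columns = methylated / unmethylated read counts
    return fisher_exact_two_sided(meth1, unmeth1, meth2, unmeth2)


print(promoter_pvalue(8, 2, 1, 5, "CG"))      # about 0.035 (exactly 400/11440)
print(promoter_pvalue(50, 50, 55, 45, "CG"))  # None (delta 0.05 <= 0.2)
```

The resulting p-values would then be FDR-corrected across all tested promoters.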
E7.5 embryo (Mus musculus), testicle (Danio rerio), cotyledon (Glycine max), leaf (Oryza sativa, Solanum lycopersicum, Manihot esculenta, Phaseolus vulgaris)
To support information search and exploration, MethBank provides user-friendly web interfaces for retrieving diverse information on a specific gene or region. By specifying a gene symbol, users can obtain its methylation states at the promoter and gene body, as well as its basic information, gene expression and other data. Note that the promoter region is defined as the 2,000 bp upstream of the gene body for animals and the 1,500 bp upstream of the gene body for plants.
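Because "upstream" depends on strand, the promoter lies before the gene start on the plus strand and after the gene end on the minus strand. A minimal sketch, assuming 1-based inclusive coordinates (as in GFF3) and no clipping at the chromosome end; the function name `promoter_region` is hypothetical.

```python
# Promoter lengths from the text (bp upstream of the gene body).
PROMOTER_LENGTH = {"animal": 2000, "plant": 1500}


def promoter_region(gene_start, gene_end, strand, kingdom):
    """Promoter interval (start, end) in 1-based inclusive coordinates.

    Upstream depends on strand: before gene_start on '+', after gene_end on '-'.
    Clipped at position 1; clipping at the chromosome end is not handled.
    """
    length = PROMOTER_LENGTH[kingdom]
    if strand == "+":
        return max(1, gene_start - length), gene_start - 1
    if strand == "-":
        return gene_end + 1, gene_end + length
    raise ValueError(f"unknown strand: {strand!r}")


print(promoter_region(10_000, 20_000, "+", "animal"))  # (8000, 9999)
print(promoter_region(10_000, 20_000, "-", "plant"))   # (20001, 21500)
```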
On the search interface, users can view (i) the average methylation levels of the promoter and gene body of the gene of interest; (ii) the promoter methylation levels of genes associated with DMPs between different developmental stages or tissues; and (iii) the catalog of genes related to methylated CpG islands.
IDMP (Identification of Differentially Methylated Promoter) is a tool for identifying differentially methylated promoters (DMPs) between any two samples. The identification procedure is as follows. First, Fisher's exact test is performed for promoters whose methylation level difference (delta) between the two samples exceeds a specified threshold. For this test, a contingency table is constructed in which each row corresponds to a sample and each column gives the total number of reads supporting a methylated or an unmethylated cytosine, summed over all cytosines in the promoter. Second, the p-values of Fisher's exact test are corrected using the Benjamini-Hochberg false discovery rate (FDR) procedure. Finally, the promoter methylation levels of the genes associated with the DMPs are reported. Users can download IDMP directly from the MethBank home page and identify DMPs by providing two genome-wide methylation files (BED format) for the samples of interest and a gene annotation file (GFF3 format), and by setting the parameters (including the cytosine sequence context (C, CG or CH), the start position of promoters relative to the TSS, the delta methylation level, and the p-value, among others).
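The Benjamini-Hochberg step scales each sorted p-value by n/rank and then enforces monotonicity from the largest p-value down. A minimal sketch, equivalent in intent to R's `p.adjust(p, method = "BH")`; the p-values and the 0.01 cut-off below are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.04, 0.03, 0.0005]  # hypothetical Fisher p-values for four promoters
qvals = benjamini_hochberg(pvals)    # about [0.002, 0.04, 0.04, 0.002]
dmps = [i for i, q in enumerate(qvals) if q < 0.01]
print(dmps)  # [0, 3]
```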
Age Predictor is a tool for predicting human DNA methylation age. Based on the large-scale human methylation datasets integrated in MethBank 3.0, age-related CpG sites with linear DNA methylation changes during aging are identified by Spearman correlation (|r| > 0.6). As a result, 52 age-related CpG sites (shown below) are selected according to their correlation and used with three machine learning models (Random Forest, SVM, and Elastic Net) to predict human DNA methylation age. The random forest model is implemented with the randomForest R package (version 4.6-12) (1), with ntree = 500 and mtry = 17. The SVM model is implemented with the e1071 R package (version 1.6-7) using a radial basis function kernel (2), with gamma = 0.0192 and cost = 1. For the elastic net, the glmnet function from the glmnet R package (version 2.0-10) is used (3); its parameters are optimized by tenfold cross-validation with a grid search, and the best performance is obtained with alpha = 0.5 and lambda = 0.08. Age Predictor is integrated into MethBank as an online tool with straightforward, user-friendly web interfaces, and it accepts various types of input data (raw data, processed data, or GEO sample IDs).
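For reference, the Gaussian elastic-net objective documented for glmnet is shown below, written for this setting under the assumption that $y_i$ is the chronological age of sample $i$, $x_{ij}$ is its methylation level at the $j$-th of the 52 selected CpG sites, and $N$ is the number of training samples. The second line works out the penalty for the reported alpha = 0.5 and lambda = 0.08. Note that glmnet standardizes predictors by default, so this penalty applies on the standardized scale.

$$
\min_{\beta_0,\,\beta}\;\frac{1}{2N}\sum_{i=1}^{N}\Bigl(y_i-\beta_0-\sum_{j=1}^{52}x_{ij}\beta_j\Bigr)^2
+\lambda\Bigl[\frac{1-\alpha}{2}\sum_{j=1}^{52}\beta_j^2+\alpha\sum_{j=1}^{52}\lvert\beta_j\rvert\Bigr]
$$

$$
\alpha=0.5,\ \lambda=0.08:\qquad
\lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr]
=0.02\,\lVert\beta\rVert_2^2+0.04\,\lVert\beta\rVert_1
$$

With alpha = 0.5 the model weights the ridge and lasso components equally, so correlated age-related CpGs can share weight while weak ones are still shrunk toward zero.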
You can move across the genome by clicking and dragging inside the track window, by using the navigation tools, or by pressing the left and right arrow keys on your keyboard.
Center the view at a point by clicking on either the track scale bar or overview bar, or by shift-clicking in the track area.
Zoom in and out by clicking the zoom buttons in the navigation bar or by pressing Shift + the up and down arrow keys. Select a region and zoom to it ("rubber-band" zoom) by clicking and dragging in the overview or track scale bar, or by shift-clicking and dragging in the track area. Some tracks also let you zoom into a feature through an option in its right-click (Ctrl+click on a Mac) drop-down menu.
You can combine information from separate tracks into a custom "combination" track. For example, you may want to create a Search Track for a given sequence motif but display only the instances that occur in genes shown in the "Transcribed Features" track. From the File menu in the top menu bar, select "Add combination track" to create a new, empty track. Click the information labels of the tracks you wish to combine, drag them onto the information label of the new combination track, and release to combine them (the label appears red while you drag it and turns green when it is positioned over the combination track). After the tracks have been added, a menu pops up for selecting the type of combination (intersection, union, etc.). For the example above, select "Intersection".