MethBank is a database that integrates genome-wide DNA methylomes across a variety of species and provides an interactive browser for visualization of high-resolution DNA methylation data. Here we present an updated implementation of MethBank (http://bigd.big.ac.cn/methbank; Version 3.0) by incorporating high-quality DNA methylome maps for humans as well as multiple animals and plants, with significant improvements and advances over the previous version.
Yes. It is available at http://www.dnamethylome.org
Yes. MethBank provides downloads of all annotated data via ftp://download.big.ac.cn/methbank
Yes. To visualize high-resolution DNA methylomes, MethBank deploys an interactive, user-friendly methylome browser built on JBrowse. For each species, the browser offers a variety of data tracks and lets users choose tracks of interest and zoom or scroll to any region of the genome. Users can also switch to another species by clicking the species name in the upper left corner of the JBrowse window.
A total of 4,577 samples from 24 Illumina 450K BeadChip array datasets were downloaded from GEO. All data are from peripheral blood of healthy people. The GEO accessions are listed below, and sample IDs can be accessed via ftp://download.big.ac.cn/methbank/Human/SamplesID.csv.
MethBank uses BMIQ (Beta MIxture Quantile dilation), a model-based intra-array normalization strategy for the 450K platform, to adjust the beta-values of type II probes into a statistical distribution characteristic of type I probes.
Yes. Age-related CpG sites in different batches are assumed to have similar DNA methylation levels at the same age. MethBank therefore grouped samples of the same age together and removed batch effects within each group separately using L/S batch adjustment.
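The idea of a location/scale (L/S) adjustment within one age group can be illustrated with a minimal sketch. It is not MethBank's actual code: the helper name `ls_adjust`, the choice of the group-wide mean as the target location, and the pooled within-batch standard deviation as the target scale are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def ls_adjust(values: Sequence[float], batches: Sequence[str]) -> List[float]:
    """Location/scale-adjust one CpG within ONE age group (hypothetical helper).

    Each batch is shifted to the group-wide mean and rescaled to the pooled
    within-batch standard deviation. Both targets are assumptions of this sketch.
    """
    by_batch: Dict[str, List[float]] = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    target_mean = mean(values)
    # Pooled within-batch SD: size-weighted average of the batch variances.
    pooled_var = sum(len(v) * pstdev(v) ** 2 for v in by_batch.values()) / len(values)
    target_sd = pooled_var ** 0.5

    adjusted = []
    for value, batch in zip(values, batches):
        b_mean = mean(by_batch[batch])
        b_sd = pstdev(by_batch[batch])
        # A batch with no spread carries no scale information; keep it at the mean.
        z = (value - b_mean) / b_sd if b_sd > 0 else 0.0
        adjusted.append(target_mean + z * target_sd)
    return adjusted
```

In this sketch the function would be called once per CpG site and per age group, with `batches` holding the dataset label of each sample.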
Methylation sites closely associated with age are defined as CpGs with a Spearman correlation coefficient ≥ 0.6.
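A minimal sketch of this filter follows. It applies the cut-off literally to the signed coefficient, since the text does not say whether the absolute value is used; the function names and the CpG identifiers in the usage example are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def age_associated_sites(
    ages: Sequence[float], betas: Dict[str, Sequence[float]], cutoff: float = 0.6
) -> List[str]:
    """Keep CpGs whose methylation correlates with age at rho >= cutoff."""
    return [cpg for cpg, vals in betas.items() if spearman(ages, vals) >= cutoff]
```

For example, with `ages = [20, 30, 40, 50, 60]`, a site whose beta-values rise steadily with age passes the filter, while a site that falls steadily with age does not under this signed reading.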
R (v3.3.2), package minfi 1.22.1 and Perl (v5.22.1).
To identify aDMCs, two linear models were fitted for each CpG site. A site was considered an aDMC if the model with the fixed effect (age group) fitted the data better according to the F test, with a p-value ≤ 1.03E-7 (Bonferroni threshold, 0.05/485K CpGs) and an effect size ≥ 20%. Using both the effect-size and p-value cut-offs, CpG sites were classified as aDMCs or non-aDMCs. MethBank then performed the Scheffé test to identify which age group differs from the others at each aDMC. In addition, methylation sites with constant methylation levels across ages were defined as CpG sites showing no significant difference in mean methylation between any two age groups and a range no greater than 0.1. aDMRs were defined as regions covering at least 3 aDMCs with an inter-CpG distance of no more than 1000 bp.
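The aDMR rule (at least 3 aDMCs, neighbouring aDMCs no more than 1000 bp apart) can be sketched as a single pass over sorted positions. This is an illustration only: the function name `call_admrs` and the assumption that positions come from one chromosome are not from MethBank.

```python
from typing import Iterable, List, Tuple


def call_admrs(
    positions: Iterable[int], max_gap: int = 1000, min_sites: int = 3
) -> List[Tuple[int, int, int]]:
    """Group aDMC positions on ONE chromosome into candidate aDMRs.

    Returns (start, end, number_of_aDMCs) for every run of sites in which
    consecutive positions are at most `max_gap` bp apart and that contains
    at least `min_sites` sites.
    """
    regions: List[Tuple[int, int, int]] = []
    cluster: List[int] = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if cluster and pos - cluster[-1] > max_gap:
            if len(cluster) >= min_sites:
                regions.append((cluster[0], cluster[-1], len(cluster)))
            cluster = []
        cluster.append(pos)
    # Flush the final cluster.
    if len(cluster) >= min_sites:
        regions.append((cluster[0], cluster[-1], len(cluster)))
    return regions
```

A gap of exactly 1000 bp keeps two sites in the same region, matching the "no more than 1000 bp" wording.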
All gene annotations of the array data are based on the “HumanMethylation450 v1.1 Manifest File” (CSV), downloaded from the official Illumina website; the reference genome version is hg19.
MethBank-plant focuses on whole-genome, single-base-resolution methylation maps. All raw sequencing data used in MethBank were obtained from SRA and GSA (http://gsa.big.ac.cn) and were published before March 2017.
Adapter sequences are removed from the reads, and bases with a quality score below 20 at the read ends are discarded.
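The quality-trimming step can be sketched as follows. The sketch assumes Phred+33 quality encoding and trims both ends of the read, neither of which is stated in the text; adapter removal is not shown, and the function name `trim_read_ends` is hypothetical.

```python
from typing import Tuple


def trim_read_ends(
    seq: str, qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Drop bases with quality < min_q from both ends of a read.

    `qual` is the FASTQ quality string; `offset` is the ASCII offset of the
    encoding (33 for Phred+33, an assumption of this sketch).
    """
    scores = [ord(ch) - offset for ch in qual]
    start, end = 0, len(scores)
    # Advance from the 5' end past low-quality bases.
    while start < end and scores[start] < min_q:
        start += 1
    # Retreat from the 3' end past low-quality bases.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]
```

Low-quality bases inside the read are left untouched; only the ends are trimmed, and a read made up entirely of low-quality bases comes back empty.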
WBSA-1.0, BWA (bwa-0.7.10), Samtools (v1.0), R (v2.14.2), Circos (circos-0.69-2), Perl (v5.22.1)
mm10 (Mus musculus)
Zv9 (Danio rerio)
Gmax_275_v2.0 (Glycine max)
GCF_000188115.3_SL2.50 (Solanum lycopersicum)
Mesculenta_305_v6 (Manihot esculenta)
Pvulgaris_218_v1 (Phaseolus vulgaris)
A DMP is a promoter that is differentially methylated between two samples. To identify DMPs, Fisher's exact test was first performed for each promoter region in which the average methylation level of one sample exceeded that of the other by at least 0.1 for the C or CH context, or 0.2 for the CG context (delta methylation level). For this test, a contingency table was constructed in which each row represented a sample and the columns gave the total number of reads supporting a methylated or an unmethylated cytosine, summed over all cytosines in the promoter for that sample. Second, the p-values of Fisher's exact test were corrected for FDR (< 0.01). Finally, the promoter methylation of the genes associated with each DMP was reported. For plants, DMPs were identified in the C, CG and CH (H = A, C or T) sequence contexts. For animals, only the CG context was considered.
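The three steps above (delta filter, Fisher's exact test on a 2×2 read-count table, FDR correction) can be sketched in plain Python. Several points are assumptions rather than MethBank's documented choices: the test is two-sided, the FDR procedure is Benjamini-Hochberg, the average methylation level is taken as the pooled read ratio, and all function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x methylated reads in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmps(
    promoters: Dict[str, Tuple[int, int, int, int]],
    context: str = "CG",
    fdr: float = 0.01,
) -> List[str]:
    """promoters maps a name to (meth1, unmeth1, meth2, unmeth2) read sums."""
    min_delta = 0.2 if context == "CG" else 0.1  # 0.1 for C and CH contexts
    tested: List[Tuple[str, float]] = []
    for name, (m1, u1, m2, u2) in promoters.items():
        if m1 + u1 == 0 or m2 + u2 == 0:
            continue  # no coverage in one sample
        delta = abs(m1 / (m1 + u1) - m2 / (m2 + u2))
        if delta >= min_delta:
            tested.append((name, fisher_exact_two_sided(m1, u1, m2, u2)))
    qvals = benjamini_hochberg([p for _, p in tested])
    return [name for (name, _), q in zip(tested, qvals) if q < fdr]
```

Only promoters that pass the delta filter enter the multiple-testing correction in this sketch; whether MethBank corrects over the filtered or the full set is not stated.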
E7.5 embryo (Mus musculus), testicle (Danio rerio), cotyledon (Glycine max), leaf (Oryza sativa, Solanum lycopersicum, Manihot esculenta, Phaseolus vulgaris)
To support information search and exploration, MethBank provides user-friendly web interfaces for retrieving diverse information about a specific gene or region. By specifying a gene symbol, users can obtain its methylation states at the promoter and gene body, as well as its basic information, gene expression, etc. Note that the promoter region is defined as the 2000 bp upstream of the gene body for animals and the 1500 bp upstream of the gene body for plants.
On the search interface, users can view (i) the average methylation levels of the promoter and gene body for a gene of interest; (ii) promoter methylation levels of genes associated with DMPs between different developmental stages or tissues; and (iii) the catalog of genes related to methylated CpG islands.
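The promoter definition can be sketched as a small coordinate helper. The 2000 bp and 1500 bp lengths come from the text; the 1-based inclusive coordinates, the strand-aware reading of "upstream", the clipping at position 1, and the function name `promoter_region` are assumptions of this sketch.

```python
from typing import Tuple

# Promoter lengths in bp, as stated in the text.
PROMOTER_LENGTH = {"animal": 2000, "plant": 1500}


def promoter_region(
    gene_start: int, gene_end: int, strand: str, kingdom: str
) -> Tuple[int, int]:
    """Return (start, end) of the promoter in 1-based inclusive coordinates.

    "Upstream" is interpreted relative to the strand: before gene_start for
    '+' genes and after gene_end for '-' genes (an assumption).
    """
    length = PROMOTER_LENGTH[kingdom]
    if strand == "+":
        start, end = gene_start - length, gene_start - 1
    elif strand == "-":
        start, end = gene_end + 1, gene_end + length
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start; the chromosome end is not checked here.
    return max(1, start), end
```

For a plus-strand animal gene spanning 5000 to 8000, this gives the interval 3000 to 4999.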