Baseline Data
Description | Version | Download (.txt & .RData) | File Size
DNA methylation profiles of 31 organism parts | v1.20190621 | tissue_methylation_v1.zip | 7.7 GB
DNA methylation profiles of 25 brain parts | v1.20190621 | brain_methylation_v1.zip | 2.77 GB
DNA methylation profiles of 25 blood cell types | v1.20190621 | blood_methylation_v1.zip | 4.86 GB
DNA methylation profiles of male and female in 24 tissues | v1.20190621 | sex_methylation_v1.zip | 4.33 GB
DNA methylation changes with age | v1.20190621 | age_methylation_v1.zip | 11.73 GB
DNA methylation profiles of 6 ancestry categories | v1.20190621 | ancestry_category_methylation_v1.zip | 1.96 GB
DNA methylation changes with BMI | v1.20190621 | bmi_methylation_v1.zip | 3.06 GB
DNA methylation profiles of 39 cancers | v1.20190621 | cancer_methylation_v1.zip | 16.07 GB
DNA methylation profiles of 28 diseases | v1.20190621 | disease_methylation_v1.zip | 20.11 GB
Gaussian Mixture Quantile Normalization (GMQN)

Script

https://github.com/MengweiLi-project/gmqn

Methods

To remove batch effects and other unwanted noise, we developed Gaussian Mixture Quantile Normalization (GMQN), a reference-based method that removes unwanted technical variation at the signal intensity level. GMQN adjusts for batch effects as well as the bias associated with type II probe values in 450K and EPIC/850K studies. The principle behind the method is that the signal intensity of each channel follows a two-component Gaussian mixture distribution. The first component is the background signal, which has a mean slightly greater than 0. The second component is the signal from probes that have hybridized successfully to the input DNA. The variance of the second component is much larger than that of the first because the degree of hybridization differs among probes.
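The two-component picture described above can be illustrated with a small expectation-maximization (EM) fit. The following is a minimal sketch in Python, not the code from the GMQN repository: the function names, the simulated intensities, and the initialization from the lower and upper halves of the sorted data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns two (weight, mean, sd) tuples sorted by mean, so the first
    tuple plays the role of the background component.
    """
    xs = sorted(values)
    n = len(xs)
    # Assumed initialization: lower and upper halves of the sorted data.
    groups = [xs[: n // 2], xs[n // 2:]]
    weights = [0.5, 0.5]
    means = [sum(g) / len(g) for g in groups]
    sds = [
        max(math.sqrt(sum((v - m) ** 2 for v in g) / len(g)), 1e-6)
        for g, m in zip(groups, means)
    ]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return sorted(zip(weights, means, sds), key=lambda c: c[1])


# Simulated intensities (illustrative numbers only): a narrow background
# component near zero and a much wider hybridization signal component.
rng = random.Random(0)
intensities = [rng.gauss(300, 100) for _ in range(300)]
intensities += [rng.gauss(5000, 2000) for _ in range(700)]

fitted = fit_two_gaussians(intensities)
for weight, mean, sd in fitted:
    print(f"weight={weight:.2f} mean={mean:.0f} sd={sd:.0f}")
```

Sorting the fitted components by mean gives a simple way to label the lower one as background. A production analysis would more likely rely on an established mixture-model library than on a hand-written EM loop.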

The goal of GMQN is to rescale the signal intensities so that the two Gaussian components have the same mean and variance across arrays. GMQN is performed in four steps.

  1. Fit a two-state Gaussian mixture model to the median signal intensity of each type I probe in a large single study (GEO accession: GSE105018). The means and variances of the two components serve as the reference for rescaling type I probes.
  2. Fit a two-state Gaussian mixture model to the type I probe signal intensities of the input data.
  3. For the type I probes in each component of the input data, transform their probabilities to quantiles using the inverse cumulative Gaussian distribution with the mean and variance estimated from the corresponding reference component.
  4. Calculate beta values and normalize type II probes on the basis of the type I probes using BMIQ.
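
Step 3 can be sketched as follows, using `NormalDist` from the Python standard library. This is an illustration under stated assumptions rather than the GMQN implementation: `rescale_to_reference` is a hypothetical helper, the component parameters are made-up numbers, and the assignment of probes to components (for example by posterior probability from the step 2 fit) is assumed to have been done already.

```python
from statistics import NormalDist


def rescale_to_reference(values, input_component, reference_component):
    """Map intensities from a fitted input component onto the reference.

    Each value is converted to a probability with the input component's
    cumulative distribution, then to a quantile of the reference
    component with the inverse cumulative distribution.
    """
    rescaled = []
    for x in values:
        p = input_component.cdf(x)
        # inv_cdf is undefined at exactly 0 or 1, so clamp extreme tails.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        rescaled.append(reference_component.inv_cdf(p))
    return rescaled


# Hypothetical parameters for the signal component of one input array
# and for the matching reference component.
input_signal = NormalDist(mu=4000, sigma=1500)
reference_signal = NormalDist(mu=5000, sigma=2000)

# Intensities at -1, 0 and +1 standard deviations of the input component.
probe_intensities = [2500, 4000, 5500]
rescaled = rescale_to_reference(probe_intensities, input_signal, reference_signal)
print([round(v, 3) for v in rescaled])  # approximately [3000.0, 5000.0, 7000.0]
```

Because both distributions are Gaussian, this mapping is equivalent to the linear rescaling x' = mu_ref + (x - mu_in) * sigma_ref / sigma_in, except in the extreme tails where the probability is clamped.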