
Elevated polygenic burden for autism is associated with differential DNA methylation at birth



Autism spectrum disorder (ASD) is a severe neurodevelopmental disorder characterized by deficits in social communication and restricted, repetitive behaviors, interests, or activities. The etiology of ASD involves both inherited and environmental risk factors, with epigenetic processes hypothesized as one mechanism by which both genetic and non-genetic variation influence gene regulation and pathogenesis. The aim of this study was to identify DNA methylation biomarkers of ASD detectable at birth.


We quantified neonatal methylomic variation in 1263 infants, of whom ~ 50% subsequently developed ASD, using DNA isolated from archived blood spots taken shortly after birth. We used matched genotype data from the same individuals to examine the molecular consequences of ASD-associated genetic risk variants, identifying methylomic variation associated with elevated polygenic burden for ASD. In addition, we performed DNA methylation quantitative trait loci (mQTL) mapping to prioritize target genes from ASD GWAS findings.


We identified robust epigenetic signatures of gestational age and prenatal tobacco exposure, confirming the utility of DNA methylation data generated from neonatal blood spots. Although we did not identify specific loci showing robust differences in neonatal DNA methylation associated with later ASD, there was a significant association between increased polygenic burden for autism and methylomic variation at specific loci. Each unit increase in ASD polygenic risk score was associated with a mean change in DNA methylation of − 0.14% (i.e., lower methylation) at two CpG sites located proximal to a robust GWAS signal for ASD on chromosome 8.


This study is the largest analysis of DNA methylation in ASD undertaken to date and the first to integrate genetic and epigenetic variation at birth. We demonstrate the utility of using a polygenic risk score to identify molecular variation associated with disease, and of using mQTL to refine the functional and regulatory variation associated with ASD risk variants.


Autism spectrum disorder (ASD) describes a group of complex neurodevelopmental disorders marked by deficits in social communication and restricted, repetitive behaviors, interests, or activities [1]. ASD affects ~ 1–2% of the population and confers severe lifelong disability [2,3,4]. Quantitative genetic studies indicate that ASD is highly heritable [5, 6], although population-based epidemiologic studies of environmental risks and ASD liability modeling using family designs also indicate that environmental factors are important [7]. Genetic studies have shown that autism risk is strongly associated with both rare inherited and de novo DNA sequence variants [8,9,10,11]. In contrast, identifying common genetic variants associated with ASD using genome-wide association studies (GWAS) has proven harder than for other complex neuropsychiatric traits such as schizophrenia [12], at least in part because of a lack of large datasets. Recent collaboration between the Psychiatric Genomics Consortium autism workgroup (PGC-AUT) and the Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH) has greatly expanded the number of ASD cases with GWAS data, enabling the identification of three genome-wide significant associations for ASD and evidence for a substantial polygenic component in signals falling below the stringent genome-wide significance threshold [13]. None of the three ASD-associated loci is predicted to result in coding changes or altered protein structure; instead, they are hypothesized to influence gene regulation. Previous studies of other neurodevelopmental disorders have reported an enrichment of disease-associated variation in regulatory domains, including enhancers and regions of open chromatin [14].

Epigenetic variation induced by non-genetic exposures has been hypothesized to be one mechanism by which environmental factors can affect risk for ASD [15, 16]. Recent studies have provided initial evidence for autism-associated epigenetic variation in both brain and peripheral tissues [17,18,19,20,21,22], although these analyses have been undertaken on relatively small numbers of samples with limited statistical power. Existing analyses have assessed epigenetic variation in samples collected after a diagnosis of ASD has been assigned, and are therefore likely to be confounded by factors such as smoking [23,24,25], medication [26, 27], and other environmental toxins [28], as well as by reverse causation [29]. Furthermore, they have not investigated the role of genetic variation in mediating associations between epigenetic variation and ASD. Integrating genetic and epigenetic data should facilitate a better understanding of the molecular mechanisms involved in autism, especially given the high heritability of ASD and recent data showing that the epigenome can be directly influenced by genetic variation [30,31,32,33]. For example, we have previously demonstrated the potential for using polygenic risk scores (PRS), defined as the sum of trait-associated alleles across many genetic loci weighted by GWAS effect sizes, as disease biomarkers with utility for exploring the molecular genomic mechanisms involved in disease pathogenesis [34]. Of note, PRS-associated epigenetic variation is potentially less affected by factors associated with the disease itself, which can confound case–control analyses.
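As a minimal illustration of the PRS definition above, the sketch below computes a score as the sum of effect-allele dosages weighted by GWAS effect sizes. It is not the scoring pipeline used in this study (see [47]); the SNP identifiers and weights are hypothetical, and the weights are assumed to be log odds ratios.

```python
import math


def polygenic_risk_score(dosages, weights):
    """Return (score, n_snps_used) for one individual.

    dosages: dict of SNP id -> effect-allele dosage (0, 1, 2, or an imputed value in between)
    weights: dict of SNP id -> GWAS effect size (assumed here to be a log odds ratio)
    SNPs without a genotype are skipped, so the count of contributing SNPs is returned too.
    """
    used = [snp for snp in weights if snp in dosages]
    score = sum(weights[snp] * dosages[snp] for snp in used)
    return score, len(used)


# Hypothetical example: three weighted SNPs, one of them not genotyped for this person.
weights = {"rsA": math.log(1.10), "rsB": math.log(0.95), "rsC": math.log(1.05)}
person = {"rsA": 2, "rsB": 1}
score, n_used = polygenic_risk_score(person, weights)
```

Returning the number of contributing SNPs alongside the score makes it easy to include that count as a covariate, as in the PRS EWAS model described in “Methods”.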

In this study, we quantified DNA methylation in 1316 individuals (comprising equal numbers of ASD cases and matched controls, and equal numbers of males and females) using DNA isolated from neonatal blood spots collected shortly after birth (mean = 6.08 days; standard deviation (sd) = 3.24 days; Additional file 1: Figure S1). Known epigenetic signatures of gestational and chronological age [35, 36], and of exposure to maternal smoking during pregnancy [24], were used to confirm the robustness of genome-wide DNA methylation data generated from neonatal blood spots. Matched genome-wide single nucleotide polymorphism (SNP) genotyping data from the same individuals enabled us to undertake an integrated genetic–epigenetic analysis of ASD, exploring the extent to which methylomic variation at birth is associated with elevated polygenic burden for ASD. Finally, we generated an extensive database of DNA methylation quantitative trait loci (mQTL) in neonatal blood samples, which we used to characterize the molecular consequences of genetic variants associated with ASD.


Overview of the MINERvA cohort

Denmark has a comprehensive neonatal screening program that is used to test for inborn errors of metabolism, hypothyroidism, and other treatable disorders. Neonatal blood is collected on standard Guthrie cards, and residual material is stored within the Danish Neonatal Screening Biobank. The stored samples may be used for the following purposes, in order of priority: (1) diagnosis and treatment of congenital disorders, (2) diagnostic use later in infancy after informed consent, (3) legal use after court order, and (4) research projects, pending approval by the Scientific Ethical Committee System in Denmark, the Danish Data Protection Agency, and the NBS-Biobank Steering Committee. Thus, research is possible provided that sufficient material remains for the preceding priorities [37]. Cases and controls were selected from the iPSYCH case–control sample, which has been described recently [38]. Briefly, the iPSYCH study population comprises all singletons born in Denmark between May 1st 1981 and December 31st 2005 who were alive and residing in Denmark at their first birthday and had a known mother. iPSYCH ASD cases comprise all children in the study population with an ASD diagnosis reported before December 31st 2012. iPSYCH controls comprise 30,000 persons randomly selected from the study population (about 2% of the total study population).

The MINERvA study profiled a subsample of 1316 iPSYCH samples, including equal numbers of ASD cases and controls selected using the following criteria. Cases were born between 1998 and 2002, and both of their parents were born in Denmark. We selected a 1:1 male-to-female ratio by oversampling female ASD cases. Cases and controls were excluded if they had a reported diagnosis (before December 31st 2012) of selected known genetic disorders: Down syndrome, Fragile X, Angelman, Prader-Willi, Zellweger, Williams, tuberous sclerosis, Rett, Tourette, neurofibromatosis, Duchenne, Cornelia de Lange, DiGeorge, Smith-Lemli-Opitz, or Klinefelter syndrome. In addition, controls were excluded if they had died or emigrated from Denmark before December 31st 2012, or had any reported psychiatric diagnosis. Eligible controls were individually matched to cases on sex, month of birth (month before, same month, or month after the case's birth month), and year of birth. Among the controls fulfilling these criteria, additional matching was applied as closely as possible for gestational age (in weeks) and urbanicity level of maternal residence at the time of birth. All perinatal data used for case–control matching, plus additional information on birth weight and maternal smoking, were obtained from the Danish Medical Birth Register or the Central Person Register. Detailed maternal smoking data were used to generate a binary variable indicating whether or not the mother smoked during pregnancy. All diagnoses used for ASD case identification and case/control exclusions were obtained from the Danish Psychiatric Central Research Register (DPCRR) and the Danish National Patient Register (DNPR).
In Denmark, children and adolescents suspected of ASD or other mental or behavioral disorders are referred by general practitioners or school psychologists to a child and adolescent psychiatric department for a multidisciplinary evaluation, and their conditions are diagnosed by a child and adolescent psychiatrist. Registry reporting is done only by psychiatrists following mandatory training in the use of the World Health Organization International Classification of Diseases (ICD) [39]. The following ICD-10 diagnosis codes were used: ASD, F84.0, F84.1, F84.5, F84.8, F84.9; any psychiatric disorder, F00–F99. Reported diagnoses for the conditions used as exclusions were obtained from the DNPR, which holds all data on in- and out-patient diagnoses given at discharge from somatic wards in all hospitals and clinics since 1995 [40]. Additional file 2: Table S1 gives a full overview of relevant diagnosis codes. The MINERvA study was approved by the Regional Scientific Ethics Committee in Denmark and the Danish Data Protection Agency.

DNA methylation profiling in MINERvA

Neonatal dried blood spot samples were retrieved from the Danish Neonatal Screening Biobank, within the Danish National Biobank, as part of the iPSYCH study. Neonatal DNA extractions and DNA methylation quantification were performed at the Statens Serum Institut (SSI, Copenhagen, Capital Region, Denmark), building on a previously described protocol [41]. Briefly, from each dried blood spot sample two disks of 3.2 mm were used with the Extract-N-Amp Blood PCR kit (Sigma-Aldrich, St. Louis, USA) and eluted in 200 μL buffer. The isolated genomic DNA (160 μL) was converted with sodium bisulfite using the EZ-96 DNA Methylation Kit (Zymo Research, California, USA). DNA methylation was quantified across the genome using the Infinium HumanMethylation450k array (“450 K array”; Illumina, California, USA) and a modified protocol as previously described [40]. Fully methylated and unmethylated control samples were included on each plate throughout each stage of processing.

MINERvA Illumina 450 K array data pre-processing and quality control

Signal intensities for 1316 neonatal blood samples, 14 fully methylated control samples, and 14 fully unmethylated control samples were imported into the R programming environment using the methylumIDAT() function in the methylumi package [42]. Our stringent quality control (QC) pipeline included the following steps: 1) checking methylated and unmethylated signal intensities and excluding samples where either the median methylated or the median unmethylated intensity was < 2500; 2) using the ten bisulfite conversion control probes to confirm that sodium bisulfite conversion was successful, excluding any samples with a median score < 80; 3) confirming that the fully methylated and fully unmethylated control samples were in the correct location on each plate; 4) using the 65 SNP genotyping probes on the array to confirm that there were no duplicate samples; 5) multidimensional scaling of data from probes on the X and Y chromosomes separately to confirm reported sex; 6) comparing genotype data for up to 65 SNP probes on the 450 K array with SNP array data; and 7) using the pfilter() function in wateRmelon [43] to exclude samples with more than 1% of probes characterized by a detection P value > 0.05, in addition to probes with a detection P value > 0.05 in > 1% of samples. In total, 1263 samples (96.0%) passed all QC steps and were included in subsequent analyses. The DNA methylation data were normalized using the dasen() function in the wateRmelon package [43].
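The sample- and probe-level exclusion rules in steps 1, 2, and 7 can be sketched as follows. This is a simplified illustration of the stated thresholds, not the methylumi or wateRmelon implementation; the input layout is hypothetical, and filtering probes only across samples that survive the sample filters (treating a missing detection P value as a failure) is an assumption.

```python
from statistics import median


def qc_filter(samples, min_intensity=2500, min_bs_score=80, p_detect=0.05, max_fail=0.01):
    """Apply intensity, bisulfite-conversion, and detection P value filters.

    samples: dict of sample id -> {"meth": [...], "unmeth": [...],   # signal intensities
                                   "bs_score": float,                 # conversion score
                                   "detp": {probe id: detection P}}   # per-probe detection P
    Returns (kept sample ids, kept probe ids).
    """
    kept = []
    for sid, s in samples.items():
        # Step 1: median methylated or unmethylated intensity too low.
        if median(s["meth"]) < min_intensity or median(s["unmeth"]) < min_intensity:
            continue
        # Step 2: poor bisulfite conversion.
        if s["bs_score"] < min_bs_score:
            continue
        # Step 7 (samples): more than 1% of probes fail detection.
        n_fail = sum(p > p_detect for p in s["detp"].values())
        if n_fail / len(s["detp"]) > max_fail:
            continue
        kept.append(sid)

    # Step 7 (probes): drop probes failing detection in more than 1% of retained samples.
    # A probe missing from a sample is treated as a failure (an assumption).
    all_probes = sorted(set().union(*(samples[sid]["detp"] for sid in kept)))
    kept_probes = [
        probe for probe in all_probes
        if sum(samples[sid]["detp"].get(probe, 1.0) > p_detect for sid in kept) / len(kept) <= max_fail
    ]
    return kept, kept_probes
```

Note that with few samples, a single failing sample exceeds the 1% probe threshold; in a dataset of ~1300 samples, a probe can fail in up to 13 samples before it is removed.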

SNP genotyping and derivation of ASD polygenic risk scores

DNA was extracted at SSI as above and whole-genome amplified in triplicate using the REPLI-g kit (Qiagen, Hilden, Germany). The triplicates were pooled and then quantified using Quant-iT PicoGreen (Invitrogen, California, USA). Samples were genotyped at the Broad Institute (Boston, Massachusetts, USA) on the Infinium PsychChip v1.0 array (Illumina, San Diego, California, USA) using a standard protocol. Phasing and imputation were performed using SHAPEIT [44] and IMPUTE2 with haplotypes from the 1000 Genomes Project, phase 3 [45, 46], as described previously [38]. ASD polygenic risk scores (PRSs) were generated as a weighted sum of associated variants as previously described [47]. Briefly, results from the largest autism GWAS available, from a combined effort by the Psychiatric Genomics Consortium (PGC) and iPSYCH [13], were used to select genetic variants and provide weights. As the MINERvA cohort is a subset of the broader iPSYCH cohort, we used GWAS results excluding MINERvA samples, so that there was no overlap between the training and test cohorts. Ten different significance thresholds (pT) from 5 × 10−8 to 1 were used to select sets of genetic variants, which were linkage disequilibrium (LD) clumped using plink with the settings --clump-p1 1 --clump-p2 1 --clump-r2 0.1 --clump-kb 500 to generate PRSs.
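The clumping-and-thresholding step can be sketched as greedy clumping in the spirit of the plink settings above (every SNP is eligible as an index SNP, and SNPs within 500 kb with r² > 0.1 to a more significant index SNP are removed), followed by selection of index SNPs at each pT. This is not plink itself: the LD lookup is a hypothetical callable standing in for a reference panel, and applying each threshold as P ≤ pT is an assumption.

```python
def ld_clump(snps, r2, r2_max=0.1, window_bp=500_000):
    """Greedy LD clumping, most significant SNP first.

    snps: list of dicts with keys "id", "chrom", "pos", "p" (GWAS P value).
    r2: callable(id_a, id_b) -> LD r^2 (hypothetical; would come from a reference panel).
    Returns the index SNPs, ordered by P value.
    """
    remaining = sorted(snps, key=lambda s: s["p"])
    index_snps = []
    while remaining:
        lead = remaining.pop(0)
        index_snps.append(lead)
        # Remove SNPs in LD with the current index SNP within the physical window.
        remaining = [
            s for s in remaining
            if not (s["chrom"] == lead["chrom"]
                    and abs(s["pos"] - lead["pos"]) <= window_bp
                    and r2(lead["id"], s["id"]) > r2_max)
        ]
    return index_snps


def select_by_threshold(index_snps, thresholds):
    """For each GWAS P value threshold pT, keep the index SNPs with P <= pT."""
    return {pt: [s["id"] for s in index_snps if s["p"] <= pt] for pt in thresholds}
```

Each selected set would then be scored with its GWAS weights, giving one PRS per threshold.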

Statistical analysis

All statistical analyses were performed using the R statistical environment version 3.2.2 [48]. To test the validity and robustness of our blood spot DNA methylation measures, we implemented two DNA methylation clock algorithms to derive estimates of both age in years [36] and gestational age in weeks [35] for each sample. In addition, for each sample, we computed a score for prenatal exposure to maternal smoking using DNA methylation data as previously described by Elliott et al. [23]. To identify DNA methylation sites associated with ASD status in the MINERvA discovery dataset, a linear model was fitted for each DNA methylation site with DNA methylation as the dependent variable, case/control status as an independent variable, and a set of possible confounders as covariates: sex, experimental array number, urbanicity level, birth month, birth year, gestational age, smoking, and cell composition variables estimated using the Houseman algorithm with a reference dataset for whole blood [49, 50]. Regional analysis to identify differentially methylated regions (DMRs) spanning multiple DNA methylation sites was performed using a sliding-window approach as previously described [34]. Subsequent replication and meta-analysis were performed using summary statistics available from two US-based studies: the Study to Explore Early Development (SEED) [51] and the Simons Simplex Collection (SSC) [52]. Meta-analysis to combine the epigenome-wide association study (EWAS) results from the MINERvA, SEED, and SSC studies was performed for DNA methylation loci present in at least two of the three studies. Data quality control, normalization, and ASD EWAS analysis were performed separately for each of the replication cohorts. A complete description of the SEED and SSC datasets can be found elsewhere [53]. The P values from the three independent EWAS analyses were combined using Fisher’s method, focusing on differentially methylated positions (DMPs) where the direction of effect was consistent across all studies.
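Fisher’s method combines k independent P values through the statistic X = −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are even, the chi-square tail probability has a closed form, so the sketch below needs only the standard library (scipy.stats.combine_pvalues with method="fisher" is an alternative). The direction-consistency flag mirrors the filter described above; the function names and inputs are hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    For even df, P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i                         # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def combine_site(effects, pvalues):
    """Combine one CpG site across studies and report whether all effects share a sign."""
    consistent = all(e > 0 for e in effects) or all(e < 0 for e in effects)
    return fisher_combined_p(pvalues), consistent
```

With a single study the combined P value equals the input P value, which is a quick sanity check of the formula.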
To identify DNA methylation sites associated with elevated autism polygenic risk burden, a linear model was used with DNA methylation as the dependent variable and the ASD PRS, the number of non-missing genotypes contributing to the PRS, the first five genetic principal components, sex, experimental array number, six cell composition variables, smoking score, gestational age, and birth weight included as independent variables. DNA methylation sites significantly associated with either ASD case–control status or ASD PRS were identified at an experiment-wide significance threshold of P < 1 × 10−7, which corrects for the number of DNA methylation sites profiled on the 450 K array.
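The per-site linear model and the experiment-wide threshold can be sketched in Python with NumPy (the analyses here used R). Because every site shares the same design matrix, the least-squares fit can be computed for all sites in one step. The function name, the normal approximation to the t distribution, and the figure of ~485,000 tested sites behind the Bonferroni calculation (0.05 / 485,000 ≈ 1.03 × 10−7, consistent with P < 1 × 10−7) are assumptions for illustration.

```python
import math
import numpy as np

# Bonferroni correction over ~485,000 sites on the 450 K array (approximate count, assumed).
BONFERRONI_P = 0.05 / 485_000   # ~1.03e-7


def ewas(methylation, exposure, covariates):
    """Fit methylation[:, j] ~ 1 + exposure + covariates by OLS for every site j.

    methylation: (n_samples, n_sites); exposure: (n_samples,), e.g. case status or PRS;
    covariates: (n_samples, n_covariates).
    Returns the exposure effect, its standard error, and a two-sided P value per site.
    P values use a normal approximation to the t distribution (reasonable for n ~ 1000).
    """
    n = methylation.shape[0]
    X = np.column_stack([np.ones(n), exposure, covariates])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ methylation            # (n_params, n_sites): one solve for all sites
    resid = methylation - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    t = beta[1] / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta[1], se, p
```

Sites with `p < BONFERRONI_P` would be reported as experiment-wide significant.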

DNA mQTL and co-localization analyses

All DNA methylation sites located within 250 kb of the three genome-wide significant genetic variants identified in the PGC-AUT GWAS [13] were identified, and cis mQTL analysis (cis defined as a 500-kb window) was performed using the 1257 samples within MINERvA that had both DNA methylation and imputed genotype data. mQTL were identified with the MatrixEQTL package [54], using an additive linear model to test whether the number of alleles (coded 0, 1, or 2) predicted DNA methylation at each site, with sex and the first five principal components from the genotype data included as covariates. Co-localization analysis was performed for each DNA methylation site as previously described [55] using the R coloc package. From both the PGC-AUT GWAS data and our mQTL results, we input the regression coefficients, their variances, and SNP minor allele frequencies; the prior probabilities were left at their default values. This methodology quantifies the support, across the results of both association analyses, for five hypotheses by calculating posterior probabilities, denoted PPi for hypothesis Hi.
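A minimal sketch of the cis-mQTL step, assuming NumPy: enumerate CpG–SNP pairs within the cis window, then regress methylation on allele dosage plus covariates. This is a plain least-squares illustration rather than MatrixEQTL [54]; the function names and input layout are hypothetical, the 500-kb window is assumed to be centred on each CpG (± 250 kb), and P values use a normal approximation.

```python
import math
import numpy as np


def cis_pairs(cpgs, snps, half_window=250_000):
    """Yield (cpg id, snp id) pairs on the same chromosome within +/- half_window bp.

    cpgs, snps: dicts of id -> (chromosome, position). Centring the 500-kb window on the
    CpG is an assumption. A nested loop is fine for a sketch; real tools use sorted positions.
    """
    for cid, (c_chr, c_pos) in cpgs.items():
        for sid, (s_chr, s_pos) in snps.items():
            if c_chr == s_chr and abs(c_pos - s_pos) <= half_window:
                yield cid, sid


def mqtl_test(methylation, dosage, covariates):
    """Additive model: methylation ~ 1 + dosage (0, 1, or 2 alleles) + covariates.

    Returns (effect per allele, two-sided P) using a normal approximation.
    """
    n = len(methylation)
    X = np.column_stack([np.ones(n), dosage, covariates])
    beta, *_ = np.linalg.lstsq(X, methylation, rcond=None)
    resid = methylation - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    return beta[1], math.erfc(abs(t) / math.sqrt(2))
```

In the design described above, the covariates would be sex and the first five genotype principal components.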

H0: there exist no causal variants for either trait;

H1: there exists a causal variant for one trait only, ASD;

H2: there exists a causal variant for one trait only, DNA methylation;

H3: there exist two distinct causal variants, one for each trait;

H4: there exists a single causal variant common to both traits.
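Given per-SNP effect estimates and standard errors for ASD and for DNA methylation at the same SNPs, the posterior probabilities PP0 to PP4 can be computed from Wakefield approximate Bayes factors. The sketch below follows the formulas we understand coloc’s coloc.abf() to use, with its default priors (p1 = p2 = 10−4, p12 = 10−5); it is an illustration, not the package itself. The prior standard deviation of the effect size must be supplied; to our understanding, coloc’s defaults are 0.2 on the log odds ratio scale for case–control traits and 0.15 (in units of the trait standard deviation) for quantitative traits. All inputs here are hypothetical.

```python
import math


def log_abf(betas, ses, prior_sd):
    """Wakefield log approximate Bayes factor for each SNP from its effect estimate and SE."""
    w = prior_sd ** 2
    out = []
    for b, s in zip(betas, ses):
        v = s ** 2
        r = w / (w + v)
        z = b / s
        out.append(0.5 * (math.log(1 - r) + r * z ** 2))
    return out


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); -inf if rounding leaves exp(b) >= exp(a)."""
    if b >= a:
        return -math.inf
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return [PP0, PP1, PP2, PP3, PP4] for two traits tested at the same SNPs."""
    l1, l2 = logsumexp(labf1), logsumexp(labf2)
    l12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    lh = [
        0.0,                                                   # H0: no association
        math.log(p1) + l1,                                     # H1: trait 1 only
        math.log(p2) + l2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(l1 + l2, l12),   # H3: distinct causal SNPs
        math.log(p12) + l12,                                   # H4: one shared causal SNP
    ]
    total = logsumexp(lh)
    return [math.exp(x - total) for x in lh]
```

In this framework, PP3 + PP4 measures support for both traits being associated in the region, while PP4/PP3 compares a single shared causal variant against two distinct ones.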


Robust epigenetic signatures of gestational age and prenatal tobacco exposure validate DNA methylation data generated from neonatal blood spots

Following our stringent QC pipeline (see “Methods”), our final MINERvA DNA methylation dataset included 1263 samples comprising 629 ASD cases and 634 controls. The characteristics of this sample are displayed in Table 1; of note, because female cases were oversampled, we had a near-equal ratio of males and females (632:631). There were no significant differences between ASD cases and controls in maternal or paternal age, days to blood spot sampling, or birth weight (P > 0.05). There was a significantly higher rate of maternal smoking among the ASD cases (P = 0.003) and evidence of higher smoking quantity (P = 0.006). We used DNA methylation data to derive estimates of gestational age [35] and chronological age [36] for each sample. The mean predicted gestational age was 37.7 weeks (sd = 1.35 weeks; Additional file 1: Figure S2) compared to the actual mean of 39.6 weeks (sd = 1.77 weeks), with a strong positive correlation between estimated and actual gestational age (r = 0.602; Fig. 1a). The mean predicted chronological age was 0.495 years (sd = 0.298; Additional file 1: Figure S3), and this was less strongly correlated with actual age (r = 0.139; Fig. 1b), consistent with data from Knight et al. [35]. Of note, “days to sampling” (i.e., the time between birth and blood draw) was not correlated with either predicted gestational age or predicted chronological age, and controlling for it did not improve the strength of the correlation with gestational age (Additional file 1: Figure S4). We next tested robust markers of smoking exposure during pregnancy [24] and adulthood, using an established algorithm [23] to calculate a DNA methylation-derived “smoking score”, which we compared to reported in utero exposure. We identified a highly significant association between this smoking score and actual exposure, with offspring exposed to tobacco smoking in utero having higher smoking scores than offspring who were not exposed (P = 8.41 × 10−95; Fig. 1c) [23, 34].
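One way to check whether a nuisance variable such as days to sampling affects the correlation between predicted and actual gestational age is a first-order partial correlation. The sketch below uses the standard formula; it is an illustration only, since the exact adjustment behind Additional file 1: Figure S4 is an assumption here, and the example vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2)).
    Hypothetical roles: x = predicted gestational age, y = actual gestational age,
    z = days to sampling.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When z is uncorrelated with both x and y, the partial correlation equals the raw correlation, which matches the observation that adjusting for days to sampling left the gestational age correlation essentially unchanged.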
Taken together these analyses highlight the utility of using DNA isolated from neonatal blood spots to generate reliable DNA methylation data that can robustly identify exposure/trait-associated variation.

Table 1 Characteristics of samples included in the MINERvA cohort
Fig. 1

DNA methylation data from neonatal blood spots can be used to accurately predict age and maternal smoking status. a Scatterplot of gestational age predicted from DNA methylation data (using an algorithm generated by Knight et al. [35]) against actual gestational age. Autism cases are in red and controls are in green. b Scatterplot of chronological age predicted from DNA methylation data (using the online Epigenetic Clock software [36]) against actual chronological age. Autism cases are in red and controls are in green. c Boxplot of a smoking score derived from DNA methylation data [23] stratified by maternal smoking status during pregnancy

Methylomic variation in perinatal blood is not significantly associated with childhood autism

Our initial analysis focused on identifying differences in neonatal blood DNA methylation between MINERvA neonates who later received a childhood diagnosis of ASD and controls. No global differences in DNA methylation (estimated by averaging across all probes on the array included in our analysis) were identified between ASD cases (N = 629) and controls (N = 634) (ASD mean = 50.0%, ASD sd = 0.0811%; controls mean = 50.0%, controls sd = 0.0917%; t-test P = 0.695). Using a linear model to identify DNA methylation differences in ASD cases compared to controls, we did not identify any differentially methylated positions (DMPs) passing an experiment-wide significance threshold adjusted for multiple testing (P < 1 × 10−7). Twenty ASD-associated DMPs were identified at a “discovery” threshold of P < 5 × 10−5 (Additional file 1: Figures S5 and S6; Additional file 2: Table S2); the most significant association was at cg12699865, located in the 5′ UTR of RALY, where the mean level of DNA methylation was 0.647% lower (P = 7.63 × 10−7) in ASD cases compared to controls (Additional file 1: Figure S7). Regional analysis combining the EWAS P values for DNA methylation sites within a sliding window across the genome (see “Methods”) did not identify any significant ASD-associated DMRs after correcting for multiple testing. Given the higher prevalence of ASD diagnosis in males, we also tested for an interaction between autism status and sex, but identified no associations at the experiment-wide threshold (P < 1 × 10−7) and only seven DMPs at our discovery threshold of P < 5 × 10−5 (Additional file 2: Table S3).

We next meta-analyzed these findings with summary statistics from 450K array measurements for two US-based studies of autism: the Study to Explore Early Development (SEED) [51] and the Simons Simplex Collection (SSC) [52]. Although neither of these datasets was generated from blood samples collected immediately after birth, together they enabled us to assess a combined sample of 1425 ASD cases and 1492 controls (Additional file 2: Table S4). We first took the top-ranked loci identified in each independent study and compared the directions of effect (i.e., the difference between autism cases and controls); we did not find any excess of consistent associations (all sign test P > 0.05; Additional file 1: Figure S8; Additional file 2: Table S5). Second, we combined the P values from the EWAS results of the three samples using Fisher’s method (Fig. 2a; Additional file 1: Figure S9). There were no sites where the combined P value survived correction for multiple testing (P < 1 × 10−7), although 45 ASD-associated DMPs were identified at the discovery P value threshold (P < 5 × 10−5) (Additional file 2: Table S6). The most significant DNA methylation site with a consistent direction of effect across all three studies was cg03618918 (combined P = 3.85 × 10−7; pooled mean difference = 1.17%; Fig. 2b), located ~ 10 kb from ITLN1. In general, the estimated effects of ASD-associated DMPs (P < 5 × 10−5) were very small (Additional file 1: Figure S10), typically a ~ 1% difference between ASD cases and controls. Taken together, these data suggest that, based on the sites assayed by the 450K array, ASD is not associated with robust methylomic signatures in blood obtained at birth or during early childhood.
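The direction-of-effect comparison can be expressed as an exact binomial sign test: under the null hypothesis, each top-ranked site is equally likely to show the same or the opposite sign in a second study. The sketch below assumes a one-sided test for an excess of consistent directions and drops sites with a zero effect; whether the reported tests were one- or two-sided is an assumption, and the effect vectors are hypothetical.

```python
from math import comb


def sign_test_excess(effects_a, effects_b):
    """One-sided exact sign test for more same-sign effects than expected by chance (p = 0.5).

    Returns (n_consistent, n_sites, P) where P = Pr(X >= n_consistent), X ~ Binomial(n_sites, 0.5).
    """
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    n = len(pairs)
    k = sum((a > 0) == (b > 0) for a, b in pairs)
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p
```

For example, 10 of 10 sites agreeing in sign gives P = 1/1024, whereas 5 of 10 gives P ≈ 0.62, which is no evidence of an excess.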

Fig. 2

A cross-cohort meta-analysis finds little evidence of autism-associated methylomic variation in neonatal and childhood blood samples. a Manhattan plot of P values from the autism EWAS meta-analysis (total n = 2917). P values were calculated using Fisher’s method for combining P values; solid circles indicate sites where the direction of effect was consistent across all contributing cohorts, and empty triangles indicate sites with different directions of effect in at least two studies. The red horizontal line indicates experiment-wide significance (P < 1 × 10−7). The blue horizontal line indicates a more relaxed “discovery” threshold (P < 1 × 10−5). b Forest plot of cg03618918, the DNA methylation site most significantly associated with ASD in the meta-analysis. The effect is the mean difference in DNA methylation between autism cases and controls. The size of each box is proportional to the sample size of that cohort

Increased polygenic burden for autism is associated with methylomic variation in blood at birth

As for many complex diseases, individual genetic variants associated with autism explain only a small proportion of an individual’s risk [6, 56]. Polygenic risk scores (PRSs), which essentially count the number of risk alleles across multiple associated loci, have been used successfully to capture the polygenic architecture of complex traits, including autism [47]. PRSs have been used to establish genetic correlations between traits [6], and there has been recent interest in using PRSs as quantitative variables to identify molecular biomarkers of high genetic burden [34, 57, 58]. PRS-associated epigenetic variation is potentially less affected by non-genetic risk factors for the disease itself, which can confound case–control analyses, although pleiotropic effects of these genetic variants, which may themselves influence DNA methylation, cannot be excluded. We generated autism PRSs for individuals in the iPSYCH-MINERvA sample using recent results from a meta-analysis of samples in the PGC-AUT GWAS [13], excluding the subset of individuals included in the MINERvA cohort (n = 45,162; 39.4% autism cases). Individual PRSs were calculated using a range of GWAS P value thresholds (pT = 5 × 10−8, …, 1) to identify the set of SNPs giving the largest difference between ASD cases and controls in MINERvA. All scores based on GWAS P value thresholds < 1 significantly predicted autism status (P < 0.05; Additional file 2: Table S7; Additional file 1: Figure S11), with the PRS based on pT = 0.1 showing the most significant difference (P = 9.49 × 10−13) between ASD cases and controls (Fig. 3a). There were strong positive correlations between scores based on SNPs selected at relatively relaxed significance thresholds (i.e., pT > 0.001; Additional file 1: Figure S12), with weaker correlations between scores based on smaller (but more strongly associated) sets of variants, potentially reflecting the larger influence of any single SNP on the PRS when the total number of SNPs is small.
We next performed an EWAS of ASD PRS (Additional file 1: Figures S13 and S14), observing strong correlations (r > 0.5) between the results of analyses of scores based on pT > 0.01 (Additional file 1: Figure S15). Examples of PRS-associated DMPs identified using the most predictive ASD PRS (pT < 0.1) are shown in Additional file 1: Figure S16; in total, we identified two DMPs significantly associated (P < 1 × 10−7) with elevated polygenic burden (cg02771117, P = 3.14 × 10−8; cg27411982, P = 8.38 × 10−8), and 49 DMPs associated at a more relaxed “discovery” P value threshold (P < 5 × 10−5) (Fig. 3; Additional file 3: Table S8). Both cg02771117 and cg27411982 are located on chromosome 8, ~ 5 kb apart, and are annotated to two different genes (FAM167A and RP1L1, respectively). Differential DNA methylation at these sites is identified in each of the eight most inclusive ASD PRS EWAS analyses (i.e., those using the most relaxed GWAS P value thresholds; Additional file 1: Figure S14). Of note, both DMPs flank a significant genetic association signal identified in the latest ASD GWAS (Additional file 1: Figure S17). We used ChromHMM classifications [59, 60] based on regulatory data from the Roadmap Epigenomics Project [61] to characterize chromatin states across this region (Additional file 1: Figure S18). The index SNP for the GWAS signal lies in a region predicted to be in a repressed polycomb state in blood and a quiescent/low state in brain. One of the ASD PRS-associated DNA methylation sites (cg02771117) is located in a predicted enhancer region, and the other (cg27411982) is in a region of predicted quiescent/low chromatin state. To establish whether the PRS-associated methylation signal in this region reflected direct effects of the GWAS signal itself, we iteratively added PRS variants within 100 kb of these two sites as covariates in our EWAS, in order of significance (see “Methods”).
After the addition of the four most significant genetic variants, which were independently associated with cg02771117 (Additional file 1: Figure S19), the ASD PRS term was no longer significant (P = 0.0518; Additional file 2: Table S9). In contrast, cg27411982 was still nominally significant even after the addition of 12 ASD-associated SNPs, four of which were independently associated with methylation at this site and largely explained the association between the ASD PRS and DNA methylation (Additional file 1: Figure S20; Additional file 2: Table S10). These data suggest that the PRS-associated variation in DNA methylation at both cg02771117 and cg27411982 results from the combined effects of multiple ASD-associated genetic variants in this region. To confirm that the PRS EWAS results were not simply driven by the ASD cases within the full MINERvA sample, we repeated the analysis separately for cases and controls. P values from this approach were strongly correlated with those from the analysis across all samples (Additional file 1: Figure S21), indicating that the methylomic consequences of high genetic burden are largely consistent across both groups.
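The conditional analysis described above (adding nearby PRS variants as covariates one at a time and re-testing the PRS term) can be sketched as follows, assuming NumPy. The function names are hypothetical, the SNPs are assumed to be supplied already ordered from most to least significant, and P values use a normal approximation to the t distribution.

```python
import math
import numpy as np


def prs_term_p(methylation, prs, covariates):
    """Two-sided P value (normal approximation) for the PRS term in
    methylation ~ 1 + prs + covariates, fitted by ordinary least squares."""
    n = len(methylation)
    X = np.column_stack([np.ones(n), prs, covariates])
    beta, *_ = np.linalg.lstsq(X, methylation, rcond=None)
    resid = methylation - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return math.erfc(abs(beta[1] / se) / math.sqrt(2))


def conditional_prs_scan(methylation, prs, covariates, local_snps):
    """Add local SNP dosages as covariates one at a time (in the order given) and
    record the PRS-term P value before and after each addition."""
    history = [prs_term_p(methylation, prs, covariates)]
    current = covariates
    for dosage in local_snps:
        current = np.column_stack([current, dosage])
        history.append(prs_term_p(methylation, prs, current))
    return history
```

If the PRS association is carried by local variants, the PRS P value should rise toward non-significance as those variants are added, as reported above for cg02771117.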

Fig. 3

Polygenic burden for autism is associated with significant variation in DNA methylation at birth. a Density plot of polygenic risk score (PRS; pT = 0.01) split by ASD case–control status. b Q-Q plot of the ASD PRS (pT = 0.01) EWAS analysis in neonatal blood DNA. c Manhattan plot of the ASD PRS (pT = 0.01) EWAS analysis in neonatal blood DNA. The red horizontal line indicates experiment-wide significance (P < 1 × 10−7); the blue horizontal line indicates a “discovery” significance threshold (P < 5 × 10−5). Scatterplots of experiment-wide significant CpG sites where DNA methylation (y-axis) at d cg02771117 and e cg27411982 is correlated with ASD PRS (x-axis). Red points indicate ASD cases, green points indicate controls. f Scatterplot of –log10 P values from the EWAS of ASD PRS comparing the results from an analysis performed in all individuals (x-axis) against the results from an analysis performed separately for cases and controls and then combined with a meta-analysis (y-axis)

Alignment of DNA methylation quantitative trait loci and ASD genetic signals

None of the GWAS-AUT-identified genetic variants tag known nonsynonymous mutations; consistent with other complex phenotypes, it is likely that disease-associated variants instead influence the regulation of gene expression [14, 62]. Building on our previous work showing how DNA methylation quantitative trait loci (mQTLs) can be used to refine GWAS loci by identifying discrete sites of variable methylation associated with disease risk variants [30, 34], we used the matched MINERvA DNA methylation and genetic data (see “Methods”) to identify mQTLs located in the vicinity of ASD-associated GWAS variants (Fig. 4; Additional file 1: Figure S22). Simply aligning mQTL data with GWAS results is not sufficient to infer a relationship between ASD and DNA methylation in these regions; an overlap may instead reflect two distinct causal variants in strong linkage disequilibrium, one associated with ASD and the other with DNA methylation. To establish whether there was evidence for a single causal variant influencing both DNA methylation and ASD in the regions nominated by the GWAS, we performed a Bayesian co-localization analysis [55]. Briefly, this approach compares the pattern of association results from two independent GWAS (i.e., of ASD and of DNA methylation) to test whether the two association signals co-localize to the same causal variant. We considered mQTL data for 457 unique DNA methylation sites located within 250 kb of three independent autosomal ASD GWAS variants. For 91 of these sites, the posterior probabilities supported an association signal for both ASD and DNA methylation in the region (PP3 + PP4 > 0.99; Additional file 3: Table S11). Four of these sites, located on chromosome 20, had a higher posterior probability for ASD and DNA methylation being associated with the same causal variant than with different causal variants (PP4/PP3 > 1; Additional file 1: Figure S23). The genes annotated to these sites (KIZ, XRN2, and NKX2-4) represent putative candidates for a functional role in ASD and warrant further investigation.
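The Bayesian co-localization method cited here [55] is distributed as the R package coloc. As an illustration only, the Python sketch below re-implements its core approximate-Bayes-factor calculation from per-SNP z-scores and variances, assuming coloc's default priors (p1 = p2 = 1 × 10⁻⁴, p12 = 1 × 10⁻⁵) and a prior effect-size SD of 0.15; the settings actually used in this study are not given in this section. The input values are hypothetical, and `passes_filters` simply applies the two criteria quoted in the text (PP3 + PP4 > 0.99, then PP4/PP3 > 1, written as PP4 > PP3 to avoid dividing by zero).

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a > b; returns -inf when b >= a."""
    if b >= a:
        return -math.inf
    return a + math.log(-math.expm1(b - a))


def wakefield_labf(z, var, prior_sd=0.15):
    """Log approximate Bayes factor for one SNP (Wakefield, 2009)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + var)
    return 0.5 * (math.log(1 - r) + r * z * z)


def coloc_posteriors(z1, v1, z2, v2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0..PP4] for two traits tested on the same SNPs."""
    l1 = [wakefield_labf(z, v) for z, v in zip(z1, v1)]
    l2 = [wakefield_labf(z, v) for z, v in zip(z2, v2)]
    s1, s2 = log_sum_exp(l1), log_sum_exp(l2)
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                 # H0: neither trait associated
        math.log(p1) + s1,                   # H1: trait 1 only
        math.log(p2) + s2,                   # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),  # H3: distinct causal variants
        math.log(p12) + s12,                 # H4: one shared causal variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


def passes_filters(pp):
    """(PP3 + PP4 > 0.99, and additionally PP4 > PP3) as described in the text."""
    both = pp[3] + pp[4] > 0.99
    return both, both and pp[4] > pp[3]


# Hypothetical z-scores for three SNPs in one region, all with variance 0.01 (SE 0.1).
var = [0.01, 0.01, 0.01]
shared = coloc_posteriors([8.0, 1.0, 0.5], var, [7.0, 0.8, 0.3], var)
distinct = coloc_posteriors([8.0, 0.5, 0.3], var, [0.3, 0.5, 7.0], var)
print([round(p, 3) for p in shared], passes_filters(shared))
print([round(p, 3) for p in distinct], passes_filters(distinct))
```

The first hypothetical region, where both traits peak at the same SNP, should yield PP4 close to 1, whereas the second, where the two peaks sit on different SNPs, should favor PP3. This is the distinction the PP4/PP3 filter relies on.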

Fig. 4

DNA methylation quantitative trait loci (mQTL) mapping can localize putative causal loci associated with ASD. Presented here is a genomic region (chr8:10,268,916–10,918,152) identified in a recent GWAS analysis of ASD [13]. At the top of the figure is a schematic of the genes located in this region, identified by their Entrez ID numbers. All genetic variants identified in the ASD GWAS (P < 1 × 10⁻⁴) are represented by solid vertical lines, colored according to the strength of association from gray (less significant P values) to black (more significant P values). A red vertical line indicates the most significant genetic variant in this region. All DNA methylation sites tested for neonatal blood mQTLs in the MINERvA dataset are indicated by red vertical lines and genetic variants by blue vertical lines. Significant neonatal blood mQTLs (P < 1 × 10⁻¹³) are indicated by black diagonal lines connecting the respective genetic variant and DNA methylation site. Genomic locations are based on hg19. Additional examples of mQTLs in genomic regions showing genome-wide significant association with ASD are given in Additional file 1: Figure S22.
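The per-pair test behind an mQTL map like this one is a regression of DNA methylation on genotype dosage for SNP-CpG pairs in cis; tools such as Matrix eQTL [54] run it efficiently across millions of pairs. The Python sketch below illustrates that logic on simulated data. The 250 kb window (borrowed from the co-localization step described in the text), the tuple layout, the IDs and positions, and the normal approximation to the t statistic are assumptions; only the P < 1 × 10⁻¹³ threshold comes from the caption.

```python
import math
import random


def regress_dosage(dosage, meth):
    """Slope and two-sided P value from regressing methylation on dosage.

    The P value uses a normal approximation to the t statistic, which is
    adequate for samples of around 1000 individuals.
    """
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(meth) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, meth))
    syy = sum((y - my) ** 2 for y in meth)
    beta = sxy / sxx
    rss = syy - beta * sxy  # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, math.erfc(abs(beta / se) / math.sqrt(2))


def cis_mqtls(snps, cpgs, window=250_000, p_threshold=1e-13):
    """Test each SNP within `window` bp of a CpG on the same chromosome.

    `snps` and `cpgs` are lists of (id, chromosome, position, values) tuples,
    with values aligned to the same individuals. Returns significant
    (snp_id, cpg_id, beta, p) tuples.
    """
    hits = []
    for cpg_id, c_chr, c_pos, meth in cpgs:
        for snp_id, s_chr, s_pos, dosage in snps:
            if s_chr != c_chr or abs(s_pos - c_pos) > window:
                continue  # trans or distant pair: not tested
            beta, p = regress_dosage(dosage, meth)
            if p < p_threshold:
                hits.append((snp_id, cpg_id, beta, p))
    return hits


# Simulated data for 1000 individuals; IDs and positions are hypothetical.
random.seed(7)
n = 1000
g1 = [sum(random.random() < 0.4 for _ in range(2)) for _ in range(n)]
g2 = [sum(random.random() < 0.4 for _ in range(2)) for _ in range(n)]
meth = [0.5 + 0.05 * g + random.gauss(0, 0.05) for g in g1]
snps = [("rsA", "8", 10_500_000, g1),   # true mQTL, 20 kb from the CpG
        ("rsB", "8", 10_600_000, g2),   # in window, no effect
        ("rsC", "8", 11_500_000, g1)]   # same genotypes as rsA but outside window
cpgs = [("cgX", "8", 10_520_000, meth)]
print(cis_mqtls(snps, cpgs))
```

Only rsA should be reported: rsB lies within the window but has no simulated effect, and rsC carries the same genotypes as rsA but falls outside the window, so it is never tested.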

Discussion

In this study, we quantified neonatal methylomic variation in 1263 infants selected from the iPSYCH cohort [38], including samples from individuals who went on to develop ASD and carefully matched control samples. This represents the first attempt to integrate analyses of both genetic and epigenetic variation at birth in ASD, demonstrating the utility of using a polygenic risk score to identify molecular variation associated with disease, and of using DNA methylation quantitative trait loci to refine the functional and regulatory variation associated with ASD risk variants. Although ASD itself was not associated with significant differences in neonatal DNA methylation, increased polygenic burden for autism was associated, at an experiment-wide significance threshold, with methylomic variation at specific loci in blood at birth. Our analysis of ASD PRS and DNA methylation adds to a growing body of literature investigating the effects of high genetic burden for other complex traits on molecular variation [34, 57, 58]. We find that two CpG sites on chromosome 8 are associated with genetic risk for ASD and are located proximal to a robust GWAS signal for ASD. Furthermore, multiple ASD-associated SNPs on chromosome 8 have a combined (polygenic) effect on DNA methylation at these two CpG sites, demonstrating how a complex genetic architecture can converge on a common molecular consequence.

This study has several advantages over previous analyses of DNA methylation in ASD. We assessed a relatively large set of samples balanced with regard to both disease status and the numbers of males and females. This contrasts with previous studies, which were undertaken on much smaller numbers of samples and focused primarily on ASD in males. Our control samples were stringently matched to cases on a number of criteria (see “Methods”) to minimize the effects of confounding variables that often lead to false positives in molecular epidemiology. Furthermore, because our neonatal DNA samples were collected before diagnosis and before the manifestation of any ASD symptoms, we are uniquely positioned to identify epigenetic variation associated with later disease or with elevated polygenic burden for later ASD, avoiding confounding by exposures often associated with disease (for example, medication and stress) as well as reverse causation [63]. Finally, our study profiled whole blood from neonates rather than cord blood; this minimizes confounding by maternal blood DNA and means our data can be more easily compared with blood datasets derived from later in life. A limitation of our sampling strategy, however, is that no reference DNA methylation datasets of purified blood cell types are yet available for neonatal blood, likely reflecting the difficulty of obtaining sufficient volumes of neonatal blood for cell sorting and methylomic profiling. Instead, we corrected for blood cell-type composition using algorithms developed from adult datasets, which may not fully represent the cellular diversity observed in neonatal blood.

We find little evidence to support an association between DNA methylation at birth and ASD, a finding confirmed in a meta-analysis of three studies with a total sample of 2917. Power calculations indicate > 90% power to detect an ASD-associated difference in DNA methylation of 0.3% in the meta-analysis and of 0.7% in the MINERvA cohort alone. Although this suggests that the lack of association was not due to sample size, we cannot rule out that DNA methylation is associated with the onset of ASD. First, our analyses were constrained by the technical limitations of the Illumina 450K array, which assays only ~3% of CpG sites in the genome. Second, this work necessitated the use of a peripheral tissue that may provide limited information about variation in the presumed tissue of interest, i.e., the brain [64]. Although this is a salient point for understanding the role DNA methylation plays in the disease process, biomarkers by definition need to be measured in an accessible tissue, which justifies the use of neonatal blood in this study. Third, because samples were collected well before ASD diagnosis, it is plausible that we were looking too early in the disease process. Another limitation of our study is the possibility of diagnostic misclassification; however, selected register diagnoses (e.g., schizophrenia, single-episode depression, dementia, and childhood autism) have previously been validated with good results [39, 65].
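The power figures quoted above depend on inputs (the standard deviation of DNA methylation at each site and the significance threshold) that are not stated in this section. The Python sketch below shows a standard two-sample, normal-approximation power calculation for a difference in mean methylation; the function name and the SD, α, and group sizes in the usage lines are illustrative assumptions, not the authors' parameters.

```python
from statistics import NormalDist


def two_group_power(delta, sd, n_cases, n_controls, alpha):
    """Approximate power of a two-sided z-test for a mean difference `delta`.

    `delta` and `sd` share units (e.g., proportion methylated); `alpha` is
    the two-sided significance threshold.
    """
    z = NormalDist()
    se = sd * (1 / n_cases + 1 / n_controls) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Probability that the test statistic exceeds the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: ~630 cases and ~630 controls, SD of 1.5% methylation,
# and an experiment-wide alpha of 1e-7.
for delta in (0.003, 0.007):
    print(delta, round(two_group_power(delta, 0.015, 630, 630, 1e-7), 3))
```

Because the real inputs are not reported here, the printed values should not be read as a reproduction of the quoted 0.3% and 0.7% detectable differences; the sketch only shows how effect size, variance, sample size, and threshold trade off.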

In contrast, we find that polygenic burden for ASD is robustly associated with DNA methylation at two CpG sites on chromosome 8, with 49 DMPs associated with ASD polygenic burden at a more relaxed “discovery” P value threshold. Notably, both sites flank a significant genetic association signal identified in the latest ASD GWAS, and our data suggest that the PRS-associated variation at these sites results from the combined effects of multiple ASD-associated genetic variants in this region. Finally, we used mQTL analyses to annotate the extended genomic regions nominated by GWAS analyses of ASD, using co-localization analyses to highlight regulatory variation potentially causally involved in disease. Of interest, we found evidence that several SNPs on chromosome 20 were associated with both ASD and DNA methylation; the genes annotated to these sites (KIZ, XRN2, and NKX2-4) represent putative candidates for a functional role in ASD. The mechanisms linking DNA sequence variation to alterations in DNA methylation and other epigenetic modifications are not yet well understood; further exploration of these processes is warranted to provide insight into the functional consequences of disease-associated genetic variation.

Conclusions

Our data provide evidence for differences in DNA methylation at birth associated with an elevated polygenic burden for ASD. Our study represents the first analysis of epigenetic variation at birth associated with autism and highlights the utility of polygenic risk scores for identifying molecular pathways associated with etiological variation.

References

  1. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed. Washington, DC: The American Psychiatric Association; 2000.

  2. Baron-Cohen S, Scott FJ, Allison C, Williams J, Bolton P, Matthews FE, Brayne C. Prevalence of autism-spectrum conditions: UK school-based population study. Br J Psychiatry. 2009;194:500–9.

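  3. Autism and Developmental Disabilities Monitoring Network Surveillance Year 2008 Principal Investigators; Centers for Disease Control and Prevention. Prevalence of autism spectrum disorders--Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008. MMWR Surveill Summ. 2012;61:1–19.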

  4. Christensen DL, Baio J, Van Naarden BK, Bilder D, Charles J, Constantino JN, Daniels J, Durkin MS, Fitzgerald RT, Kurzius-Spencer M, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years--Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012. MMWR Surveill Summ. 2016;65:1–23.

  5. Robinson EB, St Pourcain B, Anttila V, Kosmicki JA, Bulik-Sullivan B, Grove J, Maller J, Samocha KE, Sanders SJ, Ripke S, et al. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat Genet. 2016;48:552–5.

  6. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 2013;381:1371–9.

  7. Hallmayer J, Cleveland S, Torres A, Phillips J, Cohen B, Torigoe T, Miller J, Fedele A, Collins J, Smith K, et al. Genetic heritability and shared environmental factors among twin pairs with autism. Arch Gen Psychiatry. 2011;68:1095–102.

  8. Iossifov I, O'Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, Stessman HA, Witherspoon KT, Vives L, Patterson KE, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515:216–21.

  9. Sanders SJ, Ercan-Sencicek AG, Hus V, Luo R, Murtha MT, Moreno-De-Luca D, Chu SH, Moreau MP, Gupta AR, Thomson SA, et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron. 2011;70:863–85.

  10. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, Ercan-Sencicek AG, DiLullo NM, Parikshak NN, Stein JL, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012;485:237–41.

  11. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh T, Yamrom B, Yoon S, Krasnitz A, Kendall J, et al. Strong association of de novo copy number mutations with autism. Science. 2007;316:445–9.

  12. Schizophrenia Working Group of the Psychiatric Genomics Consortium, Ripke S, Neale B, Corvin A, Walters J, Farh K, Holmans P, Lee P, Bulik-Sullivan B, Collier D, et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421.

  13. Grove J, Ripke S, Als TD, Mattheisen M, Walters R, Won H, Pallesen J, Agerbo E, Andreassen OA, Anney R, et al. Common risk variants identified in autism spectrum disorder. bioRxiv. 2017.

  14. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22:1748–59.

  15. Siu MT, Weksberg R. Epigenetics of autism spectrum disorder. Adv Exp Med Biol. 2017;978:63–90.

  16. Vogel Ciernia A, LaSalle J. The landscape of DNA methylation amid a perfect storm of autism aetiologies. Nat Rev Neurosci. 2016;17:411–23.

  17. Wong CC, Meaburn EL, Ronald A, Price TS, Jeffries AR, Schalkwyk LC, Plomin R, Mill J. Methylomic analysis of monozygotic twins discordant for autism spectrum disorder and related behavioural traits. Mol Psychiatry. 2014;19:495–503.

  18. Ladd-Acosta C, Hansen KD, Briem E, Fallin MD, Kaufmann WE, Feinberg AP. Common DNA methylation alterations in multiple brain regions in autism. Mol Psychiatry. 2014;19:862–71.

  19. Nardone S, Sams DS, Reuveni E, Getselter D, Oron O, Karpuj M, Elliott E. DNA methylation analysis of the autistic brain reveals multiple dysregulated biological pathways. Transl Psychiatry. 2014;4:e433.

  20. Nguyen A, Rauch TA, Pfeifer GP, Hu VW. Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate gene, RORA, whose protein product is reduced in autistic brain. FASEB J. 2010;24:3036–51.

  21. Homs A, Codina-Solà M, Rodríguez-Santiago B, Villanueva CM, Monk D, Cuscó I, Pérez-Jurado LA. Genetic and epigenetic methylation defects and implication of the ERMN gene in autism spectrum disorders. Transl Psychiatry. 2016;6:e855.

  22. Sun W, Poschmann J, Cruz-Herrera Del Rosario R, Parikshak NN, Hajan HS, Kumar V, Ramasamy R, Belgard TG, Elanggovan B, Wong CC, et al. Histone acetylome-wide association study of autism spectrum disorder. Cell. 2016;167:1385–97.

  23. Elliott HR, Tillin T, McArdle WL, Ho K, Duggirala A, Frayling TM, Davey Smith G, Hughes AD, Chaturvedi N, Relton CL. Differences in smoking associated DNA methylation patterns in South Asians and Europeans. Clin Epigenetics. 2014;6:4.

  24. Joubert BR, Felix JF, Yousefi P, Bakulski KM, Just AC, Breton C, Reese SE, Markunas CA, Richmond RC, Xu CJ, et al. DNA Methylation in newborns and maternal smoking in pregnancy: genome-wide consortium meta-analysis. Am J Hum Genet. 2016;98:680–96.

  25. Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, Belvisi MG, Brown R, Vineis P, Flanagan JM. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum Mol Genet. 2013;22:843–51.

  26. Non AL, Binder AM, Kubzansky LD, Michels KB. Genome-wide DNA methylation in neonates exposed to maternal depression, anxiety, or SSRI medication during pregnancy. Epigenetics. 2014;9:964–72.

  27. Gurnot C, Martin-Subero I, Mah SM, Weikum W, Goodman SJ, Brain U, Werker JF, Kobor MS, Esteller M, Oberlander TF, Hensch TK. Prenatal antidepressant exposure associated with CYP2E1 DNA methylation change in neonates. Epigenetics. 2015;10:361–72.

  28. Panni T, Mehta AJ, Schwartz JD, Baccarelli AA, Just AC, Wolf K, Wahl S, Cyrys J, Kunze S, Strauch K, et al. A genome-wide analysis of DNA methylation and fine particulate matter air pollution in three study populations: KORA F3, KORA F4, and the normative aging study. Environ Health Perspect. 2016;124(7):983–90.

  29. Mill J, Heijmans BT. From promises to practical strategies in epigenetic epidemiology. Nat Rev Genet. 2013;14:585–94.

  30. Hannon E, Spiers H, Viana J, Pidsley R, Burrage J, Murphy TM, Troakes C, Turecki G, O'Donovan MC, Schalkwyk LC, et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci. 2016;19(1):48–54.

  31. Gaunt TR, Shihab HA, Hemani G, Min JL, Woodward G, Lyttleton O, Zheng J, Duggirala A, McArdle WL, Ho K, et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 2016;17:61.

  32. Heyn H, Moran S, Hernando-Herraez I, Sayols S, Gomez A, Sandoval J, Monk D, Hata K, Marques-Bonet T, Wang L, Esteller M. DNA methylation contributes to natural human variation. Genome Res. 2013;23:1363–72.

  33. Smith AK, Kilaru V, Kocak M, Almli LM, Mercer KB, Ressler KJ, Tylavsky FA, Conneely KN. Methylation quantitative trait loci (meQTLs) are consistently detected across ancestry, developmental stage, and tissue type. BMC Genomics. 2014;15:145.

  34. Hannon E, Dempster E, Viana J, Burrage J, Smith AR, Macdonald R, St Clair D, Mustard C, Breen G, Therman S, et al. An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome Biol. 2016;17:176.

  35. Knight AK, Craig JM, Theda C, Bækvad-Hansen M, Bybjerg-Grauholm J, Hansen CS, Hollegaard MV, Hougaard DM, Mortensen PB, Weinsheimer SM, et al. An epigenetic clock for gestational age at birth based on blood methylation data. Genome Biol. 2016;17:206.

  36. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115.

  37. Nørgaard-Pedersen B, Hougaard DM. Storage policies and use of the Danish Newborn Screening Biobank. J Inherit Metab Dis. 2007;30:530–6.

  38. Pedersen CB, Bybjerg-Grauholm J, Pedersen MG, Grove J, Agerbo E, Bækvad-Hansen M, Poulsen JB, Hansen CS, McGrath JJ, Als TD, et al. The iPSYCH2012 case-cohort sample: new directions for unravelling genetic and environmental architectures of severe mental disorders. Mol Psychiatry. 2018;23(1):6–14.

  39. Mors O, Perto GP, Mortensen PB. The Danish Psychiatric Central Research Register. Scand J Public Health. 2011;39:54–7.

  40. Lynge E, Sandegaard JL, Rebolj M. The Danish National Patient Register. Scand J Public Health. 2011;39:30–3.

  41. Hollegaard MV, Grauholm J, Nørgaard-Pedersen B, Hougaard DM. DNA methylome profiling using neonatal dried blood spot samples: a proof-of-principle study. Mol Genet Metab. 2013;108:225–31.

  42. Davis S, Du P, Bilke S, Triche J, Bootwalla M. methylumi: Handle Illumina methylation data. R package version 2.14.0; 2015.

  43. Pidsley R, Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14:293.

  44. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9:179–81.

  45. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.

  46. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75–81.

  47. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P, International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–52.

  48. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2014.

  49. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86.

  50. Koestler DC, Christensen B, Karagas MR, Marsit CJ, Langevin SM, Kelsey KT, Wiencke JK, Houseman EA. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics. 2013;8:816–26.

  51. Schendel DE, Diguiseppi C, Croen LA, Fallin MD, Reed PL, Schieve LA, Wiggins LD, Daniels J, Grether J, Levy SE, et al. The Study to Explore Early Development (SEED): a multisite epidemiologic study of autism by the Centers for Autism and Developmental Disabilities Research and Epidemiology (CADDRE) network. J Autism Dev Disord. 2012;42:2121–40.

  52. Fischbach GD, Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. 2010;68:192–5.

  53. Andrews SV, Ellis SE, Bakulski KM, Sheppard B, Croen LA, Hertz-Picciotto I, Newschaffer CJ, Feinberg AP, Arking DE, Ladd-Acosta C, Fallin MD. Cross-tissue integration of genetic and epigenetic data offers insight into autism spectrum disorder. Nat Commun. 2017;8(1):1011.

  54. Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–8.

  55. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383.

  56. Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, Mahajan M, Manaa D, Pawitan Y, Reichert J, et al. Most genetic risk for autism resides with common variation. Nat Genet. 2014;46:881–5.

  57. Viana J, Hannon E, Dempster E, Pidsley R, Macdonald R, Knox O, Spiers H, Troakes C, Al-Saraj S, Turecki G, et al. Schizophrenia-associated methylomic variation: molecular signatures of disease and polygenic risk burden across multiple brain regions. Hum Mol Genet. 2017;26(1):210–25.

  58. Fromer M, Roussos P, Sieberts SK, Johnson JS, Kavanagh DH, Perumal TM, Ruderfer DM, Oh EC, Topol A, Shah HR, et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat Neurosci. 2016;19:1442–53.

  59. Ernst J, Kellis M. Chromatin-state discovery and genome annotation with ChromHMM. Nat Protoc. 2017;12:2478–92.

  60. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–6.

  61. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–30.

  62. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–5.

  63. Heijmans BT, Mill J. The seven plagues of epigenetic epidemiology. Int J Epidemiol. 2012;41:74–8.

  64. Hannon E, Lunnon K, Schalkwyk L, Mill J. Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015;10:1024–32.

  65. Lauritsen MB, Jørgensen M, Madsen KM, Lemcke S, Toft S, Grove J, Schendel DE, Thorsen P. Validity of childhood autism in the Danish Psychiatric Central Register: findings from a cohort sample born 1990-1999. J Autism Dev Disord. 2010;40:139–48.



The iPSYCH-Broad ASD Group contains the following participants:

Esben Agerbo

Thomas D. Als

Rich Belliveau

Jonas Bybjerg-Grauholm

Marie Bækvad-Hansen

Anders Børglum

Felecia Cerrato

Jane Christensen

Kimberly Chambert

Claire Churchhouse

Mark Daly

Ditte Demontis

Ashley Dumont

Jacqueline Goldstein

Jakob Grove

Christine Hansen

Mads Hauberg

David Hougaard

Daniel Howrigan

Hailiang Huang

Julian Maller

Alicia Martin

Joanna Martin

Manuel Mattheisen

Jennifer Moran

Ole Mors

Preben Mortensen

Benjamin Neale

Merete Nordentoft

Mette Nyegaard

Jonatan Pallesen

Duncan Palmer

Carsten Pedersen

Marianne Pedersen

Timothy Poterba

Jesper Poulsen

Per Qvist

Stephan Ripke

Elise Robinson

Kyle Satterstrom

Christine Stevens

Patrick Turley

Raymond Walters

Thomas Werge

(see Additional file 4 for full listing of e-mail addresses and affiliations).


This study was supported by grant HD073978 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institute of Environmental Health Sciences, and National Institute of Neurological Disorders and Stroke; and by the Beatrice and Samuel A. Seaver Foundation. We acknowledge iPSYCH and The Lundbeck Foundation for providing samples and funding. The iPSYCH (The Lundbeck Foundation Initiative for Integrative Psychiatric Research) team acknowledges funding from The Lundbeck Foundation (grant numbers R102-A9118 and R155–2014-1724), the Stanley Medical Research Institute, the European Research Council (project number 294838), the Novo Nordisk Foundation for supporting the Danish National Biobank resource, and grants from Aarhus and Copenhagen Universities and University Hospitals, including support to the iSEQ Center, the GenomeDK HPC facility, and the CIRRAU Center. This research has been conducted using the Danish National Biobank resource, supported by the Novo Nordisk Foundation. JM is supported by funding from the UK Medical Research Council (MR/K013807/1) and a Distinguished Investigator Award from the Brain & Behavior Research Foundation. The SEED study was supported by Centers for Disease Control and Prevention (CDC) Cooperative Agreements announced under the RFAs 01086, 02199, DD11–002, DD06–003, DD04–001, and DD09–002 and the SEED DNA methylation measurements were supported by Autism Speaks Award #7659 to MDF. SA was supported by the Burroughs-Wellcome Trust training grant: Maryland, Genetics, Epidemiology and Medicine (MD-GEM). The SSC was supported by Simons Foundation (SFARI) award and NIH grant MH089606, both awarded to STW.

Availability of data and materials

Given the nature of the MINERvA cohort, access to data can only be provided through secured systems that comply with current Danish and EU data standards. To comply with the study’s ethical approval, the raw data are available only to qualified researchers upon request. All summary statistics and analysis scripts are available directly from the authors (please contact Jonas Grauholm at ). R scripts used to perform the analyses reported in this manuscript are available on GitHub ( ) and have been archived in Zenodo at

Author information

Author notes

  1. Mads Vilhelm Hollegaard is deceased. This paper is dedicated to his memory.



Authors’ contributions

AR, DS, and JM designed and coordinated the study. GB-G, DMH, MVH, M-BH, and CSH led generation of DNA methylation data from dried neonatal bloodspots. AR, DS, JM, EH, JB-G, CL-A, and MDF oversaw implementation of the data analyses. EH led data analysis. CL-A and SVA analyzed data from replication datasets. JG provided autism polygenic risk scores. DMH, OM, PBM, ADB, TW, and MN are principal investigators of the iPSYCH study and obtained funding for genetic data. EH and JM drafted the manuscript, with input from AR, DS, CL-A, JG, SVA, MDF, MB, MH, JB, and JB-G. All coauthors read and approved the final manuscript.

Corresponding author

Correspondence to Jonathan Mill.

Ethics declarations

Ethics approval and consent to participate

The MINERvA study has been approved by the Regional Scientific Ethics Committee in Denmark, the Danish Data Protection Agency and the NBS-Biobank Steering Committee. iPSYCH is a register-based cohort study solely using data from national health registries. The study was approved by the Scientific Ethics Committees of the Central Denmark Region (; 1–10–72-287-12) and executed according to guidelines from the Danish Data Protection Agency (; 2012–41-0110). Passive consent was obtained, in accordance with Danish Law nr. 593 of June 14, 2011, para 10, on the scientific ethics administration of projects within health research. Permission to use the dried blood spot samples stored in the Danish Neonatal Screening Biobank (DNSB) was granted by the steering committee of DNSB (SEP 2012/BNP). Research was conducted in accordance with the principles of the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

TW has acted as advisor and lecturer to H. Lundbeck A/S. The remaining authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Supplementary Figures S1–S23. (PDF 8163 kb)

Additional file 2:

Supplementary Tables S1–S7 and S9–S10. (PDF 339 kb)

Additional file 3:

Supplementary Table S8 and Supplementary Table S11. (XLSX 100 kb)

Additional file 4:

iPSYCH-Broad ASD Group participants’ affiliations. (XLSX 14 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Hannon, E., Schendel, D., Ladd-Acosta, C. et al. Elevated polygenic burden for autism is associated with differential DNA methylation at birth. Genome Med 10, 19 (2018).
