Elevated polygenic burden for autism is associated with differential DNA methylation at birth

Background Autism spectrum disorder (ASD) is a severe neurodevelopmental disorder characterized by deficits in social communication and restricted, repetitive behaviors, interests, or activities. The etiology of ASD involves both inherited and environmental risk factors, with epigenetic processes hypothesized as one mechanism by which both genetic and non-genetic variation influence gene regulation and pathogenesis. The aim of this study was to identify DNA methylation biomarkers of ASD detectable at birth. Methods We quantified neonatal methylomic variation in 1263 infants—of whom ~ 50% went on to subsequently develop ASD—using DNA isolated from archived blood spots taken shortly after birth. We used matched genotype data from the same individuals to examine the molecular consequences of ASD-associated genetic risk variants, identifying methylomic variation associated with elevated polygenic burden for ASD. In addition, we performed DNA methylation quantitative trait loci (mQTL) mapping to prioritize target genes from ASD GWAS findings. Results We identified robust epigenetic signatures of gestational age and prenatal tobacco exposure, confirming the utility of DNA methylation data generated from neonatal blood spots. Although we did not identify specific loci showing robust differences in neonatal DNA methylation associated with later ASD, there was a significant association between increased polygenic burden for autism and methylomic variation at specific loci. Each unit of elevated ASD polygenic risk score was associated with a mean increase in DNA methylation of − 0.14% at two CpG sites located proximal to a robust GWAS signal for ASD on chromosome 8. Conclusions This study is the largest analysis of DNA methylation in ASD undertaken and the first to integrate genetic and epigenetic variation at birth. We demonstrate the utility of using a polygenic risk score to identify molecular variation associated with disease, and of using mQTL to refine the functional and regulatory variation associated with ASD risk variants. Electronic supplementary material The online version of this article (10.1186/s13073-018-0527-4) contains supplementary material, which is available to authorized users.

. Age estimates derived from neonatal blood DNA methylation data accurately reflect actual age. Scatterplots of derived age (top row) and gestational age (bottom row) estimated from DNA methylation using two different clock algorithms (Horvath et al, 2013, Knight et al, 2016 against gestational age in weeks (first column), days between birth and sampling (middle column) and gestational age corrected for days between birth and sampling (third column) for individual samples profiled in the Minerva cohort (n = 1,263).

Figure S9. QQ-plot of P-values from the autism case-control EWAS meta-analysis.
The metaanalysis included samples from the MINERvA, SEED and Simon's cohorts (total n = 2,917) (see Materials and Methods).

Figure S10. Forest plots for the four top-ranked autism-associated DMPs from the meta-analysis with consistent directions of effect.
The effect is the mean difference in DNA methylation between autism cases and controls. The sizes of the boxes are proportional to the sample size of that cohort.

Figure S12. Correlation plot of autism polygenic risk scores (PRSs) calculated using different P-value thresholds (p T ) to define the group of SNPs included in each calculation.
The boxes in the bottom triangle of the matrix contain scatterplots of PRS calculated at different significant thresholds with line of best fit. In the diagonal squares are histograms overlaid with a density plot of PRS at each threshold. The boxes in the top triangle contain the absolute correlation statistics with asterisks indicating the significance of the correlation statistic. S1 p T = 5x10 -8 ; S2 p T = 1x10 -6 ; S3 p T = 1x10 -4 ; S4 p T =0.001; S5 p T = 0.01; S6 p T = 0.05; S7 p T = 0.1; S8 p T = 0.2; S9 p T = 0.5; S10 p T = 1. Figure S13. QQ plots of autism polygenic risk score (PRS) epigenome-wide association study (EWAS) analyses. Different panels present results using different P-value thresholds (p T ) to define SNPs included in the PRS calculation. S1 p T = 5x10 -8 ; S2 p T = 1x10 -6 ; S3 p T = 1x10 -4 ; S4 p T =0.001; S5 p T = 0.01; S6 p T = 0.05; S7 p T = 0.1; S8 p T = 0.2; S9 p T = 0.5; S10 p T = 1.

Figure S15. Correlation plot of -log10(P-values) from EWASs of the autism PRS calculated using different P-value thresholds (p T ).
The boxes in the bottom triangle of the matrix contain scatterplots of PRSs calculated at different significant thresholds with line of best fit. In the diagonal squares are histograms overlaid with density plot of PRS at each threshold. The boxes in the top triangle contain the absolute correlation statistics with asterisks indicating the significance of the correlation statistic. S1 p T = 5x10 -8 ; S2 p T = 1x10 -6 ; S3 p T = 1x10 -4 ; S4 p T =0.001; S5 p T = 0.01; S6 p T = 0.05; S7 p T = 0.1; S8 p T = 0.2; S9 p T = 0.5; S10 p T = 1. Figure S16. Scatterplots of autism PRS (calculated using the optimal threshold of pT < 0.1) against DNA methylation for the ten top-ranked DMPs identified in the EWAS of autism PRS. Points are colored to distinguish between autism cases (green) from controls (red).

Figure S17. Manhattan plots of P-values from GWAS and EWAS of autism across a region on chromosome 8. A) Manhattan plot of GWAS of autism status. B)
Manhattan plot of EWAS of ASD status. C) EWAS of autism polygenic risk score. Red horizontal lines indicate genome-wide significance for the GWAS (P < 5x10 -8 ) and experiment-wide significance for the EWAS (P < 1x10 -7 ). Genomic locations are based on hg19. Figure S18. Regional plot of regulatory activity around the chromosome 8 region implicated in ASD by both GWAS and EWAS. Chromatin states as predicted by ChromHMM (15 state model) for blood and brain cell types and tissues centred around A) the index SNP in the chromosome 8 region from GWAS of ASD, and the two DNA methylation sites associated with ASD PRS: B) cg02771117 and C) cg27411982. Figure produced using UCSC Genome Browser and imputed chromHMM tracks from the Roadmpa Epigenomics Project. Genomic locations are based on hg19.
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330. 20 Figure S19. Boxplots of DNA methylation at cg02771117 split by genotype at genetic variants in the associated chromosome 8 region included in the ASD PRS. Each boxplot presents an association between a genetic variant (x-axis) and DNA methylation at cg02771117 (y-axis), the p value reported is taken from a linear regression analysis between these variables. 21 Figure S20. Boxplots of DNA methylation at cg27411982 split by genotype at genetic variants in the associated chromosome 8 region included in the ASD PRS. Each boxplot presents an association between a genetic variant (x-axis) and DNA methylation at cg27411982 (y-axis), the p value reported is taken from a linear regression analysis between these variables.

Figure S21. Scatterplots of -log10 P-values from the EWAS of ASD PRS comparing analysis performed in all individuals (x-axis) against analysis performed separately for cases and controls and then combined with a meta-analysis (y-axis).
Figure S22: mQTL mapping can localize putative causal loci associated with disease. Presented here is are two genomic regions (chr20:14836243-14926587 and chr20:21233198-21494184) identified in a recent GWAS analysis of ASD. At the top of the figure is a schematic detailing which genes are located in these regions which are identified by their Entrez ID. All genetic variants identified in the largest GWAS of ASD to date (P < 1x10 -4 ) (Grove et al, submitted) are represented by vertical solid lines where the colour reflects the strength of the association ranging from gray (less significant P-values) to black (more significant Pvalues); a red vertical line indicates the most significant genetic variant in this region. All DNA methylation sites tested for mQTL in the Minerva dataset are indicated by red vertical lines and genetic variants by blue vertical lines. Significant mQTL (P < 1x10 -13 ) are indicated by black diagonal lines between the respective genetic variant and DNA methylation site. Genomic locations are based on hg19.

MACROD2-AS1
24 XRN2 NKX2-2 Figure S23: Evidence for co-localisation of ASD and DNA methylation. Region on chromosome 20 where four CpGs sites had genetic signals consistent with the same causal variant being associated with both ASD and DNA methylation. Manhattan plots of the GWAS signal for A) ASD, B) cg26861732, C) cg04219410, D) cg00239420, and E) cg01798261 in this region (chr20:20906687-21578495). In B-E the red vertical line indicates the location of the CpG site. Genomic locations are based on hg19.