An epigenome-wide association study of sex-specific chronological ageing

Background Advanced age is associated with cognitive and physical decline and is a major risk factor for a multitude of disorders. There is also a gap in life expectancy between males and females. DNA methylation differences have been shown to be associated with both age and sex. Here, we investigate age-by-sex differences in blood-based DNA methylation in an unrelated cohort of 2586 individuals between the ages of 18 and 87 years, with replication in a further 4450 individuals between the ages of 18 and 93 years. Methods Linear regression models were applied, with stringent genome-wide significance thresholds (p < 3.6 × 10−8) used in both the discovery and replication data. A second, highly conservative mixed linear model method that better controls the false-positive rate was also applied, using the same genome-wide significance thresholds. Results Using the linear regression method, 52 autosomal and 597 X-linked CpG sites, mapping to 251 unique genes, replicated with concordant effect size directions in the age-by-sex interaction analysis. The site with the greatest difference mapped to GAGE10, an X-linked gene. Here, DNA methylation levels remained stable across the male adult age range (DNA methylation by age r = 0.02) but decreased across female adult age range (DNA methylation by age r = − 0.61). One site (cg23722529) with a significant age-by-sex interaction also had a quantitative trait locus (rs17321482) that is a genome-wide significant variant for prostate cancer. The mixed linear model method identified 11 CpG sites associated with the age-by-sex interaction. Conclusion The majority of differences in age-associated DNA methylation trajectories between sexes are present on the X chromosome. Several of these differences occur within genes that have been implicated in sexually dimorphic traits.


Background
Advanced age is associated with cognitive and physical decline and is a major risk factor for a multitude of disorders including cancer, cardiovascular disease, and neurodegenerative diseases. Furthermore, males and females, on average, exhibit disparate risk profiles for various disease states as well as different life expectancies [1,2]. Indeed, the average life expectancy at birth in Scotland is 77.0 years for males and 81.1 years for females as reported by the Life Tables for Scotland 2015-2017 [3]. Thus, markers of ageing are merited, including those which can exploit sexual dimorphism for sex-specific prediction of ageing and disease risk. Biological hallmarks of ageing have been observed at the cellular and molecular level and include shortening of telomeres, genomic instability, and both global and local changes in DNA methylation (DNAm) levels [4][5][6]. DNAm is a common epigenetic mark, typically occurring in the context of a cytosine-guanine dinucleotide motif (CpG). It can be modulated by both environmental exposures and genetic variation and is sexually dimorphic [7].
Elastic net regression and best linear unbiased predictor models have been used to robustly predict chronological age by leveraging inter-individual variation in methylation profiles [8][9][10]. These methodologies capture global-level DNAm changes with respect to ageing but fail to inform the contribution of individual loci to the ageing process. These biological age predictors, also referred to as 'epigenetic clocks', correlate strongly with chronological age. Additionally, for a given chronological age, an advanced epigenetic age is associated with increased mortality risk and many age-related morbidities [11]. Importantly, males exhibit increased DNAm-based age acceleration relative to females (i.e. a 'faster ticking' epigenetic clock) supporting the role of epigenetic perturbations in sex-specific ageing trajectories [8,9].
In 2013, Horvath proposed a pan-tissue epigenetic clock derived from the linear combination of 353 CpGs whereas Hannum et al. created a DNAm-based clock based on 71 CpGs in blood tissue [8,9]. Following on from these seminal studies, Zhang et al. developed a highly precise DNAm-based predictor of chronological age (ZhangAge) limiting the value of such estimators as biomarkers of ageing [10]. Subsequently, a new generation of DNAm-based measures of ageing was proposed. Recently, Levine et al. proposed a powerful predictor of lifespan and health by developing a methylation-based predictor of an individual's 'phenotypic age' (DNAm PhenoAge) [12]. Phenotypic age is informed by chronological age and physical and biochemical measures, such as albumin and mean cell volume. Finally, a novel clock, termed DNAm GrimAge, was trained using mortality as a reference and supplants predecessor clocks in predicting the risk of mortality and a number of age-related morbidities [13].
While global information relating to DNAm perturbations has been harnessed to measure biological ageing, little is known regarding the role of specific epigenetic loci in the ageing process. Presently, age-related hypermethylation at the ELOVL2 locus on chromosome 6 remains the strongest known site-specific DNAm alteration throughout the lifespan [14]. Methylation status at FHL2, KFL14, C1orf132, and TRIM59 has also been shown to exhibit linear associations with chronological age [15]. These data may highlight biologically important epigenetic substrates of the human ageing process. However, the loci which are differentially methylated between males and females, in the context of ageing, merit elucidation. Therefore, in the current study, we sought to identify sex differences in age-associated genome-wide DNAm changes, in the whole blood from a discovery sample of 2586 unrelated individuals. These were replicated in a further 4450 unrelated individuals. Both discovery and replication sets are derived from the same parent cohort: Generation Scotland [16]. The replication cohort was unrelated to the discovery cohort. Further understanding of the sex-specific effects on biological ageing, through the identification of differentially methylated loci, may assist in identifying novel risk factors for age-and sex-associated pathologies.

Generation Scotland: Scottish Family Health Study
Data came from the family-based Generation Scotland: Scottish Family Health Study (GS). GS participants were recruited from GP practices in 5 regions across Scotland between the years 2006 and 2011 [16]. The probands were aged between 35 and 65 years and were asked to invite first-degree relatives to join the study, which had a final size of 24,090. A variety of cognitive, physical, and health data were collected at the study baseline along with the blood or saliva samples for DNA genotyping. Blood-based DNAm data were obtained on a subset of 5200 participants using the Illumina EPIC array [17]. Quality control details have been reported previously [17]. Briefly, probes were removed based on (i) outliers from visual inspection of the log median intensity of the methylated versus unmethylated signal per array, (ii) a bead count < 3 in more than 5% of samples, and (iii) ≥ 5% of samples having a detection p value > 0.05. Samples were removed (i) if there was a mismatch between their predicted sex and recorded sex and/or (ii) if ≥ 1% of CpGs had a detection p value > 0.05. For the present analyses, we considered unrelated individuals from the DNAm subset of GS. A genetic relationship matrix was built using GCTA-GRM, and a relatedness coefficient of < 0.025 was specified to exclude related individuals [18]. In cases where a couple was present, 1 individual was removed to minimise shared environment effects. This left an analysis sample of 2586 unrelated individuals ranging in age from 18 to 87 years and 807,857 probes.
The second set of blood-based DNA methylation from Generation Scotland was released in early 2019 and was treated as a replication sample. This comprised 4450 individuals who were unrelated (genetic relatedness < 0.05) to each other and to the 5200 participants from the first Generation Scotland methylation data set. Quality control steps have been reported previously [19] and were near identical to those reported above.

Statistical analysis
All analyses were performed in R version 3.5.3 [20].

Epigenome-wide association studies
Epigenome-wide association studies (EWASs) of chronological age, sex, and the interaction between age and sex were performed using two approaches.
First, we considered linear regression models adjusted for smoking status (smoking pack-years and statuscurrent, gave up in the last year, gave up more than a year ago, never, or unknown), estimated white blood cell proportions (CD8+ T cells, CD4+ T cells, natural killer cells, B cells, and granulocytes), methylation batch, and 20 methylation-based principal components to correct for unmeasured confounders. Age was centred by its mean, and sex was included as a factor. The models were run using the limma package in R (empirical Bayes moderated t-statistics) [21].
To remove the widespread effect of sex on X-linked methylation, we also ran sex-stratified age EWASs on the X chromosome. We compared this output with the results from the age-by-sex interaction model by plotting the -log 10 p values from the interaction model against the -log 10 p value from a heterogeneity test of the effects between the sex-stratified model: χ 2 hetero = (betamale − beta female ) 2 /(SE male 2 + SE female 2 ). Second, we considered the MOMENT method from the OmicS-data-based Complex trait Analysis software (OSCA) [22]. MOMENT is a mixed linear model-based method that can account for unobserved confounders and the correlation between distal probes which may be introduced by such confounders. Initially, a linear regression step is performed to obtain an association p value for each CpG site. The sites with p < 0.05/n probes are included in the first component with the remainder feeding into a second component. These components are then fitted as random effects in the model to control for confounding. However, convergence issues can occur when a large number of CpGs are included in the first component, as is the case with EWASs of age and sex. To account for this, we implemented a stepwise selection procedure to avoid saturation of the first component (MOMENT2; http://cnsgenomics.com/software/ osca/#EWAS). In the age EWAS models, we preadjusted the methylation data for sex and batch. Additional adjustment for cell counts and smoking resulted in genomic deflation due to collinearity between these covariates and age. In the sex EWAS models, we pre-adjusted the methylation data for age and batch. For the age-by-sex interaction EWAS models, the methylation data were pre-adjusted for age, sex, smoking status, smoking pack-year, cell types, and batch, and the ageby-sex interaction outcome was adjusted for age and sex. A default threshold (p < 0.05/n probes ) was used to select probes for the first component, prior to the stepwise analysis, for all models. The only exception was the sex EWAS in the replication cohort where this threshold was increased to p < 1 × 10 −20 to enable model convergence. Probes with a p value less than 3.6 × 10 −8 [23] were considered epigenome-wide significant associations.

Pathway analysis
Enrichment was assessed among the KEGG pathways and Gene Ontology (GO) terms using the gometh() function in the missMethyl package in R [24]. This function models the relationship between the number of probes per gene and the probability of being selected, accounting for the selection bias associated with probedense genes.

DNAm and chronological age
Using a linear regression (LR) model approach that did not adjust for inter-probe correlations, there were 250,485 autosomal and 3096 X-linked CpGs associated with chronological age at the epigenome-wide significant level (p < 3.6 × 10 −8 ) in the discovery cohort. 151,537 autosomal and 1668 X-linked sites were epigenome-wide significant with the effect sizes in the same direction in the replication cohort ( Fig. 1; Additional file 1: Tables S1-S2); 68,038 of these CpGs showed increasing methylation with age. High genomic inflation was observed with lambda values ranging from 12.2 to 57.1 (Additional file 2: Figure. S1).
A mixed linear model (MLM) approach that adjusted for inter-probe correlations and unmeasured confounders showed much better control of genomic inflation (lambda values ranging from 0.96 to 1.03; Additional file 2: Figure. S2). It identified 6 autosomal and 5 X-linked sites that were epigenome-wide significant with the effect sizes in the same direction in both the discovery and replication cohorts ( Table 1). All of these sites were included in the 153,205 sites reported in Fig. 1 and Additional file 1: Tables S1-S2.

DNAm and sex differences on the autosomes
There were 134,649 autosomal CpGs associated with sex at the epigenome-wide significant level (p < 3.6 × 10 −8 ) in the discovery cohort; 69,384 were epigenome-wide significant with the effect sizes in the same direction in the replication cohort ( Fig. 1; Additional file 1: Table S3)  ). It identified 4 autosomal sites that were epigenomewide significant with the effect sizes in the same direction in both the discovery and replication cohorts ( Table 1). All of these sites were included in the 69,384 sites reported in Fig. 1 and Additional file 1: Table S3.

DNAm and the interaction between chronological age and sex
The LR model that did not adjust for inter-probe correlations identified 85 autosomal and 635 X-linked CpGs that showed different ageing trajectories by sex (Additional file 1: Tables S4-S5; p < 3.6 × 10 −8 ) in the discovery cohort. Fiftytwo autosomal and 597 X-linked sites replicated (same direction and p < 3.6 × 10 −8 ) in the replication cohort (Fig. 1). These mapped to 251 unique genes. Genomic inflation ranged from lambdas of 1.11 and 6.29 (Additional file 2: Figure. S5). A heterogeneity test between age EWAS outputs from sex-stratified models gave -log 10 p values that were highly correlated with the -log 10 p values from the interaction model (Additional file 2: Figure. S6).
The more conservative MLM approach (lambda range 0.94-1.00; Additional file 2: Figure. S7) identified four autosomal and seven X-linked sites that were epigenomewide significant with the effect sizes in the same direction in both the discovery and replication cohorts (Additional file 1: Table S6). Nine of these sites were included in the 649 sites reported in Fig. 1 and Tables S4-S5.
The probe with the greatest ageing-associated difference between males and females was cg15833111, mapping to GAGE10 on the X chromosome ( Fig. 2; interaction p < 1.41 × 10 −23 in all LR and MLM models). In the discovery cohort, hypomethylation of this probe was observed with increasing age in females (r = − 0.61), whereas methylation levels remained stable in males (r = 0.02). Four of the 9 CpG sites identified as significant in Fig. 1 Age, sex, and age-by-sex effects in discovery and replication cohorts (linear regression method). Discovery cohort effect sizes (x-axis) are plotted against replication cohort effect sizes (y-axis) for the epigenome-wide analysis of age, sex, and age-by-sex. Autosomal and X-chromosome associations are presented separately both cohorts and by both modelling approaches exhibited clear differences in mean DNAm levels between males and females across an adult age range (cg00531806, cg08814148, cg20202246, cg24541420). These differences ranged from 0.14 to 0.21 on the beta value scale. There were significant main effects for both age and sex for 239 of the 649 probes (13 autosomal and 226 X-linked).
To investigate whether X chromosome inactivation (XCI) affected the 217 genes mapping to the 597 X-linked CpGs identified in this analysis, the genes were queried against a list of 114 XCI-escaping genes [25]. Thirty-two age-by-sex-associated genes were present in the list of 114 XCI escapees. There was an enrichment of XCI escapee genes in the list of age-by-sex-associated genes (χ 2 = 4.19; p = 0.04) with 14.7% of the 217 age-bysex associated genes being XCI escapees versus 9.6% in the remaining 850 genes (based on the EPIC array annotation) on the X chromosome. Genes mapping to the 649 probes identified and replicated in the LR analysis were queried against published GWAS results (GWAS catalog [26]) to determine whether there was an enrichment among specific traits. Considering SNPs located within 1 Mb of a given age-by-sex-associated CpG site yielded 98 genome-wide significant associations (p < 5 × 10 −8 ) comprising 73 SNPs (Additional file 1: Table S7) [27]. Several associations were with sexually dimorphic traits (e.g. prostate cancer, systemic lupus erythematosus, male pattern baldness, body mass index) [28][29][30][31][32][33][34][35][36][37]. All of the SNPs were located on the X chromosome. Of the 649 probes significantly associated with the age-sex interaction, 399 were also present on the Illumina 450K array (all of which were X-linked). These were queried against the ARIES methylation Quantitative Trait Locus (mQTL) database (mQTLdb [38];). There were 9664 CpG-SNP associations reported comprising 4582 unique SNPs and 279 unique CpGs (Additional file 1: Fig. 2 Pseudo-trajectories of DNA methylation by sex for probes associated with the age-by-sex interaction term using linear regression and MOMENT methods. DNA methylation beta values (y-axis) are plotted against chronological age (x-axis) separately for male samples (purple points) and female samples (orange points). Red lines represent the regression slope for the univariate model DNAm~age. Grey contour lines indicate data density Table S8). Two mQTL SNPs were present in the 73 SNPs identified from the GWAS catalogue query. These were associated with male pattern baldness and prostate cancer [31,36]. The prostate cancer-associated SNP (rs17321482) is a QTL in an adolescent sample for the genome-wide significant age-by-sex CpG (cg23722529), mapping to ARHGAP6. Figure 3 shows a plot of methylation at cg23722529 by age, stratified by sex and genotype for rs17321482 in the discovery cohort. A simple linear regression model of CpG on the interaction between genotype and age suggested no association between DNA methylation and age in either sex (male interaction P age_T = 0.40, female interaction P age_CT = 0.86, P age_TT = 0.82). Whereas ascertainment bias (e.g. sampling healthier older males or those that survive prostate cancer) may have been driving the association between cg23722529 and age in men, there was a minimal difference in the effect sizes in the whole population versus those aged under 60 years (β all = − 0.0047, p = 1.2 × 10 −4 ; β < 60 = − 0.0043, p = 5.5 × 10 −3 ). The male pattern baldness SNP (rs79798752) maps to EDA2R, within~554 kb from the EWAS CpG (cg15343840-an EPIC array-specific CpG) in the same gene. This SNP is a QTL for a different probe (cg08021299), approximately 249 kb away and mapping to HEPH. The probe is also among the 649 CpGs significantly associated with the ageby-sex interaction.

Functional enrichment analysis
The 251 genes where differential methylation was associated with the age-by-sex interaction in the LR analysis were assessed for enrichment in KEGG pathways, GO terms, and tissue-specific expression in 30 general tissues from GTEx v7 [24,39].
There were no KEGG pathways or GO terms enriched for the genes associated with the age-by-sex interaction after correction for multiple testing (Bonferroni-adjusted p < 0.05; Additional file 1: Tables S9-S10). The strongest enrichment was observed among downregulated genes in the testis and upregulated genes in the brain regions including thy hypothalamus and hippocampus (Fig. 4).

Discussion
Using a combination of standard linear regression methods and a more conservative mixed modelling method, we have identified loci displaying sexually dimorphic ageing associations in DNAm across the adult age range. These associations were predominantly X-linked. Of the nine CpGs identified by both methods, two mapped to genes (GAGE10 and FIGN).
The site with the greatest absolute age-DNAm correlation difference between males and females mapped to GAGE10, a member of the GAGE cancer/testis antigen family [40]. Normal expression of GAGE proteins is limited to germ cells. However, GAGE transcripts have been observed in multiple cancers including melanomas and breast, lung, ovarian, and thyroid cancers [40][41][42][43][44][45][46].
The majority of age-by-sex-associated genes did not return results when queried in the GWAS catalogue. This is not surprising as the X chromosome is usually omitted from association studies [47]. However, of the associations identified, several pertained to sexually Fig. 4 Enrichment of age-by-sex-associated genes among differentially expressed genes by tissue. The figure shows enrichment among differentially expressed genes in a given tissue (x-axis) compared to all other tissue types. The enrichment -log10(p value) is shown on the y-axis. Tissues with significant enrichment of genes are highlighted in red dimorphic traits. A variant in ARGHAP6, an age-by-sexassociated gene from this study, has previously been linked to prostate cancer [31]. However, there was no effect of this genotype on the age-by-sex interaction. Further examination of interactions between DNA methylation and disease-related genetic risk factors in a longitudinal context is warranted.
The mQTLdb resource is limited to probes present on the Infinium Human Methylation 450K BeadChip-the predecessor of the Infinium MethylationEPIC BeadChip used in the current study [38]. Of the 649 sites queried for mQTLs, 399 were present on both platforms. It is, therefore, possible that additional, as-yet-unidentified SNP associations are present among the remaining sites specific to the Infinium MethylationEPIC BeadChip.
Analysis of tissue-specific expression patterns of the genes displaying sexually dimorphic age-associated DNAm trajectories revealed strong enrichment among differentially expressed genes in the testis and brain regions, including the hypothalamus. Enrichment among differentially expressed genes in the testis may be indicative of a relationship with endocrine function and sex-specific ageing. This is consistent with the current hypotheses of endocrine differences as a contributor to the disparity in the life expectancy of males and females [48][49][50]. Additionally, the hypothalamus displays anatomical, ontogenetic, and biochemical differences between males and females [51]. The present findings may further delineate the established contribution of differential DNAm profiles to sex-specific disparities in the brain structure and function [52,53].
In addition to the age-by-sex interaction, we examined the relationship between DNAm and both chronological age and sex. We replicated previous findings, with the strongest age-associated effects observed in KLF14, ELOVL2, and FHL2 [14,15] and sex-associated effects observed in RFTN1 [54]. As replication was observed using both the liberal LR method and the more conservative MLM approach, these genes should be prioritised for further functional investigation in studies of ageing and sexually dimorphic traits.
The current study is strengthened by the use of large, unrelated discovery and replication cohorts with a broad age range, which has permitted the development of pseudolongitudinal profiles of DNAm across the life course in males and females. An additional strength is the use of two modelling approaches. The more liberal linear regression approach does not account for correlations which may be present between probes and is prone to genomic inflation. However, in the current study, we accompanied our analysis with a more conservative mixed linear modelling approach (called MOMENT), which accounts for the relationship between a given probe and trait with distal probes fitted in multiple random effect components [22]. This approach can account for unobserved confounders and reduce inflation. However, this also poses a limitation in that results based on this method may be overly conservative. This is evidenced by the small number of age and sex associations identified using MOMENT.

Conclusions
We identified 649 CpG sites displaying differences in ageassociated DNAm patterns between males and females. The majority of these sites are located on the X chromosome, several of which are within genes associated with sexually dimorphic traits by GWAS, including prostate cancer and male pattern baldness. In order to identify the mechanisms of sex-specific differences in biological ageing, further investigation of sexually dimorphic characteristics of these processes over the life course is warranted.
Additional file 1: Tables S1-S10: Top 1000 autosomal probe associations for chronological age present in both discovery and replication cohorts, using the linear regression model, ranked by discovery P-value. Table headers correspond to Limma TopTable outputs  -Table S1. Top 1000 X-linked probe associations for chronological age present in the discovery and replication cohorts, using the linear regression model, ranked by discovery P-value. Table headers correspond to  Limma TopTable outputs -Table S2. Top 1000 autosomal probe associations for sex present in both discovery and replication cohorts, using the linear regression model, ranked by discovery P-value. Table  headers correspond to Limma TopTable outputs -Table S3. Top 1000 autosomal probe associations for the age-by-sex interaction present in both discovery and replication cohorts, using the linear regression method, ranked by discovery P-value. Table headers correspond to  Limma TopTable outputs -Table S4. Top 1000 X-linked probe associations for the age-by-sex interaction present in both discovery and replication cohorts, using the linear regression approach, ranked by discovery Pvalue. Table headers correspond to Limma TopTable outputs -Table S5. Epigenome-wide significant sites associated with the age-by-sex interaction with concordant effects in the discovery and replication cohorts identified using the conservative mixed-modelling approach - Table S6. GWAS catalogue outputs within 1 Mb of age-by-sex-associated CpGs - Table S7. ARIES mQTLdb output for query containing age-by-sex-associated CpGs - Table S8. KEGG pathway enrichment output for age-by-sexassociated sites - Table S9. GO Term enrichment output for age-by-sexassociated sites - Table S10.
Additional file 2: Figures S1-S7. Quantile-quantile plots for the discovery and replication EWASs of chronological age using the linear regression method - Fig. S1. Quantile-quantile plots for discovery and replication EWASs of chronological age using the conservative mixedmodelling method - Fig. S2. Quantile-quantile plots for the discovery and replication EWASs of sex using the linear regression method - Fig.  S3. Quantile-quantile plots for discovery and replication EWASs of sex using the conservative mixed-modelling method - Fig. S4. Quantilequantile plots for discovery and replication EWASs of the age-by-sex interaction using the linear regression method - Fig. S5. Heterogeneity test P-values for male-only EWAS of age versus female-only EWAS of chronological age in the discovery and replication sets - Fig. S6. Quantile-quantile plots for discovery and replication EWASs of the age-bysex interaction using the conservative mixed-modelling method - Fig. S7.
the whole GS team that includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, healthcare assistants, and nurses.
Authors' contributions DLM, REM, IJD, AFM, QZ, PMV, and JY were responsible for the conception and design of the study. DLM and FZ contributed to the data analysis. DLM and REM drafted the article. DLM, REM, RMW, MLB, SWM, and AC contributed to the data preparation. DJP, AMM, KLE, and ADM were responsible for the data collection. All authors revised the article and read and approved the final manuscript.

Availability of data and materials
According to the terms of consent for GS participants, access to individuallevel data (omics and phenotypes) must be reviewed by the GS Access Committee. Applications should be made to access@generationscotland.org. Full summary statistics for the analyses presented are publicly available online at https://doi.org/10.7488/ds/2709.
Ethics approval and consent to participate All components of Generation Scotland received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). All participants provided broad and enduring written informed consent for biomedical research. Generation Scotland has also been granted Research Tissue Bank status by the East of Scotland Research Ethics Service (REC Reference Number: 15/0040/ES), providing generic ethical approval for a wide range of uses within medical research. This study was performed in accordance with the Helsinki declaration.

Consent for publication
Not applicable.
Competing interests AMM reports grants from The Sackler Trust, grants from Eli Lilly, and grants from Janssen outside the submitted work. The remaining authors declare that they have no competing interests.