Local adaptation in European populations affected the genetics of psychiatric disorders and behavioral traits

Background Recent studies have used genome-wide data to investigate evolutionary mechanisms related to behavioral phenotypes, identifying widespread signals of positive selection. Here, we conducted a genome-wide investigation to study whether the molecular mechanisms involved in these traits were affected by local adaptation. Methods We performed a polygenic risk score analysis in a sample of 2455 individuals from 23 European populations with respect to variables related to geo-climate diversity, pathogen diversity, and language phonological complexity. The analysis was adjusted for the genetic diversity of European populations to ensure that the differences detected would reflect differences in environmental exposures. Results The top finding was related to the association between winter minimum temperature and schizophrenia. Additional significant geo-climate results were also observed with respect to bipolar disorder (sunny daylight), depressive symptoms (precipitation rate), major depressive disorder (precipitation rate), and subjective well-being (relative humidity). Beyond geo-climate variables, we also observed findings related to pathogen diversity and language phonological complexity: openness to experience was associated with protozoan diversity; conscientiousness and extraversion were associated with language consonants. Conclusions We report that common variation associated with psychiatric disorders and behavioral traits was affected by processes related to local adaptation in European populations. Electronic supplementary material The online version of this article (10.1186/s13073-018-0532-7) contains supplementary material, which is available to authorized users.


Background
Recent studies have used genome-wide data to investigate evolutionary mechanisms related to behavioral phenotypes, identifying widespread signals of positive selection (i.e., variants with beneficial effects on individual fitness increase in population frequency) in the predisposition to psychiatric disorder and behavioral traits [1][2][3]. Brainrelated phenotypes have undergone polygenic adaptation (adaptation that occurs by simultaneous selection on variants at many loci) during different phases of human evolutionary history [4] including to the present day [5]. This is consistent with several other investigations that found evidence of polygenic adaptation for predisposition to a wide range of complex traits [6][7][8][9]. These genome-wide signals of positive selection are the signatures of adaptation processes that occurred in response to environmental pressures. Single-variant analyses identified loci affected by local adaption (i.e., adaptation in response to selective pressure related to the local environment) to diet, pathogens, and geoclimate variables [10,11]. Polygenic mechanisms have also been observed in response to local environments. The observed difference in height between northern and southern Europeans appears to be related to a highly polygenic mechanism [12]. Polygenic risk scores (PRSs) for height, skin pigmentation, body mass index, type 2 diabetes, Crohn's disease, and ulcerative colitis were tested with respect to geo-climate variables in worldwide populations, with the discovery of putative signals of local adaptation [9]. However, a recent analysis demonstrated that PRSs derived from genome-wide association studies (GWASs) on populations of European descent generate biased results when applied to non-European samples [13]. PRS analysis should thus be limited to training and target datasets with the same ancestry backgrounds; we were therefore able to investigate local adaption only in European populations. To investigate whether molecular mechanisms at the basis of psychiatric/ behavioral traits (Table 1) were affected by local-adaptation processes that occurred during the colonization of Europe [14], we conducted a PRS analysis based on GWASs of psychiatric disorders and behavioral traits (Table 1) from the Psychiatric Genomics Consortium [15][16][17], the Genetics of Personality Consortium [18][19][20], and the Social Science Genetic Association Consortium [21] in a sample of 2455 individuals from 23 European populations. Then, we conducted a Gene Ontology (GO) enrichment analysis based on PRS results to provide information regarding the specific molecular mechanisms involved in the polygenic signatures of local adaptation observed.

Study population
The cohort used in the present study was previously investigated to analyze the genetic structure of European populations [22]. The sample included individuals from 23 different sampling sites located in one of 20 different European countries (Additional file 1: Table S1). The GeneChip Human Mapping 500K Array Set (Affymetrix) was used to genotype 500,568 single nucleotide polymorphisms (SNPs) according to the instructions provided by the manufacturer as reported previously [22]. The analysis of identity-by-state values permitted us to exclude the possibility of the presence of related individuals (i.e., individuals who were genetically more similar than expected to another member of the same subpopulation) and outliers (i.e., individuals who were far less genetically similar than expected to the rest of the subpopulation). We used this genotype information for imputation to maximize a consistent SNP panel between this cohort and the GWAS summary statistics used for the PRS analysis. Pre-imputation quality control criteria were minor allele frequency ≥ 1%, missingness per marker ≤ 5%, missingness per individual ≤ 5%, and Hardy-Weinberg equilibrium p > 10 −4 . We used SHAPEIT [23] for pre-phasing, IMPUTE2 [24] for imputation, and the 1000 Genomes Project reference panel [25]. We retained imputed SNPs with high imputation quality (genotype call probability ≥ 0.8), minor allele frequency ≥ 1%, missingness per marker ≤ 5%, and missingness per individual ≤ 5%. After applying the postimputation quality control criteria, we retained information regarding 3,416,230 variants in a final sample of 2455 individuals. Principal component analysis of the final sample was conducted using PLINK 1.9 [26] after linkage disequilibrium (LD) pruning (R 2 < 0.2) of the genotyped data. Principal components derived from genetic information were included in the regression model to adjust the analysis for population genetic background, which reflects the demographic history of European populations [27]. In line with previous PRS analyses [28][29][30][31][32], the initial analysis was conducted including the top 10 principal components. To verify whether residual population stratification affected our analysis, the top 20 principal components were included as covariates to confirm the reliability of the significant findings.

Local-adaptation variables
We extracted information regarding local adaptation by considering the location of the 23 sampling sites used to recruit the cohort investigated. Specifically, we considered three different types of variables: geo-climate (geographical coordinates, temperature, daylight, precipitation rate, and humidity), pathogen diversity (bacteria, protozoa, and virus), and language phonological complexity (consonants, segments, and vowels) ( Table 2). Geo-climate information was extracted from ClimaTemps (available at http:// www.climatemps.com/), which contains more than 12.5 million climate comparison reports providing information for more than 4000 locations worldwide. Data regarding pathogen diversity were extracted from the GIDEON (Global Infectious Diseases and Epidemiology Online Network) database (available at https://www.gideononline.com/). This includes information regarding 350 infectious diseases and 1700 microbial taxa in 231 countries. Information about the phonological complexity of European languages was extracted from PHOIBLE Online (available at http://phoible.org/), which is a repository of cross-linguistic phonological inventory data including 2155 inventories that contain 2160 segment types found in 1672 distinct languages [33]. Correlations among local-adaptation variables were estimated using Spearman's correlation test.

Polygenic risk score analysis
We conducted a PRS analysis using PRSice software [34] (available at http://prsice.info/). For polygenic profile scoring, we used summary statistics generated from multiple large-scale GWASs of psychiatric disorders and behavioral traits (Table 1) conducted by the Psychiatric Genomics Consortium [15][16][17], the Genetics of Personality Consortium [18][19][20], and the Social Science Genetic Association Consortium [21]. None of the GWASs used in the present study showed evidence of inflation due to population stratification or other possible confounders. Since none of the samples included in our target dataset was used in the GWAS considered to generate the PRS, no systematic overlap is expected between training and target datasets.
We considered multiple association p-value thresholds (PT = 5 × 10 −8 , 10 −7 , 10 −6 , 10 −5 , 10 −4 , 0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 1) for SNP inclusion and calculated multiple PRSs for each trait investigated. The PRSs were calculated after using p-value-informed clumping with an LD cutoff of R 2 = 0.3 within a 500-kb window, and excluding the major histocompatibility complex region of the genome because of its complex LD structure. The PRSs that were generated were fitted in regression models with adjustments for the top 10 ancestry principal components. Before being entered into the analysis, local-adaptation variables were normalized using appropriate Box-Cox power transformations to avoid biases due to the distribution of the phenotypes tested. We applied a false discovery rate (FDR) correction (q < 0.05) to correct for the multiple testing for the psychiatric/behavioral PRS × localadaptation variables tested [35]. To verify that no systematic bias inflated our analyses, we also conducted a permutation analysis. Specifically, considering the significant datasets, we performed 10,000 permutations of the PRSs with respect to their associated variables and verified whether the observed differences were significantly different from the null distribution of the permuted results. To estimate the genetic correlation among psychiatric disorders and behavioral traits, we considered the information provided by LD Hub v1.3.1 [36] (available at http:// ldsc.broadinstitute.org/ldhub/) and used the LD score regression method [37] for the missing pair-wise comparisons. Heritability statistics of the GWAS considered are reported in Additional file 2: Table S2.

Gene Ontology enrichment analysis
To provide information regarding the molecular mechanisms involved in the signatures of local adaptation in psychiatric and behavioral traits, a GO enrichment analysis was conducted based on the PRS results; the variants included in the significant PRS and with nominally significant concordant direction with PRS direction were considered in the enrichment analysis. A description of the GO analysis based on PRS results was reported in previous studies [28][29][30]. Variants were then entered in the enrichment analysis performed using eSNPO [38]. This method permits one to conduct enrichment analysis based on information related to expression quantitative trait loci (eQTLs) rather than physical positions of SNPs and genes, integrating the eQTL data and GO, constructing associations between SNPs and GO terms, and then performing functional enrichment analysis. An FDR correction was applied to the enrichment results for multiple testing (q < 0.05). To validate the results further, we conducted a permutation analysis based on the variants obtained from the major depressive disorder (MDD)-altitude result (the one that gave the highest number of significant GO enrichments). Based on this SNP set, we generated 100 SNP sets using SNPsnap (available at https://data.broadinstitute.org/mpg/snpsnap /match_snps.html) [39] and the following matching criteria: minor allele frequency ± 5%, gene density ± 50%, distance to nearest gene ± 50%, LD independence (R 2 = 0.3) ± 50%. The SNP sets generated were entered in the eSNPO analysis and the distribution of their results compared with those obtained from the SNP sets from the PRS analyses.

Natural and Orthogonal InterAction (NOIA) model
The NOIA model [40] was applied to validate the results related to single-locus and oligogenic signals identified by our PRS analysis. NOIA is able to estimate the interaction between genes (or epistasis), which is a key process in determining the effect of genomic variants in complex diseases and the adaptation and evolution of natural populations [41]. We performed NOIA analysis testing the genotypes of the variants included in the significant PRSs with respect to local-adaptation variables identified. The NOIA analysis was conducted using the R package noia (available at https://cran.r-project.org/ web/packages/noia/index.html).

Data sources
Data supporting the findings of this study are available within this article and its additional files. GWAS summary association data used to calculate PRSs in this study were obtained from the Psychiatric Genomics Consortium (available at https://www.med.unc.edu/pgc/ results-and-downloads/), the Genetics of Personality

Results
As expected, the set of variables related to the local environment were strongly intercorrelated ( Fig. 1; Additional file 3: Table S3). Similarly, psychiatric disorders and behavioral traits showed strong genetic correlations ( Fig. 2; Additional file 4: Table S4). We considered multiple GWAS significance thresholds to test PRSs [34], investigating both oligogenic and polygenic mechanisms (i.e., local-adaptation processes affecting few and many loci, respectively). To adjust our analysis for population genetic background, which reflects the demographic history of European populations [27], we included the top 10 principal components reflecting population ancestry variation as covariates in the regression models. This approach was considered on the basis of the experience of many GWAS and PRS analyses conducted on samples containing populations of different European descents. The use of 10 principal components is generally considered a standard approach to adjust within ancestry population stratification. However, to demonstrate that our findings are not due to the genetic relationships among European populations, we recalculated the significant PRS results (Table 3) considering 20 principal components in the regression models, and then tested for differences with respect to the original model: we did not observe significant differences between the two models (Additional file 5: Table S5).
Considering the results that survived FDR multiple testing correction (q < 0.05; Additional file 6: Table S6), we observed 13 variables related to local adaptation: 11 geo-climate variables, one related to pathogen diversity, and one related to language phonological complexity. Table 3 reports the top associations that survived FDR multiple testing correction for each of these 13 localadaptation variables. Figure 3 reports full visualization of the results for all comparisons (psychiatric/behavioral PRS × local-adaptation variables). We confirmed the reliability of the significant results empirically by generating a null distribution from 10,000 permutations of the original datasets and comparing the permuted results with the observed ones (Additional file 7: Figure S1). Since polygenic signatures of local adaptation have previously been reported in height genetics of European populations [12], we used this trait as a positive control for our approach. With this analysis, we replicated the presence of adaptation signals in the genetics of this trait (p < 0.05; Additional file 8: Table S7).
The strongest result was observed between the schizophrenia (SCZ) PRS and winter minimum temperature (WinMinTemp): higher WinMinTemp correlates with increased SCZ genetic risk (SNP N = 104,106, Nagelkerke's R 2 = 0.40%, Z = 3.84, p = 1.28 × 10 −4 , q = 0.029). Higher WinMinTemp was also associated with increased MDD PRS (SNP N = 8160, Nagelkerke's R 2 = 0.30%, Z = 3.34, p = 8.46 × 10 −4 , q = 0.029) and increased extraversion PRS (SNP N = 7, Nagelkerke's R 2 = 0.26%, Z = 3.14, p = 1.75 × 10 −3 , q = 0.037). While the MDD result is concordant with the SCZ-MDD genetic correlation, the extraversion finding seems to be independent of the SCZ and MDD results. The SCZ PRS was also associated with winter maximum temperature (WinMaxTemp) and longitude; the three environmental variables are highly correlated, and the results are driven by the same mechanism related to winter temperature. Covarying these three local-adaptation variables, WinMaxTemp appears to be the driving signal among the correlated results (p < 0.05; Additional file 9: Table S8).  Table 1 and Table 2. Additional file 3: Table S3 reports details of the correlation analysis. Asterisks (*) indicate correlations surviving Bonferroni multiple testing correction. Yellow, purple, and cyan colors indicate variables related to geo-climate, pathogen diversity, and language phonological complexity, respectively. Hierarchical clustering based on Spearman's rho was generated considering absolute correlation distances To understand better the molecular processes involved in this association, we conducted a GO enrichment analysis based on the PRS result. We observed 16 GOs that survived FDR multiple testing correction (q < 0.05; Additional file 10: Table S9). Among the other significant PRS associations, we observed significant GO enrichments (N = 54; Additional file 11: Table S10) in the negative association between altitude and MDD PRS (SNP N = 97,481, Nagelkerke's R 2 = 0.31%, Z = −3.13, p = 1.79 × 10 −3 , q = 0.037) only. Five GO enrichments are significant in both SCZ and MDD analyses (GO:0008285~negative regulation of cell proliferation, GO:0017147~Wnt-protein binding, GO:2000041~negative regulation of planar cell polarity pathway involved in axis elongation, GO:0071481~cellular response to Xray, and GO:0090244~Wnt signaling pathway involved in somitogenesis); two of these are related to the Wnt signaling pathway. To confirm empirically that these  Table S4 reports details of the correlation analysis. Abbreviations are reported in Table 1 and Table 2. Asterisks (*) indicate correlations surviving Bonferroni multiple testing correction. Green and orange colors indicate psychiatric disorders and behavioral traits, respectively. Hierarchical clustering based on genetic correlation was generated considering absolute correlation distances enrichment results are not false positives, we conducted a permutation analysis: we generated 100 random sets of LD-independent variants derived from the SNPs included in the MDD analysis (which was the one that gave the highest number of GO enrichments), considering minor allele frequency, gene density, distance to nearest gene, and LD independence as matching criteria. There was no permuted set with more than two significant GO enrichments (i.e., the empirical probability to observe a random set with more than two significant GO enrichments is p < 0.01; Additional file 12: Figure S2); the overall probability to observe a significant GO enrichment from a permuted set is p = 6.69*10 −5 (Additional file 13: Figure S3); and none of the four GOs shared by SCZ and MDD results resulted in significance in the permuted sets (q > 0.18).

Discussion
There are many datasets available with information regarding positive selection signatures in reference European populations [43,44]. We previously used these available data, observing a significant enrichment for positive selection in the genetics of psychiatric disorders [1]. Comparable results have been observed by independent groups using different approaches [2,3]. Our current analysis provides novel data with respect to local-adaptation differences among European populations. Indeed, considering positive selection signals in a reference European population, the signatures of positive selection are those shared by European populations and those specific for that particular population. With local-adaptation analysis, we are investigating the differences in selective pressures among a set of distinct European populations. Thus, the signals detected in the reference population may not overlap with those related to the local-adaptation mechanisms. To be able to use tests for positive selection (e.g., haplotype-  Fig. 1 and Fig. 2, respectively. Abbreviations are reported in Table 1 and Table 2. Additional file 6: Table S6 reports the summary statistics of the PRS analysis based methods), we would need a larger sample in each of the populations considered. Our PRS analysis identified 20 associations that survived FDR multiple testing correction (Additional file 5: Table S5). The specific characteristics of the sample investigated may generate false positive results due to several factors (e.g., different sample sizes at the different populations and non-random spatial sampling). However, our permutation analysis of the significant PRS results (i.e., we permuted the genetic scores with respect to the environmental variables) indicated that there is little possibility of bias due to the composition of the sample investigated.
Our findings appear to indicate that psychiatric and behavioral traits are not necessarily the outcomes selected by evolutionary pressures; some of the molecular pathways involved in their predisposition were affected by local adaptation. We observed some convergence between our local-adaptation findings and known epidemiological evidence. However, our findings should be related to evolutionary forces that acted on a population level, while epidemiological evidence should be due to mechanisms that acted on an individual level. We hypothesize that evolutionary forces shaped the genetic diversity of European populations, while individual-level changes should be due to post-genetic changes (e.g., epigenetic modifications) or the interaction of social-psychological risk factors on loci affected by local adaptation.
The strongest result observed between the SCZ PRS and WinMinTemp is in line with previous epidemiological studies. Season of birth is a widely recognized SCZ risk factor, where there is significantly increased risk associated with winter birth [45]. Our current finding may justify a molecular hypothesis: loci associated with increased SCZ risk may have undergone local adaptation related to winter conditions. The same environmental pressure may be responsible for the winter-birth risk through epigenetic mechanisms in line with the convergence between regional DNA methylation changes and signals of local adaptation reported for other loci [46]. Our GO enrichment analysis highlighted Wnt signaling as one of the molecular processes affected by this local-adaptation mechanism. This biological pathway is well studied in relation to both psychiatric disorders and human evolution; synaptic Wnt signaling is implicated as a possible contributor to several major psychiatric disorders due its involvement in neural differentiation processes [47]. Signatures of positive selection were reported in relation to the Wnt signaling pathway in multiple species [48]. Our present findings indicate that risk loci for psychiatric disorders involved in this molecular pathway could have been under local adaptation in European populations.
Another result in line with a known epidemiological association is the negative association between maximum sunny daylight period and BD (bipolar disorder) PRS. Seasonality of BD symptoms is common and, in particular, light exposure during early life may have important consequences for those who are susceptible to bipolar disorder [49]. More generally, lack of daylight is implicated in mood change in seasonal affective disorder [50]. Our finding indicates that daylight may have acted as a local selective pressure with respect to molecular pathways involved in BD pathogenesis.
As mentioned above, we also observed some instances of local adaptation involving oligogenic and single-locus signals. Although top results from GWASs of psychiatric and behavioral traits do not explain a large percentage of the variance, loci surviving stringent significance cutoffs usually show larger effect sizes, suggesting that they may be involved in key mechanisms involved in the pathogenesis of the traits investigated. Among the oligogenic signals, the strongest finding is the association of OPEN PRS, including the top two associated variants (rs1477268 and rs10932966), with protozoa diversity and summer minimum temperature. These two results appear concordant with the strong positive correlation between summer minimum temperature and protozoa diversity (Spearman's rho = 0.75, p = 4.51 × 10 −5 ), which is consistent with the relationship between temperature and pathogen diversity [51]. rs1477268 is located near RAS1, which was implicated by previous studies as being involved in pathogen response [52]. From GTEx data [42], rs10932966 is significantly associated with RP11-16P6.1 gene expression in multiple human tissues (Additional file 16: Table S12), but no information regarding its function is available. We hypothesize that these loci have been under local selective adaption in response to pathogenrelated selective pressure. This is in line with the consistent literature regarding the role of selective pressures induced by pathogen diversity in shaping human genome diversity [6].
Another single-locus result was observed between rs6992714, which is associated with DS risk, with latitude and summer maximum temperature. This genetic variant is associated with GGH gene expression, which was previously implicated as involved in the pathogenesis of tropical sprue, a malabsorption syndrome commonly found in tropical regions [53]. According to our data, GGH may have been under local adaptation in relation to selective pressures induced by summer temperatures. The associations discussed appear to be related to the effect of selective pressures induced by geoclimate and pathogen-related variables on the human genome.
The relationship between genetic and language diversities has been investigated from several perspectives [54], and genetic associations with language phonological complexity require careful consideration. Our data indicate that there is at least a partial relationship between genetic variation and language diversity that is not driven by their shared association with human demographic history (which should be reflected by the genetic diversity accounted for by the adjustment for principal components derived from genetic data). This supports two possible converse scenarios: (1) genetic variation may have contributed to shape European language diversity; (2) European language diversity may have been a local selective pressure that shaped the genetics of behavioral traits. Although it is not possible to establish causality or a mechanism based on our current data, phonological working memory appears to be associated with extraversion and conscientiousness [55], in agreement with the relationship highlighted by our results.