- Open Access
Universal clinical Parkinson’s disease axes identify a major influence of neuroinflammation
Genome Medicine volume 14, Article number: 129 (2022)
There is large individual variation in both clinical presentation and progression between Parkinson’s disease patients. Generation of deeply and longitudinally phenotyped patient cohorts has enormous potential to identify disease subtypes for prognosis and therapeutic targeting.
Replicating across three large Parkinson’s cohorts (Oxford Discovery cohort (n = 842)/Tracking UK Parkinson’s study (n = 1807) and Parkinson’s Progression Markers Initiative (n = 472)) with clinical observational measures collected longitudinally over 5–10 years, we developed a Bayesian multiple phenotypes mixed model incorporating genetic relationships between individuals able to explain many diverse clinical measurements as a smaller number of continuous underlying factors (“phenotypic axes”).
When applied to disease severity at diagnosis, the most influential of three phenotypic axes “Axis 1” was characterised by severe non-tremor motor phenotype, anxiety and depression at diagnosis, accompanied by faster progression in cognitive function measures. Axis 1 was associated with increased genetic risk of Alzheimer’s disease and reduced CSF Aβ1-42 levels. As observed previously for Alzheimer’s disease genetic risk, and in contrast to Parkinson’s disease genetic risk, the loci influencing Axis 1 were associated with microglia-expressed genes implicating neuroinflammation. When applied to measures of disease progression for each individual, integration of Alzheimer’s disease genetic loci haplotypes improved the accuracy of progression modelling, while integrating Parkinson’s disease genetics did not.
We identify universal axes of Parkinson’s disease phenotypic variation which reveal that Parkinson’s patients with high concomitant genetic risk for Alzheimer’s disease are more likely to present with severe motor and non-motor features at baseline and progress more rapidly to early dementia.
A critical challenge in medicine is to understand why the clinical presentations of each patient affected by the same disorder vary. This is especially true for Parkinson’s disease, for which the age of onset, the rate of progression, type and severity of symptoms differ across more than a million people worldwide living with this disease . To accelerate the identification of disease subtypes, large deeply phenotyped cohorts of Parkinson’s disease patients have been created, in which valuable clinical, imaging, biosample and genetic data have been collected, increasingly with longitudinal monitoring [2,3,4].
Recent studies exploiting these deeply phenotyped cohorts have classified patients into discrete phenotypic subgroups, each displaying a characteristic set of symptoms [5,6,7]. To define Parkinson’s disease subtypes, most of these studies employ some form of variable selection to create a distance matrix between individuals, followed by clustering methods such as k-means or hierarchical clustering. These methods provide discrete phenotypic groups, which are appealing in their categorical nature but have many shortfalls. Firstly, while selection methods quantify how much variance each phenotype explains, no robust method has defined a threshold for this measure above which a phenotype contributes to the distance matrix. Consequently, the definition of which phenotypes are essential to classify patients, and which are irrelevant can be somewhat arbitrary. For example, two recent studies [5, 8], using the same Parkinson’s Progression Markers Initiative (PPMI) cohort show divergent results: apathy and hallucinations were key subtype classifiers in the first study , but not in the second one , because these variables were not included. Secondly, K-means clustering requires the number of phenotypic groups to be prespecified, and this choice has the potential to be biased towards preconceived expectations with smaller groups ignored or erroneously joined with larger groups. Two studies using a k-means approach and the same cohort came to different conclusions. Lawton et al. (2015)  and Lawton et al. (2018)  identified five and four clusters, respectively, with some individuals previously in the same cluster moving to different clusters. This discrepancy reflects that the optimal number of clusters is not trivial to select and different statistics used to decide on optimal numbers often disagree. Finally, the creation of discrete groups may not reflect the possibly continuous nature of phenotypic variability and ignores the greater statistical power of continuous traits.
To overcome these limitations, we propose here an approach focused on the continuous variation of phenotypes. For this, we applied PHENIX (PHENotype Imputation eXpediated), a multiple phenotype mixed model (MPMM) approach initially developed to impute missing phenotypes , that is employed here to perform genetically-guided dimensionality reduction of multiple clinical traits. This approach models the phenotypes as a combination of genetic and environmental factors, and the genetic component exploits the genetic relatedness between patients.
Applying PHENIX to the deeply phenotyped UK-based Oxford Discovery cohort [4, 6], we identify a small number of axes underlying individual Parkinson’s disease patient phenotypic variation that explain the variation in the much larger number of clinically-observed phenotypes. We demonstrate the universality of these axes of phenotypic variation amongst Parkinson’s disease patients by independently deriving similar axes in all three deeply phenotyped cohorts, namely Tracking UK cohort , the UK Oxford Discovery cohort [4, 6] and lastly the US Parkinson’s Progression Markers Initiative (PPMI) cohort that has a different clinical structure from the UK cohorts. We show that this reproducibility is not achieved by other commonly-used dimensionality-reduction methods and the utility of a genetic component. Finally, we demonstrate that the most influential phenotypic axis was associated with the genetic risk of Alzheimer’s disease and microglia-specific gene expression, suggesting Parkinson’s disease patients with a high genetic risk for Alzheimer’s disease are more likely to develop an aggressive form of Parkinson’s disease including dementia symptoms.
Oxford Discovery cohort
We considered 842 Parkinson’s disease cases from the Oxford Discovery cohort [4, 6]. Individuals were required to have at least 90% chance of Parkinson’s disease according to UK-Parkinson’s disease brain bank criteria, no alternative diagnosis and disease duration less than 3.5 years. All patients had a clinical assessment repeated every eighteen months and have been already described [4, 6]. Phenotype data were collected for over a hundred clinical attributes, affecting autonomic, neurological and motor phenotypes (Additional file 1: Fig S1) and described in the Additional file 2: Table S1. Genotype data were generated using the Illumina HumanCoreExome-12 v1.1 and Illumina InfiniumCoreExome-24 v1.1 SNP arrays. To access to the clinical data of the Oxford Discovery cohort [4, 6], researchers must apply to the Oxford Parkinson’s Disease Centre (OPDC).
Tracking UK Parkinson’s study
We considered 1807 Parkinson’s disease cases from the Tracking UK Parkinson’s cohort, which was already described in detail by Malek et al.  Genotype data were generated using the Illumina Human Core Exome array. To access to the clinical data of the Tracking UK cohort, researchers must contact Dr Donal Grosset (email@example.com).
Parkinson’s Progression Markers Initiative cohort
The PPMI cohort (http://www.ppmi-info.org) was already described in detail (including the PPMI protocol of recruitment and informed consent) by Marrek et al. . We downloaded data from the PPMI database on January 2021 in compliance with the PPMI Data Use Agreement. We considered 472 newly-diagnosed Parkinson’s disease subjects: subjects with a diagnosis of Parkinson’s disease for two years or less and who are not taking Parkinson’s disease medications. We used the baseline (t = 0) (Additional file 2: Table S2) and the follow-up of clinical assessments. We excluded any individual with > 5% of missing data (437 individuals included). Participants have been genotyped using the NeuroX chip [11, 12]. PPMI data are available to the research community on the PPMI website: www.ppmi-info.org.
Clinical score of severity at diagnosis and progression
As the clinical measures can be confounded by differences in disease stage and medication status, an estimate of the disease severity at diagnosis as well as a measure of disease progression for each individual was derived with linear mixed effect models (LMM) by adjusting for different covariates. LMMs were chosen as they handle longitudinal data, i.e. non-independent data, allow for missing values and a flexible modelling of time, and can estimate individual trends . For PPMI , Oxford Discovery [4, 6] and Tracking UK . longitudinal data for several clinical tests was available (Tables S1 and S2). The clinical tests are recoded such that higher values indicate worse performance (multiplied by − 1) and standardised. For each clinical test, we fitted a LMM. We consider an intercept and the time since diagnosis as random effects such that for each individual we get an estimate for the severity at diagnosis and the progression respectively. We further included sex (categorical), education years (standardised), and age at diagnosis (standardised) as fixed effects. Our final model can be described as: clinical_assessment ~ 1 + C(sex) + education_years + age_at_diagnosis + time_since_diagnosis + (1 + time_since_diagnosis | subject_id). In both cohorts, clinical tests were performed both medicated and not medicated (Additional file 1: Fig S2). We therefore include an additional fixed effect indicating medication usage (categorical). Inclusion criteria are having less than 5% of missing data and at least two visits for a clinical test (decided for each clinical test individually) and having no missing information for the random and fixed effects. For PPMI 472 subjects with a median of 12 visits spanning over a mean of 5.59 years after diagnosis are included and for Oxford Discovery 876 subjects with a median of 4 visits spanning over a mean of 4.38 years after diagnosis are included. The goodness of fit was estimated as the R2 (e.g. for UPDRS III Additional file 1: Fig S3 or Additional file 2: Table S3). We noted that are not perfectly normally distributed as illustrated in Additional file 1: Fig S4. Nevertheless, we did not observe a significant improvement of goodness after box-blot data transformation so that it meets the assumption of normality. While we applied linear mixed models here, we acknowledge the current debate over whether linear or non-linear mixed models best model the data [14, 15]. Individual estimates for disease severity at diagnosis and disease progression can be extracted from the random effects. These measures are used for further analyses. The LMMs were fitted with pymer4 0.7.1and the model comparison was done with scikit-learn 0.23.2.
Genotype: quality control and imputation
Quality control was carried out independently using PLINK v1.9 . Imputation of unobserved and missing variants was carried out separately for each cohort (Supplemental Material).
Our continuous measures of severity are based on a multiple phenotypes mixed model (MPMM) approach named PHENIX (PHENotype Imputation eXpediated) which includes genetic relationships between individuals and was designed to impute missing phenotypes . To impute missing phenotypes, PHENIX reduces the variation within a cohort to a smaller number of underlying factors that are then used to predict individual missing values. Here, we exploit the identification of these underlying factors as providing the latent axes of patient variation which underlie a larger number of clinically observed phenotypes (Fig. 1A). The outcome is that the many clinical phenotypes (sometimes missing for some individuals) of each individual are represented through, i.e. their variances may be well explained by, a smaller number of underlying latent variables of phenotypic variation, which we name herein as phenotypic axes.
PHENIX  employs a Bayesian multiple-phenotype mixed model (MPMM), where the correlations between clinical phenotypes (Y) are decomposed into a genetic and a residual component with the following model: Y = U + e, where U represents the aggregate genetic contribution (whole genotype) to phenotypic variance and e is idiosyncratic noise. As the estimation of maximum likelihood covariance estimates can become computationally expensive with an increasing number of phenotypes, PHENIX uses a Bayesian low-rank matrix factorization model for the genetic term U such that: U = Sβ, in which β is can be used to estimate the genetic covariance matrix between phenotypes and S represents a matrix of latent components that each follow ~ N (0,G) where G is the Estimate of Relatedness Matrix from genotypes. The resulting latent traits (S) are used here as phenotypic axes, each representing the severity of a number of non-independent clinical phenotypes. The details to run PHENIX and extract the phenotypic axes are given in the Supplemental Material.
Risk-guided phenotypic axis
We derived a risk-guided phenotypic axis by replacing the whole-genotype-relatedness genetic component in our MPMM by the genetic relatedness focused upon a specific disease/trait (Fig. 1B). To calculate a disease-relatedness matrix, we recalculated relatedness between individuals using only those genetic variants (after pruning) with a genome-wide association study (GWAS) association < 0.05, and repeated at < 0.1, with a given human complex trait. For different complex human traits with GWAS results publicly available (Additional file 2:Table S4), we calculated a disease relatedness genotypic similarity matrix between patients that we used subsequently to derive phenotypic axes (Additional file 1) We determined statistical significance associated with any increase in the phenotypic variation explained by the new, risk-guided, phenotypic axes by deriving phenotypes with random SNPs sets matching the number of SNPs (after pruning) with p-value < 0.05 (or < 0.01). We calculated an empirical p-value by comparing the phenotypic variation explained by the risk-guided, phenotypic axes with the phenotypic variation explained by random SNPs’ phenotypic axes.
Conditional risk-guided phenotypic axis type association analysis
To evaluate whether the subset of genetic variants associated with a specific disease that influences a phenotypic axis overlap with those influencing another disease, we performed GWA conditional analysis with multi-trait-based conditional and joint analysis (mtCOJO) . We recalculated the genetic similarity with the summary GWA statistic of a trait conditioned for those of another trait and derived the new conditional risk-guided phenotypic axis. After this conditional analysis, we then examined whether the proportion of the phenotypic variance explained decreased or not: a decreasing proportion suggests that overlapping genetic variants of the two traits were associated with the same phenotypic axis.
Cell type association analysis
With the same approach and dataset described by Agarwal et al. (2020) , we examined the intersection between Substantia nigra (SN) cell type-specific gene expression patterns and the genetics influencing the phenotypic axes to identify disease-relevant cell types in the brain. We performed these cell type association analyses using MAGMA .
Microglia-specific module analysis
We used the same approach and the same dataset as Agarwal et al. (2020) . Briefly, a microglia-specific protein–protein interaction (PPI) network is built by identifying PPIs between genes highly expressed in the SN microglia. We then identified modules of highly interconnected genes in a microglia type-specific PPI network using the “cluster_louvain” function in “igraph” R package . To functional annotate each module, we performed Gene Ontology (GO) enrichment analysis with topGO R Bioconductor package  by testing the over-representation of GO biological processes (GO BP) terms within the module gene sets using Fisher’s test. rrvgo R Bioconductor package  was used to summarise the top 100 enriched GO BP terms into a smaller number of representative terms.
Three continuous measures capture 75% of the clinical variation
Examining first a cohort of 842 Parkinson’s disease patients (Oxford Discovery cohort [4, 6]) which had been genotyped and phenotypically characterised with 40 clinical assessments (Additional file 2: Table S1), we applied the PHENIX MPMM method to identify underlying latent continuous phenotypic axes that could account for the observed clinical variation. Each phenotypic axis reflected a number of co-varying observed clinical assessments. Three phenotypic axes explained more than 75% of the clinical variation, specifically Axes 1, 2 and 3 explained 39.6%, 28.7% and 6.8% of the variation respectively (Fig. 2 and Additional file 1: Fig S5). To examine whether similar phenotypic axes are obtained in different deeply phenotyped Parkinson’s disease cohorts, we derived phenotypic axes within an independent cohort of 1807 Parkinson’s disease individuals from the Tracking UK cohort  that had made similar clinical observations to the Oxford Discovery cohort [4, 6]. We found significant Pearson’s correlation coefficients between each cohort’s first three phenotypic axes: Axis 1 r = 0.92 (p = 3 × 10−13), Axis 2 r = 0.89 (p = 4 × 10−11), Axis 3 r = 0.72 (p = 5 × 10−6) (Fig. 2). Nevertheless, a major concern was that the identification of the same phenotypic axes might, at least in part, be due to the very similar structure of the clinical phenotyping between the two UK cohorts. To address this, we examined the independent US-based PPMI cohort  consisting of 439 sporadic Parkinson’s disease individuals that had been clinically phenotyped following a substantially different protocol to the UK cohorts. After deriving phenotypic axes in the PPMI cohort , we found significant similarities between the first three phenotypic axes derived for both Oxford Discovery [4, 6] and PPMI  cohorts: the coefficients of determination (R^2) between three first axes across different categories of clinical phenotypes from each cohort were: Axis1: 0.665 (p = 0.048), Axis 2: 0.914 (p = 0.003) and Axis 3: 0.754 (p = 0.025) (Fig. 3 and Additional file 1: Fig S6). By deriving phenotypic axis in three cohorts by using only UPDRS I, II, III and MOCA, four clinical measures systematically recorded in each cohort, we found significant similarities between the two first phenotypic axes derived in three cohorts: correlation between phenotypic axis vs clinical measure between Oxford Discovery cohorts (x-axis) vs others cohorts r = 0.92 95% [0.81–0.97]. These consistent similarities in the axes of phenotypic variation independently derived for each of three different Parkinson’s disease cohorts demonstrates the universality of these axes of phenotypic variation amongst Parkinson’s patients. Finally, by comparing PHENIX with other methods of dimensionality reduction for the UK/US cohort comparisons, specifically principal component analyses (PCA), multidimensional scaling (MDS) and independent component analysis (ICA), only the phenotypic dimensions discovered by the genetically-guided MPMM model, PHENIX, were significantly correlated between both cohorts. Hence, no other method was able to identify similar axes of phenotypic variation across UK and US Parkinson’s disease cohorts (Fig. 3).
Each phenotypic axis represents a distinct set of clinical features
To interpret the clinical relevance of each phenotypic axis, we examined the correlation between individual clinical features and the phenotypic axes (Table 2 and Additional file 1: Fig S2 and Additional file 1: Fig S7). We observed that each phenotypic axis corresponded to a subset of clinical features, differing in both extents and directions of severity. Axis 1 represented worsening non-tremor motor phenotypes, anxiety and depression accompanied by a decline of the cognitive function (Table 2). Worsening anxiety and depression were also features of Axis 2, in addition to increasing the severity of autonomic symptoms and increasing motor dysfunction. Axis 3 was associated with general motor symptom severity including rigidity, bradykinesia and tremor of the whole body independently of non-motor features. The contribution of different phenotypes to these axes was therefore highly variable. Specific aspects of motor dysfunction were important factors in defining the majority of axes. Anxiety and depression were also relatively important features, but only for axes explaining the largest amounts of variation. Conversely, cognitive impairment was associated only with Axis 1. However, this observation must be weighted by the fact that cognitive impairment/dementia is reported at a later disease stage and thus features less in recently diagnosed cases.
Although each phenotypic axis is associated with a distinct set of clinical features, they are not independent but instead strongly correlated (Additional file 1: Fig S8). We find no significant relation between the phenotypic axes and principal components of genetic ancestry (Methods) suggesting that the phenotypic axes are not biased by the population structure (Fig S9, Additional file 2 :Table S5). However, as previously reported, gender influences clinical symptoms  and we also observe a significant association between gender and Axis 2 (Table S5, p = 4.5 × 10−5).
To assess to what extent the phenotypic axes might be affected by the number of clinical observations, within the Oxford Discovery cohort [4, 6] we compared the phenotypic axes built on all clinical features with phenotypic axes generated with incomplete sets of randomly-selected clinical features. We observed a strong correlation (r > 0.8) between each of the two first phenotypic axes built with as few as 50% of the clinical variables and their respective original phenotypic axes, suggesting that these two axes are extremely robust in terms of the numbers of clinical variables considered (Additional file 1: Fig S9). Finally, the agreement of these phenotype axes with previously observed correlations provides further support for underlying biological themes, but their reinterpretation as robust continuous traits likely provides a more realistic approximation of how the underlying biology contributes, as opposed to a clustering-based cut-off for a phenotype. Specifically, the unimodal distribution of patients along these phenotypic axes (Additional file 1: Fig S10 and S11) suggests here that the development of continuous measures is more appropriate than clustering according to an arbitrary threshold.
The integration of genetic relationships improves the capture of the clinical symptoms
The PHENIX MPMM approach employed here to derive phenotypic axes exploits the genetic relatedness between individuals derived from genotypic similarity to further decompose random effects into kinship effects between individuals. In its original application to imputing missing phenotypes, PHENIX outperforms other imputation approaches when the heritability (h2) of a phenotype increased . Similarly, when randomly removing and re-imputing 10% of observed data, the quality of the imputation of Parkinson’s disease clinical assessments was in general better when considering the genetic relatedness between individuals as compared to excluding this information (Additional file 1: Fig S12), suggesting that phenotypic axes better capture Parkinson’s disease heterogeneity when including genetic information. Moreover, we found a higher agreement between the phenotypic axes derived by integrating the genetic relationship between patients of different cohorts than when the phenotypic axes were derived ignoring the genetic relationships (Additional file 1: Fig S13). Specifically, the coefficient of determination reflecting the agreement between the axes derived from Oxford Discovery [4, 6] and those derived from the PPMI  cohorts were from Axis 1 to 3, respectively: 0.665 (p = 0.048), 0.914 (p = 0.003) and 0.754 (p = 0.025) when including the genetic similarity between patients as compared to 0.604 (p = 0.069), 0.908 (p = 0.003) and 0.001 (p = 0.991) without. Together, these findings demonstrate that including genetic relationships between patients enhances the resulting phenotypic axes’ ability to reproducibly capture Parkinson’s disease clinical variation.
A high Alzheimer’s genetic score increases the risk of developing a more severe Parkinson’s form
To better understand the genetic risk factors influencing the phenotypic axis, we replaced the pairwise patient overall genotypic similarity matrix in the MPMM with a similarity matrix based only on regions of the genome associated with a specific complex human trait/disease. For example, replacing the overall genetic similarity with how similar people are in their genetic risk for diabetes or depression. We then rederived the phenotypic axes using the new metric of genetic similarity and compared the proportion of phenotypic variation explained by the new phenotypic axes, derived from different disease risks, to the original phenotypic axes that were derived using the entire genotype (Methods). Unexpectedly, the phenotypic axes derived using Parkinson’s disease genetic risk performed no better than the original phenotypic axes, while axes derived using the genetic risk for Alzheimer’s disease or the risk for inflammatory bowel disease, ulcerative colitis significantly outperformed, i.e. captured more patient phenotypic variation than, the original principal phenotypic Axis 1 (Fig. 4A and Additional file 1: Fig S14-15). Although UC and inflammatory bowel disease share a common genetic aetiology , we find no evidence that the same risk variants influence Alzheimer’s disease, suggesting that two distinct molecular aetiologies underlie phenotypic Axis 1. Specifically, we see no significant reduction in the variance explained by the axis calculated using Alzheimer’s disease genetics variants conditioned on ulcerative colitis or inflammatory bowel disease genetics variants (Additional file 1: Fig S16). The APOE locus is one of the major risk loci in Alzheimer’s disease, but we found no evidence that Parkinson’s disease individuals carrying one of two APOE ε4 alleles have a significantly higher phenotypic Axis 1 score suggesting that the APOE locus is not a major risk locus influencing Parkinson’s disease clinical presentation (Additional file 1: Fig S17).
Our results imply that Parkinson’s disease patients with high genetic risk for Alzheimer’s disease, but excepting APOE, are more likely to develop a more aggressive form of Parkinson’s disease that includes dementia symptoms as indicated by Axis 1, which represents worsening non-tremor, motor phenotypes, anxiety and depression accompanied by a decline in cognitive function (Table 2 and Fig. 2). We tested this hypothesis in the PPMI cohort  and found a significant relationship between phenotypic Axis 1 and the cerebrospinal fluid (CSF) Aβ1-42 level (r2 = 0.43, p = 0.007), an Alzheimer’s-associated biomarker strongly associated with future conversion to dementia, but no correlation was observed with total Tau, phosphorylated Tau or Alpha-Synuclein levels (p > 0.05). Parkinson’s disease patients with a high score for phenotypic Axis 1 had a significantly lower CSF level Aβ1–42 [24, 25] (Fig. 4B).
To identify the pathways underlying the genetics of this major clinical axis, we conducted a meta-analysis for the genome-wide association study (GWAS) summary statistic of 4,211,937 variants across 3088 individuals from three cohorts (Oxford Discovery [4, 6], Tracking UK  and PPMI ) (Method). In line with Alzheimer’s disease risk genetics rather than Parkinson’s disease, we found an association between Phenotypic Axis 1 risk variants and both the SN and cortex microglia-specific genes, which indicates that neuro-inflammation plays a key role in the development of a more aggressive form of Parkinson’s disease (Fig. 4A). Again, following the approach of Agarwal et al.  we examined the intersection of genetic risk and microglia-specific functions by identifying highly connected modules of microglia-specific genes whose proteins interacted (Methods) and then used MAGMA to associate these functional modules with different genetic risks. Modules were then annotated with GO terms and corrected for the microglial gene expression background. The genetic risk influencing both Alzheimer’s disease and phenotypic Axis 1 converge to microglia-specific gene Module 2 (Bonferroni adjusted p-value for the number of modules Alzheimer’s disease p = 0.038 – Axis 1 p = 0.042), expressing proteins involved in phagocytosis and regulation of immune response (Fig. 5B).
As for Alzheimer’s disease, we also found a metabolic influence on the Parkinson’s disease phenotype. We observe that patients with a history of high cholesterol or a history of heart failure, stroke and/or heart attack score significantly higher only on phenotype Axis 1 than those without these histories (Additional file 1: Fig S18). We also observe a significant positive correlation between patient BMI and only their Phenotypic Axis 1 severity score (r = 0.22; p = 3.8e − 06; Additional file 1: Fig S19).
A higher Alzheimer’s genetic risk increases the risk to develop a faster progressing form of Parkinson’s
In the Oxford Discovery [4, 6], Tracking UK , and PPMI  cohorts, we used their available repeated clinical evaluations to measure individual variation in disease progression. For each clinical phenotype, we derived a progression measure noting that the interval and span of clinical follow-ups varied between the three cohorts (Methods): on average 3.6, 3.6, 11, spanning 4.37, 4.08, and 5.58 years in the Oxford Discovery [4, 6], Tracking UK  and PPMI  cohort, respectively. The average interval between visits was 20, 19 and 6 months in the Oxford Discovery [4, 6], Tracking UK  and PPMI  cohort, respectively. We derived the longitudinal phenotypic axes by using the progression measure of each clinical phenotype. We identified a primary axis explaining 76%, 72% and 78% of the clinical progression in the Oxford Discovery [4, 6], Tracking UK  and PPMI  cohorts, respectively. This axis was firstly correlated with UPDRS III clinical scores for motor symptoms (r2ppmi = 0.83 & r2Tracking = 0.74, r2Discovery = 0.66) and with MOCA scores for the cognitive dysfunctions (r2ppmi = 0.69 & r2Tracking = 0.65, r2 Discovery = 0.69). Phenotypically, this axis was significantly similar in three cohorts (Oxford Discovery/PPMI r2 = 0.68, p = 0.04; Oxford Discovery/Tracking UK r2 = 0.90, p = 3 × 10−4, Tracking UK/PPMI r2 = 0.70, p = 0.03) (Additional file 1: Fig S20). A key feature of PHENIX is the ability to impute missing data and thus potentially predict individual disease progression. Using known clinical progression and baseline clinical symptoms of 80% of patients from each cohort, we calculated the accuracy for predicting the progression measure of a clinical phenotype, given the baseline clinical features, by predicting the progression measures in the 20% of patients excluded and repeated this random exclusion and prediction 1000 times (Additional file 1: Fig S21). Accuracy for predicting the progression of cognitive dysfunction was better (Oxford Discovery MOCA test, r2 = 0.42, Tracking UK r2 = 0.40, PPMI r2 = 0.49), than predicting the progression of motor symptoms (UPDRS III Oxford Discovery: r2 = 0.16, Tracking UK: r2 = 0.29, r2 Discovery = 0.16). We noted that changes in olfactory function, i.e., changes in Sniffin-16 item odour identification scores in the Oxford Discovery cohort, showed the highest predictive accuracy (r2 = 0.72) when estimating deliberately left-out clinical follow-up measures. In accordance with previous studies, this suggests that hyposmic Parkinson’s disease patients exhibit a worse clinical progression as compared to normosmic patients . Instead of a general genotypic relatedness matrix, using one calculated with only Alzheimer’s disease genetic risk loci significantly improved the accuracy for predicting clinical progression (Oxford Discovery: p = 0.017, Tracking UK: p = 0.003, PPMI: p = 0.04, Fig. 4D). This longitudinal phenotypic axis was significantly correlated with Axis 1 reported above that captures baseline clinical presentations (Oxford Discovery (r2 = 0.31, p < 2.2e−16), Tracking UK (r2 = 0.48, p = 4.12e−14), PPMI (r2 = 0.36, p = 4.12e−14)), indicating that Axis 1 is further associated with rapid progression of multiple clinical symptoms (Additional file 1: Fig S22). All these observations together indicate that genetic risk of Alzheimer’s disease could aid as a prognostic marker for Parkinson’s disease presentation and progression.
Understanding how and why the clinical presentations of each patient affected by the same disorder vary is a critical challenge in medicine. We demonstrate a novel approach to quantifying diverse Parkinson’s disease patient phenotypes using a continuous scale to derive phenotype axes. By applying our approach to three independent and deeply phenotyped cohorts, we demonstrate the universality of these axes of phenotypic variation amongst Parkinson’s disease patients. We also show that these axes are robustly derived when reducing the number of clinical features considered and, unlike other dimensionality reduction methods, the genetically-guided PHENIX MPMM approach is the only method tested here that is able to identify the same phenotypic axes underlying Parkinson’s disease patient variation between individuals from all three cohorts (Figs. 2 and 3).
Our approach was able to identify clinical variation that appears relevant to previously-defined categorical Parkinson’s disease subtypes. Anxiety and depression are highly correlated in Parkinson’s disease patients, both of which are correlated with Axes 1 and 2 . Rigidity and bradykinesia are also linked, possibly due to shared physiology , and varied in the same direction along Axis 3. Lawton et al. reported five Parkinson’s disease subgroups, by using the same Oxford Discovery cohort [4, 6] but following a k-means clustering approach . We examined the distribution of phenotypic axis scores across these five Parkinson’s disease subgroups (Additional file 1: Fig S10) and noted that the 5th subgroup of patients, characterised by severe motor, non-motor and cognitive disease, with poor psychological well-being clinical symptoms, were systematically associated with high severity score for all three of our phenotypic axes. Inversely, the first Parkinson’s disease subgroup characterised by mild motor and non-motor disease (group affected by fewer clinical symptoms) displayed a low severity score for our three phenotypic axes. Furthermore, we observed that the individuals of subgroups 4 and 5, characterised by poor psychological well-being, had high severity scores for phenotypic axis 2, the axis most associated with depression and anxiety symptoms. More recently, using both UK cohorts Lawton et al. (2018) reported four Parkinson’s disease subgroups. The subgroups with the best and worst clinical symptoms, clusters 2 and 3 associated with normal/to better and worst clinical symptoms respectively, showed corresponding variation along the phenotypic axes reported here (Additional file 1: Fig S23). These observations demonstrate some consistency between subgroups defined with k-means and our phenotypic axis severity score, but the continuous unimodal distribution of patients along the phenotypic axis does not support the existence of phenotypically distinct clusters of patients.
The phenotypic axes identified were robust in terms of the number of clinical features considered and enable the alignment of patients from different cohorts with different clinical phenotyping structures. The corollary is that PHENIX did not require the variable selection common in Parkinson’s disease clustering approaches, and it can also guide clinicians in determining which clinical assessments are essential to capture Parkinson’s disease heterogeneity. Deep phenotyping is burdensome to both patient and clinician and many of the measures exploited here are compound scores summarising aspects of functioning. Further work identifying the minimally burdensome observations that enable robust scoring of patients along these phenotypic axes would facilitate their utility and adoption across the Parkinson’s disease clinical community, bringing increased power to the discovery of influencing factors. Furthermore, the alignment of phenotypic axes across all Parkinson’s cohorts enabled a multi-cohort GWA meta-analysis of each phenotypic axis. Although no individual loci reached genome-wide significance (Additional file 1: Fig S24-S25), the universality of these axes provides a means to significantly increase power for future meta-analyses (Additional file 1: Fig S24-S25). Finally, we demonstrated here that the MPMM approach can be readily extended to include longitudinal data to determine the phenotypic axes associated with disease progression while simultaneously dealing with missing data, which is a common problem in longitudinal studies.
The phenotypic axes have multiple applications in Parkinson’s disease precision medicine. We found that Parkinson’s disease patients who carry a high genetic risk load for Alzheimer's disease are at higher risk of a more clinically aggressive Parkinson’s disease form including dementia symptoms. Parkinson’s disease patients with a high score for the Phenotypic Axis 1 had significantly lower CSF level Aβ1–42. This fits well with the previous observation that the fastest cognitive decline is with those with low CSF Aβ1-42 at diagnosis [29, 30]. This low level may correspond to PD patients affected in the same time by AD: It was reported ~ 40% of all patients with Lewy body disorders (LBD)  have sufficient amyloid plaque and tau tangle pathology for a concomitant Alzheimer’s disease diagnosis at autopsy and that lower Aβ1-42 levels are predictive of increasing cerebral Alzheimer’s disease and both α-synuclein pathology . While CSF α-synuclein levels might increase as a result of more intense neurodegeneration in PD , we did not observe that Axis 1, the axis most associated with severity and progression, was significantly associated with this CSF biomarker. As obtaining CSF biomarkers is invasive, our phenotypic axis could form part of a less invasive approach to predicting PD patients most at risk for dementia.
Similar to Alzheimer’s disease onset risk but different to Parkinson’s disease onset risk [18, 34], the genetics influencing this phenotypic axis were in genomic regions enriched for microglia-expressed genes (Fig. 4A), suggesting that neuroinflammation plays a key role in the development of a more aggressive form of Parkinson’s disease. One proposition of these findings is that Parkinson’s disease progression could be significantly modified by repurposed Alzheimer’s disease-targeting therapies in some patients.
Our method is potentially applicable to other disorders. However, the collection of cohorts of deeply phenotyped patients for PD is unique amongst neurodegenerative disorders. The wealth of these cohorts for PD is particularly noteworthy and nothing comparable has yet been developed for other dementia disorders such as AD or frontotemporal dementia (FTD), placing PD at the cutting edge of research to identify factors underlying neurodegenerative disease/progression. This approach can be also to higher dimensional datasets such brain imaging or single cell expression dataset. However, our method makes some number of assumptions such as the phenotype being normally distributed and heritable. As a linear mixed model has been fitted to longitudinal data from all patients, the performance to predict progression may be overestimated. However, we noted the performance to predict disease progression was more accurate when using a genotypic relatedness matrix calculated with only Alzheimer’s disease genetic risk loci instead of the overall genotypic relatedness. Finally, the application of this approach requires technical expertise and the manipulation of the large genetic dataset necessitating high-performance computation. However, it is certainly feasible to implement an accessible and secure platform whereby available clinically-measured phenotypes and the relevant genotypic information of Parkinson’s patients could be entered and the phenotypic axes values returned, along with relevant longitudinal predictions to support clinical advice/intervention.
The universal axes identified have the potential to accelerate our understanding of how Parkinson’s disease presents in individual patients, providing robust and objective quantitative traits through which patients may be appropriately compared and underlying disease-modifying mechanisms understood. This will lead ultimately to appropriately targeted therapeutic strategies delivered on an individualised basis. We believe the applications of this approach extend far beyond Parkinson’s disease.
Availability of data and materials
As the clinical genetic cohorts Oxford Discovery (n = 842) [4, 6] and Tracking UK (n = 1807)  cohorts contain potentially identifying and sensitive patient information, they cannot be publicly shared but are available upon request (https://github.com/csandorfr/Phenotypic-Axes). To access to the clinical data of the Oxford Discovery cohort, researchers must to complete the following form https://www.dpag.ox.ac.uk/files/research/opdc-biosample-and-clinical-data-application-form send it to Prof Richard Wade-Martins (email firstname.lastname@example.org) and Prof Michele Hu (email email@example.com). To access to the clinical data of the Tracking UK cohort, researchers must contact Dr Donal Grosset (firstname.lastname@example.org). PPMI data are available to the research community on the PPMI website: www.ppmi-info.org PHENIX code used here is available at the following link: https://mathgen.stats.ox.ac.uk/genetics_software/phenix/phenix.html. An example illustrating how to derive phenotypic axes with PHENIX can be found here https://github.com/csandorfr/Workshop-PhenoAxis-Lux-Feb2019. The code to derive the measures of disease progression and severity lmm can be found here: https://github.com/aschalkamp/PhenotypicAxes_DiseaseProgression. The code associated with the different figures of this manuscript can be found here: https://github.com/csandorfr/Phenotypic-Axes. The cell association analyses have been performed with the same approach and dataset described by Agarwal et al. (2020) : The processed 10 × 3′ Chromium single-nuclei RNAseq UMI-barcode matrices for each sample are available from the Gene Expression Omnibus under the accession code GSE140231  (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140231). An R Markdown document, including a version with R code to generate these gene sets and perform cell-type association, can be found here: https://github.com/csandorfr/SN_Atlas.
Body mass index
Genome-wide association study
Independent component analysis
Multiple phenotype mixed model
Medical Research Council
National Institute for Health Research
Oxford Parkinson’s Disease Centre
Principle component analyses
PHENotype Imputation eXpediated
Parkinson’s Progression Markers Initiative
Foltynie T, Brayne C, Barker RA. The heterogeneity of idiopathic Parkinson’s disease. J Neurol. 2002;249:138–45.
Malek N, Swallow DM, Grosset KA, Lawton MA, Marrinan SL, Lehn AC, Bresner C, Bajaj N, Barker RA, Ben-Shlomo Y, et al. Tracking Parkinson’s: Study Design and Baseline Patient Data. J Parkinsons Dis. 2015;5:947–59.
Marek K, Jennings D, Lasch S, Siderowf A, Tanner C, Simuni T, Coffey C, Kieburtz K, Flagg E, Chowdhury S, et al. The Parkinson Progression Marker Initiative (PPMI). Progress in Neurobiology 2011;95:629-35.
Szewczyk-Krolikowski K, Tomlinson P, Nithi K, Wade-Martins R, Talbot K, Ben-Shlomo Y, Hu MT. The influence of age and gender on motor and non-motor features of early Parkinson’s disease: initial findings from the Oxford Parkinson Disease Center (OPDC) discovery cohort. Parkinsonism Relat Disord. 2014;20:99–105.
Fereshtehnejad SM, Zeighami Y, Dagher A, Postuma RB. Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression. Brain. 2017;140:1959–76.
Lawton M, Baig F, Rolinski M, Ruffman C, Nithi K, May MT, Ben-Shlomo Y, Hu MT. Parkinson’s Disease Subtypes in the Oxford Parkinson Disease Centre (OPDC) discovery cohort. J Parkinsons Dis. 2015;5:269–79.
Lawton M, Ben-Shlomo Y, May MT, Baig F, Barber TR, Klein JC, Swallow DMA, Malek N, Grosset KA, Bajaj N, et al. Developing and validating Parkinson’s disease subtypes and their motor and cognitive progression. J Neurol Neurosurg Psychiatry. 2018;89(12):1279–87.
Erro R, Picillo M, Vitale C, Palladino R, Amboni M, Moccia M, Pellecchia MT, Barone P. Clinical clusters and dopaminergic dysfunction in de-novo Parkinson disease. Parkinsonism Relat Disord. 2016;28:137–40.
Dahl A, Iotchkova V, Baud A, Johansson A, Gyllensten U, Soranzo N, Mott R, Kranis A, Marchini J. A multiple-phenotype imputation method for genetic studies. Nat Genet. 2016;48:466–72.
Parkinson Progression Marker I. The Parkinson Progression Marker Initiative (PPMI). Prog Neurobiol. 2011;95:629–35.
Nalls MA, Bras J, Hernandez DG, Keller MF, Majounie E, Renton AE, Saad M, Jansen I, Guerreiro R, Lubbe S, et al. NeuroX, a fast and efficient genotyping platform for investigation of neurodegenerative diseases. Neurobiol Aging. 2015;36(1605):e1607-1612.
Nalls MA, Keller MF, Hernandez DG, Chen L, Stone DJ, Singleton AB, Parkinson’s Progression Marker Initiative investigators. Baseline genetic associations in the Parkinson’s Progression Markers Initiative (PPMI). Mov Disord. 2016;31:79–85.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74.
Holden SK, Finseth T, Sillau SH, Berman BD. Progression of MDS-UPDRS Scores Over Five Years in De Novo Parkinson Disease from the Parkinson’s Progression Markers Initiative Cohort. Mov Disord Clin Pract. 2018;5:47–53.
Vu TC, Nutt JG, Holford NH. Progression of motor and nonmotor features of Parkinson’s disease and their response to treatment. Br J Clin Pharmacol. 2012;74:267–83.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Zhu Z, Zheng Z, Zhang F, Wu Y, Trzaskowski M, Maier R, Robinson MR, McGrath JJ, Visscher PM, Wray NR, Yang J. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun. 2018;9:224.
Agarwal D, Sandor C, Volpato V, Caffrey TM, Monzon-Sandoval J, Bowden R, Alegre-Abarrategui J, Wade-Martins R, Webber C. A single-cell atlas of the human substantia nigra reveals cell-specific pathways associated with neurological disorders. Nat Commun. 2020;11:4183.
de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 2015;11:e1004219.
Csardi G, Nepusz T. The igraph software package for complex network research. InterJ Complex Syst. 2006;1695:1–9 http://igraph.sf.net.
Alexa A, Rahnenfuhrer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 2006;22:1600-07.
Sayols S. Rrvgo: a Bioconductor package to reduce and visualize Gene Ontology terms. https://www.ssayolsgithubio/rrvgo 2020.
Graham DB, Xavier RJ. Pathway paradigms revealed from the genetics of inflammatory bowel disease. Nature. 2020;578:527–39.
Irwin DJ, Xie SX, Coughlin D, Nevler N, Akhtar RS, McMillan CT, Lee EB, Wolk DA, Weintraub D, Chen-Plotkin A, et al. CSF tau and beta-amyloid predict cerebral synucleinopathy in autopsied Lewy body disorders. Neurology. 2018;90:e1038–46.
Lehmann S, Dumurgier J, Ayrignac X, Marelli C, Alcolea D, Ormaechea JF, Thouvenot E, Delaby C, Hirtz C, Vialaret J, et al. Cerebrospinal fluid A beta 1–40 peptides increase in Alzheimer’s disease and are highly correlated with phospho-tau in control individuals. Alzheimers Res Ther. 2020;12:123.
He R, Zhao Y, He Y, Zhou Y, Yang J, Zhou X, Zhu L, Zhou X, Liu Z, Xu Q, et al. Olfactory Dysfunction Predicts Disease Progression in Parkinson’s Disease: A Longitudinal Study. Front Neurosci. 2020;14:569777.
Menza MA, Robertson-Hoffman DE, Bonapace AS. Parkinson’s disease and anxiety: comorbidity with depression. Biol Psychiatry. 1993;34:465–70.
Berardelli A, Rothwell JC, Thompson PD, Hallett M. Pathophysiology of bradykinesia in Parkinson’s disease. Brain. 2001;124:2131–46.
Delgado-Alvarado M, Gago B, Navalpotro-Gomez I, Jimenez-Urbieta H, Rodriguez-Oroz MC. Biomarkers for dementia and mild cognitive impairment in Parkinson’s disease. Mov Disord. 2016;31:861–81.
Shahid M, Kim J, Leaver K, Hendershott T, Zhu D, Cholerton B, Henderson VW, Tian L, Poston KL. An increased rate of longitudinal cognitive decline is observed in Parkinson’s disease patients with low CSF Ass42 and an APOE epsilon4 allele. Neurobiol Dis. 2019;127:278–86.
Compta Y, Parkkinen L, O’Sullivan SS, Vandrovcova J, Holton JL, Collins C, Lashley T, Kallis C, Williams DR, de Silva R, et al. Lewy- and Alzheimer-type pathologies in Parkinson’s disease dementia: which is more important? Brain. 2011;134:1493–505.
Brainstorm C, Anttila V, Bulik-Sullivan B, Finucane HK, Walters RK, Bras J, Duncan L, Escott-Price V, Falcone GJ, Gormley P, et al. Analysis of shared heritability in common disorders of the brain. Science. 2018;360(6395):eaap8757.
Hall S, Surova Y, Ohrfelt A, Swedish Bio FS, Blennow K, Zetterberg H, Hansson O. Longitudinal measurements of cerebrospinal fluid biomarkers in Parkinson’s disease. Mov Disord. 2016;31:898–905.
Bryois J, Skene NG, Hansen TF, Kogelman LJA, Watson HJ, Liu Z, Eating Disorders Working Group of the Psychiatric Genomics Consortium, International Headache Genetics Consortium, Me Research Team, Brueggeman L, et al. Genetic identification of cell types underlying brain complex traits yields insights into the etiology of Parkinson’s disease. Nat Genet. 2020;52:482–93.
Marek K, Chowdhury S, Siderowf A, Lasch S, Coffey CS, Caspell-Garcia C, Simuni T, Jennings D, Tanner CM, Trojanowski JQ, et al. The Parkinson’s progression markers initiative (PPMI) - establishing a PD biomarker cohort. Ann Clin Transl Neurol. 2018;5:1460–77.
We thank the Oxford Genomics Centre at the Wellcome Centre for Human Genetics, Oxford) for the generation genotyping data.
This work was supported by the Monument Trust Discovery Award from Parkinson’s UK, the Oxford Genomics Centre at the Wellcome Centre for Human Genetics, the Wellcome Trust (grant reference 090532/Z/09/Z) and an MRC Hub grant G0900747 91070). Samples and associated clinical data were supplied by the Oxford Parkinson's Disease Centre (OPDC) study, funded by the Monument Trust Discovery Award from Parkinson’s UK, with the support of the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). CW’s lab is supported by the UK Dementia Research Institute, which receives its funding from DRI Ltd, funded by the UK Medical Research Council, Alzheimer’s Society and Alzheimer’s Research. CW and CS are supported by Computational Science Program funded by Michael J. Fox Foundation. CW, NR and CS are supported by the UK Dementia Research Institute (UK DRI) funded by the Medical Research Council (MRC), Alzheimer’s Society and Alzheimer’s Research UK (AR-UK). CS and NR are supported by the Ser Cymru II programme which is part-funded by Cardiff University and the European Regional Development Fund through the Welsh Government. AS is supported by a PhD studentship funded by Heath and Cares Research Wales. JM acknowledges funding for this work from the European Research Council (ERC; grant 617306). We thank the Oxford Genomics Centre at the Wellcome Centre for Human Genetics, Oxford) for the generation of genotyping data.
Ethics approval and consent to participate
Oxford Discovery cohort: The clinical data were supplied by the OPDC study. This study was undertaken with the understanding and written consent of each subject, with the approval of the local NHS ethics committee (National Research Ethics Service (NRES) Committee South Central – Oxford A (Ref:16/SC/0108) and Oxford C (Ref:15/SC/0117); Berkshire Research Ethics Committee (Ref: 10/H0505/71)), and in compliance with national legislation and the Declaration of Helsinki.
Tracking UK: The study is carried out in accordance with the Declaration of Helsinki and has been described in detail by Malek et al. .
PPMI: As described by Marek et al.  This study was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practice guidelines after approval of the local ethics committees of the participating sites.
Consent for publication
J.M. owns stocks and stock options in Regeneron Pharmaceuticals. The remaining authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sandor, C., Millin, S., Dahl, A. et al. Universal clinical Parkinson’s disease axes identify a major influence of neuroinflammation. Genome Med 14, 129 (2022). https://doi.org/10.1186/s13073-022-01132-9
- Parkinson’s disease
- Deeply phenotyped cohorts