Systems medicine and the integration of bioinformatic tools for the diagnosis of Alzheimer's disease

Because of changes in demographic structure, the prevalence of Alzheimer's disease is expected to rise dramatically over the next decades. The progression of this degenerative and terminal disease is gradual, with the subclinical stage of illness believed to span several decades. Despite this, no therapy to prevent or cure Alzheimer's disease is currently available. Early disease detection is still important for delaying the onset of the disease with pharmacological treatment and/or lifestyle changes, assessing the efficacy of potential therapeutic agents, or monitoring disease progression more closely using medical imaging. Sensitive cerebrospinal-fluid-derived marker candidates exist, but given the invasiveness of sample collection their use in routine diagnostics may be limited. The pathogenesis of Alzheimer's disease is complex and poorly understood. There is thus a strong case for integrating information across multiple physiological levels, from molecular profiling (metabolomics, lipidomics, proteomics and transcriptomics) and brain imaging to cognitive assessments. To facilitate the integration of heterogeneous data, such as molecular and image data, sophisticated statistical approaches are needed to segment the image data and study their dependencies on molecular changes in the same individuals. Molecular profiling, combined with biophysical modeling of molecular assemblies associated with the disease, offers an opportunity to link the molecular pathway changes with cell- and tissue-level physiology and structure. Given that data acquired at different levels can carry complementary information about early Alzheimer's disease pathology, it is expected that their integration will improve early detection as well as our understanding of the disease.


Towards molecular markers of AD
AD is characterized by deposition of amyloid β (Aβ) in the extracellular space. Given that the ε4 allele of the apolipoprotein E gene (APOE4), the major genetic risk factor of AD [9], leads to excess Aβ accumulation before the first symptoms of AD [10], it was believed that Aβ also has a pathogenic role [11]. However, it was later shown that Aβ accumulation in plaques is insufficient to cause the neuronal cell death observed in AD, and that the neuronal protein tau is essential for neurodegeneration in AD [12,13].
The 40- or 42-residue amyloid β peptides (Aβ 1-40/42), total tau and tau phosphorylated at Thr181 (P-tau 181P), all of which can be measured from cerebrospinal fluid (CSF), are well established markers of AD [14]. A recent study [15] used an unsupervised mixture modeling approach, independent of AD diagnosis, to identify a molecular signature derived from a mixture of Aβ 1-42 and P-tau 181P that was associated with AD. The AD signature identified subjects who progress from MCI to AD with high sensitivity and was surprisingly also present in a third of cognitively normal subjects, suggesting that AD pathology may occur earlier than previously thought.
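The idea behind unsupervised mixture modeling can be illustrated with a minimal sketch. The Python code below fits a two-component Gaussian mixture to a single marker by expectation-maximization, without using any diagnostic labels. It is not the method of [15], which modeled Aβ 1-42 and P-tau 181P jointly; the function names, the one-dimensional simplification and the simulated concentrations are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization.

    Hypothetical helper for illustration; no diagnostic labels are used.
    Returns (weights, means, sds), each a list of length two.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Initialize the component means from the lower and upper quartiles.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each subject.
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
    return weights, means, sds


# Simulated marker concentrations (arbitrary units), NOT real CSF data:
# 60 subjects drawn around 200 and 40 subjects drawn around 500.
rng = random.Random(0)
values = [rng.gauss(200.0, 30.0) for _ in range(60)]
values += [rng.gauss(500.0, 40.0) for _ in range(40)]

weights, means, sds = fit_two_component_mixture(values)
for w, m, s in zip(weights, means, sds):
    print(f"weight={w:.2f} mean={m:.1f} sd={s:.1f}")
```

In such a model each subject receives a posterior probability of belonging to either component, so a signature can be assigned without reference to the clinical diagnosis and only afterwards compared with it.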
CSF has severe drawbacks for routine diagnosis because of the invasiveness and potential side effects of sample collection. However, attempts to use Aβ or tau as measured from plasma as potential predictive markers of AD have so far not been successful [16-18]. Among the available non-invasive techniques, brain imaging methods, such as magnetic resonance imaging or positron emission tomography, can identify cerebral pathologies specifically associated with early progression to AD [18,19]. At present, it is unclear how atrophy in the hippocampus and hypometabolism in the inferior parietal lobules, as observed in these studies, relate to the disease pathophysiology and the existing CSF-derived markers.

High-throughput strategies to identify novel blood-based biomarkers
The 'omics' revolution has given us the tools needed for a discovery-driven strategy to identify new molecular biomarkers from biofluids, cells or tissues. Lessons have been learned about the statistical and study design precautions needed when applying such strategies of measuring large numbers of molecular components [20,21]. The major advantage of high-throughput approaches over more targeted hypothesis-driven strategies is their capacity to collect large amounts of information about a specific phenotype or disease condition in an unbiased manner.
A recent quantitative analysis of 120 plasma proteins [22] identified 18 signaling proteins as potential predictive biomarker candidates, which were mainly associated with reduced hematopoiesis and inflammation during presymptomatic AD. In a subsequent larger serum proteomics study by another research team [23], a multiplex protein immunoassay was used to classify AD and controls with high sensitivity and specificity. Notably, the overlap of the marker proteins between the two studies was minimal, and neither of the studies [22,23] was validated in an independent cohort. Blood mononuclear cells have also been considered as a potential source of biomarkers. Preliminary studies using transcriptional and microRNA profiling in AD patients and healthy controls suggest that a distinct AD-associated expression signature can be identified [24,25]. The major changes in blood mononuclear cells include diminished expression of genes involved in cytoskeletal maintenance, DNA repair and redox homeostasis.
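For reference, the sensitivity and specificity used to judge such classifiers can be computed from paired true and predicted labels, as in the short Python sketch below. The labels are made up for illustration and do not come from [22] or [23]; the function name is likewise hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="AD"):
    """Return (sensitivity, specificity) for a binary classification.

    Assumes at least one case and one control among the true labels.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1  # case correctly classified as AD
            else:
                fn += 1  # case missed by the classifier
        else:
            if pred == positive:
                fp += 1  # control wrongly classified as AD
            else:
                tn += 1  # control correctly classified
    sensitivity = tp / (tp + fn)  # fraction of cases detected
    specificity = tn / (tn + fp)  # fraction of controls correctly excluded
    return sensitivity, specificity


# Illustrative labels only: four cases and five controls.
true_labels = ["AD", "AD", "AD", "AD",
               "control", "control", "control", "control", "control"]
predicted_labels = ["AD", "AD", "AD", "control",
                    "control", "control", "control", "control", "AD"]

sensitivity, specificity = sensitivity_specificity(true_labels, predicted_labels)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Values computed on the cohort used to select the markers tend to be optimistic, which is why evaluation in an independent cohort matters.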
Profiling of small molecules (metabolites) is also a promising way to search for new AD biomarkers. Concentration changes of specific groups of circulating metabolites may be sensitive to pathogenically relevant factors, such as genetic variation, diet, age or gut microbiota [26-29]. The study of high-dimensional chemical signatures as obtained by metabolomics may therefore be a powerful tool for characterization of complex phenotypes affected by both genetic and environmental factors [30]. No metabolic markers have been reported so far for AD, but several projects aiming to discover serum-derived metabolic markers are ongoing, including HUSERMET [31] and PredictAD [32].

Towards systems medicine in AD
Large amounts of information gathered by various high-throughput technologies come at a price. The data, usually corresponding to different aspects of disease pathology, need to be integrated in a meaningful way. Such data integration encompasses more than informatics and statistics; for example, it includes the development of tools not only for storing and mining the data, but also for modeling the data in the context of disease pathophysiology. In AD, the adoption of a systems approach is particularly challenging since even at the molecular level the disease pathogenesis is highly complex, covering multiple spatial and temporal scales. As discussed below, this complexity demands that studies look beyond the pathways.
The genetics of late-onset AD is complex, although several of the common risk alleles other than APOE are involved in production, aggregation and removal of Aβ [33]. Several of the associated single nucleotide polymorphisms produce a synonymous codon change; that is, one without any change in the corresponding protein sequence [33,34]. Such synonymous codon changes may leave gene expression unchanged yet still alter protein folding, and thus the structure and function of the protein [35], by affecting translational accuracy or co-translational folding and hence the formation and stabilization of protein secondary structure [36].
The importance of understanding the structural and spatial context of AD-associated proteins and peptides is underlined by recent studies of truncated Aβ fragments (Aβ 17-40/42 [37] and Aβ 11-40/42 [38]), which are nonamyloidogenic and thus were believed to be harmless bystanders in amyloid plaques found in AD. Molecular dynamics simulations of truncated Aβ peptides, followed up by functional studies, suggest that these peptides are mobile in biological membranes and may dynamically form ion channels [39]. Such ion channels may be toxic, as they affect the uptake of ions such as calcium into the cells. Why such channels appear with aging in some individuals remains to be established. One possible explanation is the varying composition of neuronal lipid membranes, specifically plasmalogens, ether phospholipids that are enriched in polyunsaturated fatty acids and are abundant in the brain [40,41]. Plasmalogens affect membrane fluidity and protein mobility [40,42] and they are found to be diminished in early AD [43-45] and in normal aging [46]. In addition, plasmalogens, via their vinyl-ether bond, act as endogenous antioxidants to protect cells from reactive oxygen species, and their reduction in AD is thus in line with the hypothesis implicating oxidative stress in AD pathogenesis [47]. Taking these results together, one would expect that age-related and disease-related changes in membrane lipid composition would also affect the mobility of Aβ peptides, including the dynamics of their self-assembly.
Lipidomics tools are now available for detailed studies of molecular lipids in cells and biofluids [48]. Molecular profiling, combined with biophysical modeling of membrane systems (for example, to study β-sheet self-assembly [49,50], lipid membranes [51] or lipoproteins [52]), thus offers an opportunity to link the molecular pathway changes with cell- and tissue-level physiology and structure. This may not only lead to new concepts in disease pathogenesis, but also suggest new diagnostic and therapeutic avenues.

Bioinformatics tools enabling a systems medicine approach to AD
Many tools are available for mining of heterogeneous biological data, although the focus of such tools and the challenges being addressed by them have largely been in the domains of molecular interactions and biological pathways [53]. There is still a gap between the molecular representations of disease-related processes and the clinical disease. In this context, the measurement of traits that are modulated but not encoded by the DNA sequence, commonly referred to as intermediate phenotypes [54], may be of particular interest. These intermediate phenotypes include not only biochemical, genomic or functional traits, as discussed above, but also an individual's microbial (gut microflora) and social traits. The bioinformatic strategies to manage the disease-associated genetic, molecular and phenotypic data would thus aim to link the biological networks with specific intermediate phenotypes relevant to clinical disease by using a suite of models (Figure 1). The models, which could be, for example, biophysical or statistical, as described above, together with the intermediate phenotype data, could be used for discovery of new biomarkers of pathophysiological relevance.
Intermediate phenotypes, such as brain image data or serum metabolomic profiles, may also facilitate linking of the findings from experimental disease models with clinical phenotypes. This is particularly relevant for diseases in which animal models are difficult to validate, such as diseases of the central nervous system. One recent example is a metabolomic study of Huntington's disease [55], for which early disease markers were sought in patients and a transgenic mouse model. Clear differences in metabolic profiles between transgenic mice and wild-type littermates were observed, with a trend for similar differences between human patients and control subjects. The data thus raise the prospect of a robust molecular definition of progression of Huntington's disease before symptom onset and, if validated in a genuinely prospective manner, these biomarker trajectories could facilitate the development of useful therapies for this disease. A similar strategy could also be useful in studies involving transgenic mouse models of AD [56].

Conclusions
The pathogenesis of AD is complex and there is a strong case for integrating information across multiple physiological levels, from molecular profiling (metabolomics, lipidomics, proteomics and transcriptomics) and brain imaging to cognitive assessments. The adoption of a systems approach to study AD will demand integration of heterogeneous data (such as molecular and image data) and studies of disease-associated molecules and their assemblies beyond the pathway-centric view. To address data integration, sophisticated approaches are needed to segment the image data [57] and study their dependencies on molecular changes in the same subjects. To take studies beyond pathways, computational models are needed to study AD-associated molecules and their interactions in the spatial and temporal context. Given that data acquired at different levels may carry complementary information about early AD pathology, it is expected that their integration will improve early detection as well as our understanding of the disease.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
MO conceived and wrote the manuscript. JL and HS critically reviewed the manuscript and contributed to its writing.

Author information
MO is a Research Professor of systems biology and bioinformatics. His main research areas are metabolomic applications in biomedical research and integrative bioinformatics. He coordinates the European project ETHERPATHS [58], which aims to understand how diet modulates lipid homeostasis, specifically ether lipid metabolism. JL is a senior research scientist in data mining. His main research interests are in medical image analysis and decision support systems. He is currently coordinating the European project PredictAD [32], which aims to find efficient biomarkers and their combinations to enable objective and efficient diagnostics in AD. HS is a Professor of neurology. Her main research field is Alzheimer's disease, specifically genetic and lifestyle risk factors, biomarkers and magnetic resonance imaging. She is a partner in the EU projects PredictAD and LIPIDIDIET.