Quantitative high-throughput metabolomics: a new era in epidemiology and genetics

Metabolites in body fluids reflect multiple biochemical processes and pathways relevant to health and disease. Comprehensive approaches to gain insights into metabolic variation and diseases, such as metabolic phenotyping, have become increasingly popular over recent years [1-3]. These developments have been driven by mass spectrometry (MS) and proton nuclear magnetic resonance (NMR) spectroscopy as the two key experimental technologies. On the basis of findings from multiple disciplines, it has been envisaged that metabolic phenotyping will eventually lead to holistic risk assessment for various diseases [4].

Inevitability of the health-disease continuum
In the field of metabolomics, the complexity of the data generated has directed analyses towards spectral chemometrics and black-and-white thinking, the main goal often being to separate individuals who are 'healthy' from those who are 'diseased'. These kinds of simple classification approaches, often also using spectroscopic measures that are nonspecific in molecular terms, are not optimal for deriving metabolic understanding in epidemiology and genomic medicine. They also poorly reflect the continuity of biological processes and states (Figure 1). Even though single biomarkers and diagnostic thresholds are necessary criteria in current clinical practice, it will be pivotal that future medicine builds on our growing understanding of disease etiology. Importantly, we should inherently accept the biologically inevitable metabolic and disease continuum instead of being hampered by the apparently unattainable black-and-white diagnostics. Common disorders, being multigenic, are essentially quantitative traits [5], and ultimately we cannot hide from this fundamental feature of nature.
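The cost of black-and-white classification can be illustrated numerically. The following is a minimal sketch, not an analysis from the cited studies: it assumes a hypothetical standardized exposure and a normally distributed trait with a true correlation of 0.3 (the names `simulate`, `rho` and `threshold` are illustrative), and compares the association observed with the continuous trait against the association observed after splitting the same trait into 'healthy' and 'diseased' at a cutoff.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=50000, rho=0.3, threshold=0.0, seed=1):
    """Simulate an exposure and a correlated continuous trait, then dichotomize.

    All values are hypothetical; rho is the assumed true correlation between
    the exposure and the continuous trait.
    """
    rng = random.Random(seed)
    exposure, trait = [], []
    for _ in range(n):
        g = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        exposure.append(g)
        # Trait has unit variance and correlation rho with the exposure.
        trait.append(rho * g + math.sqrt(1 - rho ** 2) * noise)
    # Black-and-white labels: 1.0 = 'diseased', 0.0 = 'healthy'.
    labels = [1.0 if t > threshold else 0.0 for t in trait]
    return pearson(exposure, trait), pearson(exposure, labels)


r_cont, r_dich = simulate()
print(f"continuous trait:   r = {r_cont:.3f}")
print(f"dichotomized trait: r = {r_dich:.3f}")
```

For a normally distributed trait split at its median, the expected correlation shrinks by a factor of sqrt(2/pi), roughly 0.80, so about a third of the explained variance is lost before any modeling begins. Cutoffs further from the median lose more.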

Quantification rules
Techniques that can quantify large numbers of metabolites, representing multiple metabolic pathways in systemic metabolism, are particularly relevant for the risk assessment of metabolic disorders, such as diabetes and vascular diseases. Currently, NMR-based applications are the only approaches that can offer fully automated and highly reproducible high-throughput experimentation in a very cost-effective manner [6,7]. Although the variety of molecules measurable by MS cannot be matched by NMR, the per-sample costs using MS still tend to be high, and the quantitative throughput is limited. Recent MS applications, however, show appealing progress in both throughput and molecular identifications [3], suggesting that, in the near future, MS and NMR will most likely be used as complementary technologies in large-scale epidemiology. Combined with genome-wide and gene expression data at the population level, comprehensive metabolic information has started to trigger detailed systems-level findings [8,9]. This novel line of multi-omics research is anticipated to grow rapidly and to allow a more thorough molecular understanding of biochemical pathways and disease pathologies. Yet metabolomics will be truly useful in epidemiology or in genetic studies only if quantitative data on specific, identified metabolites are available.

Specificity is power
Broadly speaking, epidemiological research and genome-wide association studies (GWASs) aim to discover associations in order to generate biological hypotheses. Conventional thinking holds that the number of people is the primary adjustable variable for increasing the statistical power to detect variants of a given effect in a GWAS. However, effect sizes depend on phenotypic definitions and will get stronger as one moves closer to the biochemical source. This implies that 'missing heritability' is not simply a reflection of what cannot be found by a common genetic variant association; it relates fundamentally to the biological and molecular rationale of the trait. We have recently demonstrated this in a GWAS on (only) 8,330 Finnish individuals for a range of 216 serum metabolic measures quantified by our NMR metabolomics platform [7]. We identified 31 loci associated with one or more metabolic measures at a genome-wide significance level, including seven newly identified loci for low-molecular-weight metabolites and four for serum lipoprotein and lipid measures. Using the same Finnish individuals, we also performed an association analysis of the 95 genetic loci known to affect serum cholesterol and triglyceride levels [10]. Our analysis included comprehensive data on lipids and lipoprotein subclasses, obtained via the NMR metabolomics platform, and four enzymatic lipid traits. For 30 of the 95 loci, we identified new metabolic or genetic associations.

Figure 1. Quantitative metabolic phenotyping for continuous pathway modeling versus spectral-based black-and-white diagnostic classification. Right: A common metabolomics approach is to use the spectral data directly in a chemometrics model that explores the overall differences between individuals expected to belong to different diagnostic groups. This approach is not optimal for epidemiological or genetic research in which metabolic and disease continuum should be appreciated, because common disorders, being multigenic, are fundamentally quantitative traits. Therefore, the real data (strongly overlapping metabolic characteristics) do not match with the pre-defined groups (health versus disease). Left: New high-throughput methodologies involve sophisticated automation, including absolute quantification of identified metabolites. This provides new opportunities to understand disease etiologies and to handle disease risks and diagnostics as truly continuous multivariate phenomena. When these kinds of approaches mature and extensive datasets accumulate, it is anticipated that characteristic metabolic phenotypes for various disease-related pathways can be identified. This would allow overall assessment of individual health status and disease risks. Here, from the spectral NMR data of an individual, metabolites are identified and quantified in a fully automated manner, resulting in a comprehensive metabolic phenotype. Different pathways, which predispose to metabolic disorders in a distinctive way, have characteristic (time-dependent) metabolic signatures with specific risk distributions. In real life, metabolic disorders are interrelated and rarely exist in isolation.
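The trade-off between sample size and effect size discussed in this section can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not the power analysis used in the cited studies: it applies the standard normal approximation for a single-variant test on a quantitative trait, takes 5 × 10^-8 as the genome-wide significance threshold, and uses hypothetical values for the fraction of trait variance explained by a variant. The function name `required_n` is illustrative.

```python
from statistics import NormalDist


def required_n(var_explained, alpha=5e-8, power=0.8):
    """Approximate sample size needed to detect a variant in a GWAS.

    var_explained: assumed fraction of trait variance explained by the variant.
    Uses the normal approximation
        N ~ (z_{1 - alpha/2} + z_{power})^2 * (1 - R^2) / R^2
    and ignores the negligible contribution of the opposite tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 * (1 - var_explained) / var_explained


# Hypothetical effect sizes: a composite clinical measure versus measures
# progressively closer to the biochemical source of the association.
for r2 in (0.001, 0.005, 0.02):
    print(f"variance explained {r2:.1%}: N ~ {required_n(r2):,.0f}")
```

Because the required sample size scales approximately with the inverse of the variance explained, a phenotype that captures five times more of a variant's effect needs roughly one fifth of the individuals for the same power. This is the sense in which phenotypic specificity can substitute for cohort size.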

Different phenotypic risks
In the majority of the loci, the strongest association was with a more specific metabolite measure than the total lipids measured enzymatically. Interestingly, in four loci, the smallest high-density lipoprotein (HDL) measures showed effects opposite to the larger ones, a finding that indicates distinct metabolic characteristics for small and large HDL particles, as also previously indicated by the gene co-expression findings in circulating leukocytes [8].
Thus, the findings feature considerable diversity in association patterns for the loci originally identified through associations with enzymatic total lipid measures, and they reveal association profiles of far greater resolution than those from routine clinical lipid measures [10]. Therefore, not unexpectedly, metabolic measures of pathway specificity (such as HDL subclasses) can provide far better insights into biological processes than common clinical measures representing merely a sum of multiple biochemical components (such as HDL cholesterol).
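Why a summed clinical measure can hide subclass-specific associations follows from simple arithmetic: the effect of a variant on a sum is the sum of its effects on the components, so opposite effects cancel. The following is a minimal simulation sketch with entirely hypothetical numbers (allele frequency, effect sizes and the names `small`, `large` and `total` are assumptions for illustration, not estimates from the cited data).

```python
import random


def slope(x, y):
    """Least-squares regression slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=40000, freq=0.3, beta_small=-0.2, beta_large=0.2, seed=7):
    """Simulate a variant with opposite effects on two particle subclasses."""
    rng = random.Random(seed)
    genotype, small, large = [], [], []
    for _ in range(n):
        # Additive genotype coding: 0, 1 or 2 copies of the effect allele.
        g = float((rng.random() < freq) + (rng.random() < freq))
        genotype.append(g)
        small.append(beta_small * g + rng.gauss(0, 1))
        large.append(beta_large * g + rng.gauss(0, 1))
    # A composite measure that simply adds the two subclasses together.
    total = [s + l for s, l in zip(small, large)]
    return {
        "small": slope(genotype, small),
        "large": slope(genotype, large),
        "total": slope(genotype, total),
    }


effects = simulate()
for name, beta in effects.items():
    print(f"{name:>5}: estimated effect per allele = {beta:+.3f}")
```

In this toy setting each subclass shows a clear association while the composite shows almost none. Real loci rarely cancel exactly, but partial cancellation is enough to weaken the signal in a summed measure.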

One-for-all goes multiple
Increased etiological understanding and good biomarkers for disease prediction and prevention could facilitate clinical progress and translational medicine. However, in typical epidemiological studies, the findings, although statistically significant at a population level, reflect only weak relationships between metabolism and demographic or clinical measures, and therefore do not provide a sound basis for individual prediction models. Metabolic measures close to the underlying molecular pathways are needed to increase the accuracy of such modeling. Metabolomics approaches can intrinsically provide holistic molecular perspectives and thereby lead to better representations of disease progression, especially if we stop using univariate cut-offs for diagnostics and start handling disease risk as multivariate continuous dimensions. Combined with new bioinformatics schemes that include biological justification, comprehensive metabolic phenotyping has the potential to provide globally useful societal solutions, with better individual well-being together with more efficiently spent health budgets. It is clearly understood that typical clinical outcomes of metabolic diseases, like infarctions in coronary heart disease, occur as a result of life-long effects of multiple molecular pathways. If characteristic combinations of these pathways exist, and if they are distinct in relation to metabolic disorders, a pathway-specific identification and risk assessment of individuals might be possible at the population level (Figure 1). The potential translation from the current 'one model for all situations' to a new clinical practice incorporating pathways and metabolic phenotypes requires better understanding of the etiologies of metabolic diseases and their interplay. This invokes the need for quantitative metabolomics for the masses.