Look, no hands! Spectral biomarkers from genetic association studies

Recent advances in our understanding of the genomics of the human metabolome have shed light on the pathways involved in metabolic and cardiovascular disease. Such studies crucially depend on the interpretation of complex molecular spectra. A recent study by Suhre and colleagues provides a way to identify potentially clinically relevant biomarkers without a priori information, such as reference spectra, thus aiding the discovery of additional spectral features and corresponding genomic loci associated with metabolism and disease.

Before the widespread application of genome-wide association studies (GWASs) in the mid-2000s, techniques such as linkage analysis of families and candidate-gene studies had largely failed to identify robust and replicable loci associated with diseases that are common in the population. As judged by the criteria of replication, GWASs have been among the most successful epidemiological study designs to date, in no small measure due to large sample sizes, stringent quality control, simplicity of experimental design, and collaborative transparency between researchers. Yet identifying common disease loci, even when they explain a large proportion of heritability, only goes so far in advancing our understanding of pathogenesis.
Many GWASs employ a case-control design, where a set of individuals carrying the disease is compared with a set of non-diseased or population-based individuals. This is a useful strategy for finding loci associated with disease; however, categorizing patients into two classes (such as disease/no-disease) ignores the biological intricacies of the disease at hand, and provides only a rough guide to the underlying etiology. To create more detailed and accurate models of pathogenesis, it is important to look in more detail at the potential intermediate phenotypes, for example, by measuring concentrations of cellular products and enzymes that underlie the processes of disease. The ready availability of the relevant tissues and accurate, high-throughput technology have allowed researchers to leverage metabolomic profiling to elucidate the genomics of one such class of intermediate phenotypes, namely metabolites, which play an important role in metabolic and cardiovascular diseases [1]. Recent GWASs of the metabolome have identified scores of loci associated with metabolites [2-5], some of which (both loci and metabolites) have been shown to be associated with disease. Furthermore, given known pathway relationships between metabolites and the high dimensionality of the phenotype data, researchers have begun using novel approaches such as phenotype ratios and multivariate analysis of phenotype networks [6] to increase statistical power and aid interpretation.
In this issue of Genome Medicine, Suhre and colleagues [7] sidestep a fundamental challenge in previous GWASs, the decomposition of nuclear magnetic resonance (NMR) spectra into known metabolite concentrations, to expand the power of spectral association studies. In doing so, they present a novel method for identifying previously uncharacterized spectral features that may prove to be important biomarkers of disease.

Unbiased assessment of NMR spectra
A large contributing factor to the success of GWASs has been that they are relatively unbiased, in the sense that they assess marker variables that are roughly evenly drawn from across the genome rather than focusing only on specific loci or variants of interest. This lack of bias has enabled detection of previously unknown signals that would not have been found by methods such as candidate-gene studies. Analogously, the new study [7] shows the benefit of considering phenotypes in an unbiased way as well. Instead of searching for previously characterized metabolites in the NMR spectra and testing for association of these metabolites with genotype, the authors examined all available signals in the molecular spectra and associated each one with genotypes in a GWAS-style approach [7]. Similar to unbiased GWASs, the main premise of the unbiased NMR search is that by expanding testing beyond previously known metabolites, some novel classes and associations may be discovered and characterized.

To this end, Suhre and colleagues [7] used NMR measurements of plasma samples from more than 1,700 individuals in the KORA study [8]. A workflow of their study is presented in Figure 1. The same individuals were also genotyped using a genome-wide array covering more than 600,000 single nucleotide polymorphisms (SNPs). They binned the NMR spectra into 10,000 bins (spectral features), where each bin represents a potentially different metabolite. Binning is a simple procedure in which an NMR spectrum is split into windows of equal width (in parts per million (ppm)), and the signal intensity in a bin represents a quantification of the molecule(s) in that window for that sample. Ratios of metabolites are often more biologically informative than the metabolite concentrations themselves, as these ratios better reflect enzymatic reactions, in which one metabolite is converted into another at a certain rate. However, exploring all unique pairs of bins for association with each SNP is computationally difficult. Therefore, the authors [7] took a two-stage approach: first, all spectral features were examined for association with the SNPs, and then the top 500 spectral features were used to compute pairwise ratios, yielding a total of 133,350 phenotypes. The association between the genotypes and the NMR-based phenotypes (either spectral features or ratios thereof) was tested using a linear model adjusted for age and gender, followed by Bonferroni adjustment for multiple testing.
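The binning and two-stage testing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`bin_spectrum`, `association_t`, `two_stage`) are hypothetical, the data are simulated, only a single SNP is tested, and the P value uses a normal approximation rather than whatever software the study relied on.

```python
import itertools
import math

import numpy as np


def bin_spectrum(ppm, intensity, n_bins):
    """Sum signal intensity inside n_bins equal-width chemical-shift windows."""
    edges = np.linspace(ppm.min(), ppm.max(), n_bins + 1)
    # digitize gives 1-based window indices; clip so the largest ppm value
    # lands in the last window instead of one past it.
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_bins - 1)
    return np.bincount(idx, weights=intensity, minlength=n_bins)


def association_t(phenotype, genotype, covariates):
    """t-statistic of the genotype term in phenotype ~ 1 + genotype + covariates."""
    n = len(phenotype)
    X = np.column_stack([np.ones(n), genotype, covariates])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1] / se)


def two_stage(features, genotype, covariates, top_k):
    """Stage 1: test every feature. Stage 2: test ratios among the top_k."""
    t_single = np.array([
        association_t(features[:, j], genotype, covariates)
        for j in range(features.shape[1])
    ])
    top = sorted(int(j) for j in np.argsort(-np.abs(t_single))[:top_k])
    t_ratio = {
        (a, b): association_t(features[:, a] / features[:, b], genotype, covariates)
        for a, b in itertools.combinations(top, 2)
    }
    return t_single, t_ratio


# Binning demo on a four-point toy spectrum split into two windows.
print(bin_spectrum(np.array([0.0, 1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0, 4.0]), 2))

# Toy data (simulated, not from the study): one SNP, five spectral features.
rng = np.random.default_rng(0)
n = 1000
genotype = rng.binomial(2, 0.3, size=n)      # minor-allele count 0/1/2
age = rng.uniform(30, 70, size=n)
sex = rng.integers(0, 2, size=n)
covariates = np.column_stack([age, sex])
features = np.exp(rng.normal(0.0, 0.2, size=(n, 5)))
features[:, 0] *= np.exp(0.15 * genotype)    # feature 0 rises with the allele
features[:, 1] *= np.exp(-0.15 * genotype)   # feature 1 falls with the allele

t_single, t_ratio = two_stage(features, genotype, covariates, top_k=3)
for j, t in enumerate(t_single):
    print(f"feature {j}: t = {t:.1f}")
for (a, b), t in t_ratio.items():
    # Normal approximation to the two-sided P value; reasonable for large n.
    p = math.erfc(abs(t) / math.sqrt(2))
    print(f"ratio {a}/{b}: t = {t:.1f}, approx P = {p:.2e}")
```

In this simulation the two features are pushed in opposite directions by the same allele, so their ratio carries a stronger signal than either feature alone. That is the situation in which ratios would be expected to help, though real spectra will not be this clean.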
Using this approach, seven loci achieved genome-wide significance: LIPC, CETP, FADS1, GCKR, APOA1, CPS1, and PYROXD2. Of these, five are well-known loci that also had been previously reported using a targeted approach on the same data (examining 15 known lipoprotein subclasses). The use of ratios of NMR shifts rather than the individual shifts themselves resulted in lower phenotypic variance and substantially lower P values for four of these loci than were achievable using the previously reported lipid subclasses.

Figure 1. Flowchart of the spectral GWAS [7]. For each individual, genome-wide SNP data and blood plasma samples were available. Each blood plasma sample was then assayed with two different metabolomics platforms (mass spectrometry and proton NMR spectroscopy). The chemical shifts in the NMR spectra were then analyzed using a sliding window to create bins that quantified the amount of each molecule(s) that contributed to that bin in each sample. Traditionally, metabolite concentrations are extracted from NMR spectra using known profiles, but the use of bins allowed the authors [7] to take a hypothesis-free data mining approach. The authors then performed a two-stage GWAS, first identifying the 500 bins with the strongest genetic signals, determining the ratios between each pair of them, and then adding all unique ratios of the top bins in a second GWAS. The phenotype associations of the detected loci could then be interpreted using the mass spectrometry metabolomics data from the same blood plasma samples.
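To give a sense of the multiple-testing burden behind the two-stage design and the Bonferroni adjustment, the following back-of-the-envelope sketch counts the tests involved. It assumes that the correction spans every SNP-phenotype pair and that the family-wise error rate is 0.05; the text spells out neither, so both are illustrative assumptions. The SNP count is rounded down to 600,000, and the total of 133,350 phenotypes is taken from the text as reported rather than re-derived.

```python
import math

n_snps = 600_000          # "more than 600,000" SNPs; a round figure for illustration
n_bins = 10_000           # spectral features after binning
top_features = 500        # features carried into the second stage
n_phenotypes = 133_350    # total NMR-based phenotypes, as reported in the text

# Unique unordered pairs if every bin were paired with every other bin.
all_pairs = math.comb(n_bins, 2)

# Unique unordered pairs among the top features only.
n_pairs = math.comb(top_features, 2)

# Assumed family of tests: every SNP against every phenotype.
n_tests = n_snps * n_phenotypes
alpha = 0.05              # assumed family-wise error rate
threshold = alpha / n_tests

print(f"ratio pairs over all bins: {all_pairs:,}")
print(f"ratio pairs over top bins: {n_pairs:,}")
print(f"SNP-phenotype tests:       {n_tests:.3e}")
print(f"Bonferroni threshold:      {threshold:.2e}")
```

Restricting the ratios to the top 500 features cuts the number of candidate pairs from about 50 million to 124,750, which is what makes the second stage tractable. Under the assumptions above, the per-test significance threshold comes out at roughly 6e-13.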
As further validation of the NMR spectra, the authors [7] compared the results from NMR with those obtained from mass spectrometry, showing that NMR spectra for the detected loci generally correlated with concentrations for the same metabolites determined by mass spectrometry. Although the possible applications of these methods are exciting, one future challenge for phenotypically and genotypically unbiased studies will be the interpretation of the associations detected, as correlation with a known variable is confounded by other cross-correlations.

The future of metabolic trait associations
This study has highlighted two concepts that may prove useful in further genetic association studies of many phenotypes: large-scale unbiased screening of phenotypes and accounting for inter-phenotype relationships (such as ratios). There is potential to expand the types of relationships modeled, for example, using phenotype correlation networks [6,9] that capture potential pleiotropy of loci affecting a group of correlated metabolites. More generally, this work is part of a trend towards a systems-level analysis of disease, based on multivariate data analysis of multiple complementary datasets such as gene expression, metabolites, and genetic variation data [10], leading not just to detection of genotype-phenotype associations as in standard GWASs but ultimately to better mechanistic understanding of the pathways and molecular networks involved in the architecture of human traits and disease.

Competing interests
The authors declare that they have no competing interests.