From association to causality: the new frontier for complex traits

Technological and analytical advances have led to an unprecedented catalog of genomic regions associated with a broad range of clinically relevant phenotypes in humans. However, some examples notwithstanding, the causes of the overwhelming majority of genetic diseases remain obscure. More importantly, an emerging lesson from genome-wide association studies is that, in most instances, the resolution necessary for identifying actual genes that underlie the phenotype is limited, as is our ability to develop mechanistic, testable disease models from such studies. These new realities will probably necessitate a paradigm shift in our approach to complex traits, for which the combinatorial application of genomic and functional studies will be necessary to understand the mechanism and pathology of genetic disease. Here I will discuss these issues and highlight how additional sequencing and genotyping of ever-increasing cohort sizes without functional interpretation is unlikely to improve our ability to dissect the genetic basis of complex traits.

These discoveries infuse us with optimism about the potential for understanding the key homeostatic pathways that are causally related to these disorders, which could lead to new drug development and better patient management. At the same time, these studies have presented us with some surprising findings. Most importantly, lost in the collective euphoria of apparently rapid progress is the fact that, some notable exceptions notwithstanding, we are now faced with three major challenges. Firstly, the total sum of disease risk attributed to common variants detected by genome-wide association studies (GWASs) remains modest, raising a problem analogous to the astrophysicists' search for 'dark matter'. Secondly, the overwhelming majority of single nucleotide polymorphisms (SNPs) associated with disease lie in intra- or intergenic regions; there has been a striking dearth of coding SNPs. Finally, almost none of the association studies have led to the identification of the causal allele(s), and thus neither the 'disease-associated' gene, nor indeed the type of lesion (gain-of-function, loss-of-function, and so on), can be identified unequivocally. The sheer number of disorders subjected to GWAS suggests that these deficiencies are unlikely to be random findings, or to represent idiosyncratic genetic architecture of specific phenotypes. Rather, it is likely that these observations result from biological principles that will require a significant shift in our approach to understand them.
The challenge of the modest risk conferred by individual common variants is in some ways a minor problem, despite the unwelcome consequence that it potentially hinders, and might even defeat, the utility of high-density genotyping as a prognostic clinical tool. The fact that the relative risk conferred by most SNPs associated with Crohn's disease lingers in the 1.2-1.5 range (with the exception of alleles of NOD2, which encodes nucleotide oligomerization domain protein 2, and IL23R, which encodes the interleukin-23 receptor) does not detract from the establishment of the hypothesis that defective autophagy is part of the cause of this disorder [10], which has, in turn, enabled the development of mammalian models to study the disease process [11,12]. Likewise, the association of the same genomic regions with discrete disorders can highlight not only suspected relationships (rheumatoid arthritis and lupus), but also novel and surprising ones (diabetes and colon cancer; see [1] for more details).
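To make the prognostic limitation concrete, the following minimal sketch converts a relative risk in the 1.2-1.5 range into an absolute risk for carriers. The baseline risk of 0.5% in non-carriers is a purely hypothetical figure chosen for illustration, not a value taken from any of the studies cited here, and the function name is likewise an assumption.

```python
def carrier_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk in carriers = relative risk x absolute risk in non-carriers."""
    return baseline_risk * relative_risk


# Hypothetical baseline risk in non-carriers (illustrative assumption only).
BASELINE = 0.005

for rr in (1.2, 1.5):
    risk = carrier_risk(BASELINE, rr)
    print(f"RR {rr}: carrier risk {risk:.2%} vs {BASELINE:.2%} in non-carriers")
```

Under this assumed baseline, even the upper end of the range leaves a carrier's absolute risk below 1%, which illustrates why a single such allele is of little use as a prognostic marker even when it is informative about mechanism.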
The first and third challenges mentioned above, those of the genetic 'dark matter' and allele causality (coupled to mutational models), are more complex and are probably interlinked. Reasonable hypotheses have been put forth that suggest that the failure to detect the majority of the genetic load in complex disease is a reflection of insufficient statistical power, as evidenced by the accelerated locus discovery by meta-analyses. At the same time, evolutionary arguments have been proposed to explain the preferential enrichment of non-coding variants in complex disease. Most prominently, it has been suggested that strong (often coding) mutations that have a significant impact on fitness would be more likely to be associated with Mendelian disorders, whereas milder alleles that affect spatiotemporal patterns of gene expression are more likely candidates for exerting modest effects in complex disease [13].
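The statistical power argument can be sketched numerically. The example below is a rough approximation, not the method of any cited study: it treats a case-control comparison of allele frequencies as a two-sided two-proportion z-test with equal numbers of cases and controls, and uses a significance threshold of 5e-8, a commonly used genome-wide convention that is an assumption here. The function name, allele frequency and odds ratio are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_per_group, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-proportion z-test on allele counts.

    n_per_group is the number of individuals in each of the case and
    control groups; each individual contributes two alleles.
    """
    # Allele frequency in cases implied by the odds ratio.
    odds_cases = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds_cases / (1 + odds_cases)

    n_alleles = 2 * n_per_group
    se = sqrt(
        control_freq * (1 - control_freq) / n_alleles
        + case_freq * (1 - case_freq) / n_alleles
    )
    z_effect = abs(case_freq - control_freq) / se
    z_crit = -NormalDist().inv_cdf(alpha / 2)  # two-sided critical value
    # Normal approximation; the negligible opposite tail is ignored.
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical allele: 30% frequency in controls, odds ratio 1.2.
for n in (1_000, 5_000, 20_000):
    print(n, round(allelic_test_power(n, 0.30, 1.2), 3))
```

For a fixed modest effect, power under this approximation rises steeply with sample size, which is consistent with the accelerated locus discovery seen in meta-analyses. It says nothing, however, about which variant or gene at a detected locus is causal.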
It is reasonable to predict that understanding of the total genetic load in complex disease will be enhanced by higher density genotyping, increased sample size and expanded ethnic diversity, and medical resequencing of risk regions. Importantly, however, none of these tools will unequivocally enable the transition from association to causality.
The sobering reality remains that, despite the hundreds of loci identified, the number of bona fide genes associated with complex disease identified through GWASs is modest. Regrettably, a guilt-by-association view has emerged and the definitions of 'locus' and 'gene' are becoming dangerously interchangeable. At best, most arguments put forth to date for the role of specific genes in conferring susceptibility are correlative; although many will turn out to be correct, this is unlikely to be the ubiquitous truth. Recent work in age-related macular degeneration (AMD) highlights the problem. Case-control association studies pointed to a region on 10q26 that conferred strong susceptibility to AMD, and further work suggested that variation in the promoter region of HTRA1, which encodes a multi-functional serine protease, might explain the effect [14]. However, additional genotyping [15] and, independently, complete resequencing of the risk haplotype, pointed to a deletion in the 3' untranslated region (UTR) of predicted transcript LOC387715 that destabilizes its message, suggesting that loss of that transcript, and not HTRA1, drives the susceptibility to AMD [16]. Unfortunately, the presence of a haplotype that is not associated with AMD but carries a premature termination codon in LOC387715 confounds that hypothesis as well [17].
Similar questions should be raised about the direct involvement in obesity of the FTO gene, given that the associated SNP lies in intron 1 of that gene's transcript [18]. Even though the FTO protein is now being investigated mechanistically to understand energy regulation [19], there is no actual evidence that FTO is the gene that underlies the association. Notably, FTO, and the neighboring gene FTM (or RPGRIP1L), which lies less than 1 kilobase away in a head-to-tail orientation, have been shown to be co-regulated by the transcription factor CUTL1, and at least one of the associated SNPs maps to a predicted CUTL1 binding site [20], raising the possibility that dysfunction at FTM, not FTO, might be the driver of the phenotype. Intriguingly, FTM is mutated in several ciliopathies, which are hallmarked, among other features, by hyperphagia-driven obesity [21].
There is no doubt that additional, denser genotyping, admixture mapping and deep resequencing will refine current loci and uncover new ones. However, it is important to consider the relative value of these data without physiologically relevant functional interpretation. In some instances, population- and genetic-based arguments can suggest that certain alleles cause certain phenotypes, as has been shown by medical resequencing of candidate genes in patients at the extremes of high-density lipoprotein (HDL) and low-density lipoprotein (LDL) plasma levels [22,23]. Nonetheless, even in these examples, in vitro assays that tested the functional consequences of the mutations were necessary for the researchers to present compelling arguments.
Given these observations, it might be prudent to refocus our efforts on how to evaluate the physiological impact of genetic and genomic variation on gene function, because without that ability, the contribution of additional GWASs to understanding disease mechanisms will remain limited. Given the plethora of in vitro and in vivo tools available to the community, it should be possible to develop assays that can test directly the effect of gene variation in appropriate cell lines, cell types and animal models; without such tools, we might be left with an unexplorable catalog of associated SNPs but gain little wisdom about how to develop diagnostics and therapeutics for complex traits.

Competing interests
The author has no competing interests to declare.
Acknowledgements
I thank Sara Katsanis and Erica Davis for helpful discussion and editing of the manuscript. This work was supported by grants R01HD04260 from the National Institute of Child Health and Development, R01DK072301 and R01DK075972 from the National Institute of Diabetes, Digestive, and Kidney disorders and P20MH084018 from the National Institute of Mental Health.