Gene-gene and gene-environment interactions: new insights into the prevention, detection and management of coronary artery disease

Despite the recent success of genome-wide association studies (GWASs) in identifying loci consistently associated with coronary artery disease (CAD), a large proportion of the genetic components of CAD and its metabolic risk factors, including plasma lipids, type 2 diabetes and body mass index, remain unattributed. Gene-gene and gene-environment interactions might produce a meaningful improvement in quantification of the genetic determinants of CAD. Testing for gene-gene and gene-environment interactions is thus a new frontier for large-scale GWASs of CAD. There are several anecdotal examples of monogenic susceptibility to CAD in which the phenotype was worsened by an adverse environment. In addition, small-scale candidate gene association studies with functional hypotheses have identified gene-environment interactions. For future evaluation of gene-gene and gene-environment interactions to achieve the same success as the single gene associations reported in recent GWASs, it will be important to pre-specify agreed standards of study design and statistical power, environmental exposure measurement, phenomic characterization and analytical strategies. Here we discuss these issues, particularly in relation to the investigation and potential clinical utility of gene-gene and gene-environment interactions in CAD.

Introduction
Genetic investigations of coronary artery disease (CAD) aim to identify functional variants to assist with its diagnosis, prognosis or treatment. The full spectrum of DNA variant sizes and frequencies, ranging from single nucleotide changes to large copy number variations and from rare mutations to common polymorphisms, are components of a comprehensive approach to identify genetic determinants of CAD. However, CAD is the terminal manifestation of multiple intermediate disease processes, which individually have genetic and environmental determinants (Figure 1). For genetic research into CAD to be truly comprehensive, experimental methods must identify environmental and genetic factors and their interactions [1,2].
It seems reasonable that the effect of a CAD susceptibility allele could differ depending on the context of other genetic or environmental factors. For instance, is it effective to search for a gene underlying type 2 diabetes mellitus (T2DM) in high performance athletes? Although such athletes may be genetically predisposed to T2DM, their activity levels would probably protect them from expressing the phenotype. However, although gene-gene or gene-environment interactions seem to be an obvious topic for consideration, the analysis of such interactions is not yet routine in genetic studies of CAD. Here, we will focus on interaction types, strategies to detect interactions, potential biases and the statistical issues involved in studying gene-gene and gene-environment interactions in CAD.

Types of interactions
Broadly defined, interactions are differences in the strength of association between a gene and phenotype on the basis of the presence of, absence of or quantitative differences in an additional factor, which could be another genetic variant or an environmental exposure. There are several putative models for gene-environment interactions, including synergy, modification of effects and redundancy (Figure 2). For a gene-gene interaction, the additional factor might be dichotomous, such as carrier versus non-carrier status, or additive, such as zero, one or two copies of the minor allele. For a gene-environment interaction, the additional factor can similarly be dichotomous, such as presence or absence of smoking history, or it can be a continuous variable, such as number of pack-years smoked.
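The interaction models named above can be made concrete with a small numerical sketch. It assumes a dichotomous gene and a dichotomous exposure combined on the multiplicative (logistic) scale, logit(p) = b0 + bG*G + bE*E + bGE*G*E; the choice of scale, the function name `combined_odds_ratio` and all odds ratios below are illustrative assumptions, not values taken from any study cited here.

```python
import math

def combined_odds_ratio(or_gene, or_env, or_interaction):
    """Odds ratio for exposed carriers relative to unexposed non-carriers.

    Assumes a logistic model in which log odds ratios add, so the
    interaction term multiplies the product of the two main effects.
    """
    log_or = math.log(or_gene) + math.log(or_env) + math.log(or_interaction)
    return math.exp(log_or)

# Hypothetical (gene OR, environment OR, interaction OR) for each model.
scenarios = {
    "no interaction": (1.5, 1.5, 1.0),
    "synergy": (1.5, 1.5, 2.0),
    "environment modifies gene effect": (1.5, 1.0, 2.0),
    # Redundancy read here as: both factors together confer no more
    # risk than either factor alone.
    "redundancy": (1.5, 1.5, 1.0 / 1.5),
}

for name, (or_g, or_e, or_ge) in scenarios.items():
    print(f"{name}: combined OR = {combined_odds_ratio(or_g, or_e, or_ge):.2f}")
```

Under these assumptions the synergy case gives a combined odds ratio well above the product of the two main effects, whereas the redundancy case collapses back to the odds ratio of a single factor. Whether an interaction is present can depend on the scale (additive or multiplicative) on which it is assessed.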
Role of interactions in genetic association studies
Recent advances in cost-effective, array-based, high-throughput genotyping platforms have led to a flood of investigations of common single nucleotide polymorphisms (SNPs) in various diseases. Genome-wide association studies (GWASs) have successfully identified genetic determinants of CAD and its component risk factors [3-16]. For instance, several investigations found a region of chromosome 9p21 that was associated with CAD independently of traditional risk factors [3-6]. Furthermore, multiple genetic associations for T2DM [7,17] and body mass index (BMI) [18] have been discovered. However, most associated loci from GWASs have been reported for lipoprotein traits, including over 30 loci associated with plasma concentrations of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglyceride [7-16]. The success in finding genetic associations with lipoprotein phenotypes was due to methodological standardization (accuracy and precision) in trait measurement and to evaluation of large sample sizes, allowing detection of relatively subtle effects. Meta-analyses and collaborative consortia with large sample sizes have allowed GWASs to detect risk variants with low minor allele frequencies (<5%) and small effect sizes (odds ratio of about 1.1 to 1.7) (Box 1); SNP association studies may have already reached their limit to detect clinically or biologically relevant loci with such effect sizes [8,11,13,17,18].
Figure 2. Putative gene-environment interactions. For even the simplest case, a dichotomous genetic risk factor (for example, carriers versus non-carriers) and a dichotomous environmental risk factor (for example, present versus absent), several types of interactions are possible. If both the gene and environment have main effects (odds ratios >1), and thus could be identified independently, a synergistic interaction would result in an effect size larger than a simple additive effect. A second possibility is that an environmental factor could have no main effect but could modify the effect of a genetic factor that does have a main effect, creating a larger than expected combined effect. The inverse is also possible, in which a modifier gene with no main effect of its own increases the effect size of an environmental risk factor. A fourth possibility is that neither the gene nor the environment has a detectable main effect, and interaction is required to produce a measurable effect. A fifth possibility is for a gene and an environmental factor to have redundant effects, in which case the combination of factors produces no increase in risk. These types of interactions can be extended to include different effect sizes or gene-gene interactions.
Despite recent success in identifying CAD-associated SNPs, much of the genetic component of CAD and its risk factors remains unattributed. Forcing additional genetic markers with small effect sizes into predictive models only marginally improves prediction over traditional risk factors [19]. However, accounting for gene-gene and gene-environment interactions might produce a meaningful increase in the combined effect of the genetic determinants [1,2]. To ensure a valid assessment of gene-gene and gene-environment interactions, standards are required for sample sizes, accuracy and precision for continuous data, specificity and sensitivity for discrete data and appropriate statistical methods. Phenomics, defined as the comprehensive characterization of phenotype and environmental exposure [20], is also of key importance. For CAD, these intermediate phenotypes include blood coagulability, hypertension, altered lipid metabolism, cell proliferation and inflammation. When a new gene or locus is discovered, such as the chromosome 9p21 region associated with early CAD [3][4][5][6], and its association is subsequently replicated in multiple study samples [21][22][23][24], the basis of the association with CAD is assumed to be mediated through a pathogenic pathway [22]. This assumption will guide the design of subsequent functional experiments. Similarly, newly identified environmental determinants might exert their influence through one or even several pathogenic mechanisms and might even help identify previously unappreciated pathways.
Although the effect sizes of SNP associations identified in GWASs of CAD are modest, they are still important because: (i) individual associations can be combined to obtain larger cumulative effects; (ii) genes with small effects in GWASs can point to targets for drug-based or other interventions; (iii) genes with small effects in GWASs might contain rare, large-effect mutations in more severely affected patients; (iv) some GWAS loci with no previous CAD association might unveil new pathways; and (v) the effects of a GWAS locus could be amplified by gene-gene or gene-environment interactions.
These principles can be extended to the study of gene-environment interactions. For instance: (i) individual environmental interactions could be combined to obtain a cumulatively larger effect; (ii) rare extreme environmental exposures may display larger effects on the CAD phenotype than more common or typical environmental variation; (iii) identification of gene-environment interactions might suggest new hypotheses to evaluate disease-causing mechanisms. These principles could direct the design of future studies of gene-environment interactions in CAD.
Minor versus major alleles as a risk factor for CAD
How do alleles affecting CAD susceptibility arise? DNA mutagenesis could provide a basis for understanding the generation of risk alleles. Several mutagenic mechanisms have been identified [25]. If a DNA error escapes repair and becomes embedded in the genome, it could, by affecting the expression or function of a protein, modify CAD risk either positively or negatively. If the recently mutated allele increases CAD risk, it is possible that genetic drift, inbreeding, pleiotropy, heterozygote advantage or small effects on reproductive fitness could be responsible for the allele reaching appreciable frequencies in the population [26]. For CAD, mortality typically occurs after the reproductive years, thus reducing selection pressure against deleterious alleles. Another possibility is that an environmental change might cause an allele that once had a neutral or beneficial effect to become deleterious.
Alternatively, if the mutated allele is beneficial, reducing CAD risk, one would expect the allele to increase in frequency to become the major allele. If the mutation occurred relatively recently, it is possible the minor allele is gradually becoming more prevalent. Such 'protective' minor alleles, or conversely major alleles that increase CAD risk, are possibly important from a public health perspective, since defining a gene-environment interaction might suggest an environmental intervention with a potentially large impact, due to the high population prevalence of the risk allele.
Analytical detection strategies
Gene-gene and gene-environment investigations have included family-based and population-based samples in retrospective and prospective designs. Statistical methods have included methods modifying regression and chi-squared analyses, as well as statistical classification techniques, such as neural networks, support vector machines or Bayesian networks (Table 1). Although the statistical methods used in GWASs are fairly consistent and include regression and chi-squared analysis [3-8,10-18,27-30], the statistical approaches to detect gene-gene and gene-environment interactions are somewhat less standardized at present.
Investigators have tested for association between the cumulative number of risk alleles at multiple independent loci and disease [11,27,28]. Absolute allele counts [28] and relative weighting of alleles on the basis of their effect size [11,27] have both been reported. Although this showed that the alleles were independent and their effects could be added, no interaction between the alleles was measured. Subgroup analyses, in which the strength or effect size of the association is compared between sample subgroups, have substantially less power to detect an association than the original intact sample, increasing the risk of false negative results. For example, assuming 80% power to detect a difference in allele frequencies between cases and controls within one subgroup, the second equally sized subgroup will yield disparate results about 30% of the time just by chance.
Multiple comparisons
If N genetic variants are entered into an analysis, N(N-1)/2 potentially interacting pairs can be constructed. Selecting a priori known functional SNPs, or SNPs with coinciding spatial or temporal expression patterns, is one approach to reduce the number of tests. An alternative approach is first to test for marginal main effects in a primary hypothesis-generating analysis and then to test for interactions between those significant effects in a second analysis in which the nominal level of significance has not been substantially adjusted [33]. In GWASs, permutation testing, control of false discovery rates and Bonferroni correction have been used to determine appropriate significance thresholds. Whatever approach is used, care will be required for selecting the nominal level of significance in gene-gene and gene-environment investigations.
Potential biases in gene-gene and gene-environment investigations of CAD are summarized in Table 2. The accuracy and precision of genotyping technologies render genetic investigations relatively resistant to measurement bias, compared to other sources of potential bias.
Unequivocal disease phenotypes, such as myocardial infarction or coronary bypass surgery, are least susceptible to measurement bias. New imaging techniques, such as ultrasound-based intima-media thickness or magnetic resonance imaging (MRI)-based plaque volume calculations, are more susceptible to systematic errors of measurement. Self-reported measures of environmental exposure, such as caloric intake, energy expenditure or alcohol use, are most vulnerable to biases. Strategies to maximize the sensitivity and specificity of environmental factor measurement will improve the likelihood of detecting a significant association signal for interactions with genetic determinants [36].
Study design can affect bias, because prospective cohort studies are generally more resistant to bias than retrospective case-control designs [1]. Survivorship bias and population stratification are less common in prospective studies, assuming a truly representative cross-sectional cohort. Survivorship bias is a potential liability of retrospective studies of CAD, because patients with a fatal first myocardial infarction (up to 30% of cases) cannot be included in future studies. Recall bias, in which the study participant is more likely to remember an environmental exposure if it is associated with a negative outcome, respondent bias, in which patients alter their answers to exposure questions following a negative outcome, and exposure suspicion biases, in which investigators query individuals who have a negative outcome more thoroughly, are all reduced in prospective designs, as long as environmental exposure information is collected from all study participants irrespective of CAD outcomes.
Statistical power
Statistical power increases with the number of study participants and with the size of the effect under study. Factors to be included in power calculations of all genetic investigations include the minor allele frequency, the degree of linkage disequilibrium between the queried marker and the hypothetical disease locus, the genotype error rate and the genetic or phenotypic heterogeneity (Box 2). Fortunately, high-throughput genotyping platforms have a negligible genotype error rate [37]. Correction for multiple comparisons and the measurement error of environmental exposures also influence study power [1,2]. As a result of the greater accuracy of genotyping compared with the measurement or report of environmental exposures, there is theoretically more power to detect a gene-gene interaction than a gene-environment interaction for the same sized sample. Studies with inaccurate or imprecise measurement of phenotype or environmental exposure may require up to 20 times larger samples to detect an association signal above background noise [36]. However, the power advantage of gene-gene investigations resulting from their higher measurement accuracy is diminished by the need to correct for multiple comparisons and by the potentially increased complexity of interactions compared with gene-environment investigations.
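The multiple-comparison burden that erodes the power of gene-gene investigations can be illustrated with a short sketch. It counts the N(N-1)/2 pairwise tests among N variants and applies a Bonferroni-corrected threshold; the function names, the nominal threshold of 0.05 and the example values of N are assumptions chosen for illustration only.

```python
def pairwise_tests(n_variants):
    """Number of distinct variant pairs, N(N-1)/2."""
    return n_variants * (n_variants - 1) // 2

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical numbers of variants entered into a pairwise interaction scan.
for n in (10, 1_000, 500_000):
    tests = pairwise_tests(n)
    threshold = bonferroni_threshold(0.05, tests)
    print(f"N = {n}: {tests} pairs, per-test threshold = {threshold:.2e}")
```

Because the number of pairs grows roughly with the square of N, an exhaustive pairwise scan of a genome-wide array demands a far stricter per-test threshold than a main-effects scan of the same array, which is one reason for restricting the set of variants before testing for interactions.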
How large a sample is required for adequate power to find gene-gene and gene-environment interactions? A rule of thumb is that a four-fold increment in sample size is required to test for a multiplicative interaction of two main effects [2,38]. This may overestimate the sample size requirement, especially if the effect of the interaction is larger than the main effects, but it illustrates the general requirement for a larger sample size when interactions are introduced into hypothesis testing. Given that many previous candidate gene studies, and even many GWASs, were powered to detect only main effects, testing these samples for gene-gene and gene-environment interactions has the potential for false positive and false negative results [2,3]. Higher-order interactions will require even larger samples to attain suitable power and may not be possible even among the largest current association studies [1].
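One standard argument that leads to a four-fold figure can be sketched numerically. It assumes a balanced design, equal outcome variance in every cell, and an interaction of the same magnitude as the main effect; it is offered as an illustration of where such a rule of thumb can come from, not as the derivation used in the cited references. The function name and the values of `sigma2` and `n` are hypothetical.

```python
def variance_of_contrast(sigma2, n_total, n_cells):
    """Variance of a +1/-1 contrast of cell means in a balanced design.

    Each of the n_cells cells holds n_total / n_cells participants, so each
    cell mean has variance sigma2 / (n_total / n_cells); the contrast sums
    n_cells such independent terms.
    """
    per_cell_n = n_total / n_cells
    return n_cells * sigma2 / per_cell_n

sigma2, n = 1.0, 1000

# Main effect: carriers versus non-carriers (two cells).
var_main = variance_of_contrast(sigma2, n, n_cells=2)

# Interaction: difference of differences across gene x environment (four cells).
var_interaction = variance_of_contrast(sigma2, n, n_cells=4)

ratio = var_interaction / var_main
print(f"variance ratio (interaction / main effect) = {ratio:.1f}")
```

Under these assumptions the interaction estimate has about four times the variance of the main effect estimate, so about four times as many participants are needed to estimate it with the same precision. If the interaction is smaller than the main effect the requirement grows further, and if it is larger the requirement shrinks, consistent with the caveat above.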
Examples of interactions in monogenic CAD
Studies of monogenic susceptibility to CAD have revealed several gene-gene and gene-environment interactions.

Table 2. Potential biases in gene-gene and gene-environment investigations of coronary artery disease (CAD)
Bias | General description | Application to CAD
Selection bias | Skew in the selection of study participants | Patients with strong family history may self-select for study participation; patients with strong family history may be more likely to be referred to tertiary care and research centers
Survivor bias (prevalence-incidence bias) | Selection of study participants may miss mild disease or severe fatal cases | Patients whose first myocardial infarction is fatal are less likely to be studied
Recall bias | Patients are more likely to recall an environmental exposure if it was linked to a negative outcome | Patients with CAD may be more likely to remember an environmental exposure because of its negative consequences
Respondent bias | Patients answer in the way they believe they should answer, not the true answer | Patients with CAD and knowledge of potential CAD risk factors will be more motivated to report those exposures
Examples of interactions with common SNPs
Although interactions between environment and disease penetrance in rare monogenic disorders are instructive, a much larger potential impact could be seen in common complex CAD susceptibility because of small-effect common SNPs. The effect of the environment might be even more pronounced in patients whose phenotypes are caused by the aggregation of small contributions from many genetic and non-genetic factors. Examples of replicated gene-gene and gene-environment interactions identified in investigations of common SNPs in candidate genes are shown in Table 3. For instance, increased CAD risk has been observed in smokers with null genotypes for glutathione S-transferases, which are involved in the detoxification of carcinogens and products of oxidative stress [47,48]. Furthermore, smokers who are carriers of at least one APOE E4 allele seem to have significantly higher concentrations of oxidized LDL cholesterol compared with non-carriers, potentially further increasing CAD risk [49,50]. Humphries and colleagues report a robust association between the -455G>A SNP of the fibrinogen beta chain (FGB) gene and elevated post-exercise fibrinogen levels [51,52]. Elevated fibrinogen levels may modulate the myocardial infarction risk associated with the Leu34 allele of blood coagulation factor XIII (F13A1), a tetrameric zymogen that protects the fibrin clot from proteolytic degradation [53,54]. These candidate gene-environment interactions were examined because of plausible biological relationships, but large-scale replications are still required, with careful attention to the issues raised in this article.

Box 1. Glossary of statistical terms
Bias: a tendency leading to conclusions that systematically differ from the truth, typically resulting from inadequate study design or poor methodology.
Bonferroni correction: in multiple hypothesis testing, a conservative adjustment of the significance threshold achieved by dividing the nominal significance threshold by the number of tests performed in order to control the number of false positive (type 1) errors.
Effect size: the change in the dependent variable (typically the trait value or likelihood of disease phenotype) that is associated with a given change of the independent variable (typically the number of minor alleles or risk allele carrier status of an individual).
False discovery rate: in multiple hypothesis testing, a group of methods in which the primary goal is to control the expected proportion of false positives (type 1 errors) among the results declared significant, as opposed to the probability of making any false positive error.
Interaction effect: the effect size of the combination of two or more independent variables, when the main effect of the variables individually is excluded.
Main effect: used in multivariate study designs, the effect size of the independent variable on the dependent variable, excluding other possible independent variables.
Permutation testing: a computational technique to estimate statistical significance by comparing the actual observed test result with the distribution of all possible test results that can be generated by swapping trait values or disease status between study participants.
Population stratification: allele frequency differences that occur between sample subgroups because of differences in genetic ancestry rather than a direct role of the alleles in disease susceptibility.
Regression: an encompassing term to denote a group of statistical methods that attempt to model the relationship between a dependent variable and one or more independent variables. The model typically contains multiple terms, with each term usually comprising an independent variable or combination of independent variables with weighting factors.
Significance threshold: the level of statistical significance (the probability of obtaining a result at least as extreme as that observed if there were no true association) that must be surpassed before an association is declared.
Statistical classification: a group of non-linear statistical techniques that attempt to group or classify an individual on the basis of patterns of descriptive variables derived from a previously examined pool of data.
Statistical power: the probability that a statistical test will correctly reject the null hypothesis when it is false.

Examples from GWASs
Gene-gene or gene-environment interactions are not yet routinely evaluated in GWASs, but two recent reports include exploratory examinations. Kathiresan and colleagues performed a two-stage GWAS of plasma lipoproteins [11]. The first stage identified over 1,000 associated SNPs in 25 loci (p < 5 × 10⁻⁸) [11]. The second stage analysis re-tested all SNPs using 36 of the significantly associated SNPs from the first stage as covariates in the regression. The number of associated SNPs was reduced to 105 in 7 loci (p < 5 × 10⁻⁸) [11]. All loci identified in the second stage had been identified in the first stage of analysis, suggesting that additional SNPs in known loci that are not in linkage disequilibrium with the SNPs used as covariates are associated with lipoprotein traits.
Sabatti and colleagues examined genome-wide gene-environment interactions, with the caveat that the work was underpowered to confidently identify interactions [13]. They examined four dichotomized environmental variables (sex, use of oral contraceptives, BMI over 25 kg/m² and gestational age), for which differences in effect size were compared between the two subgroups, and two variables separated into quintiles (birth BMI and early growth), which were tested by regression using an interaction term [13]. At least one interaction SNP was identified (p < 5 × 10⁻⁷) for five out of six environment variables, although none of the SNPs were in genes with a main effect or with known biological relevance [13].
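A comparison of effect sizes between two subgroups, as described above for dichotomized environmental variables, can be sketched as a simple z-test on the difference between two independently estimated regression coefficients. This is a generic illustration under the assumption of independent subgroups and approximately normal estimates; it is not claimed to be the exact procedure used in [13], and the function name and numerical inputs are hypothetical.

```python
import math

def subgroup_difference_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference in effect size between subgroups.

    beta_a, beta_b: per-allele effect estimates in the two subgroups.
    se_a, se_b: their standard errors (subgroups assumed independent).
    Returns the z statistic and the two-sided p-value.
    """
    diff = beta_a - beta_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimates for one SNP in exposed and unexposed participants.
z, p = subgroup_difference_test(beta_a=0.30, se_a=0.05, beta_b=0.10, se_b=0.05)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these illustrative inputs the difference would be nominally significant but would fall far short of a genome-wide threshold such as p < 5 × 10⁻⁷, which shows how demanding such thresholds are for subgroup comparisons.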
These findings represent possible novel associations with metabolic CAD risk factors, but replication in larger samples is required. The issues discussed above in relation to study design, power and analytic strategies to detect gene-gene and gene-environment interactions are relevant to these large multi-center population studies, as these studies will form the precedent for future investigations.
Clinical implications
Accounting for gene-gene and gene-environment interactions will probably be important for future strategies of diagnosis, prognosis and management of CAD. For instance, current treatment guidelines for CAD prevention require risk stratification of the patient. CAD risk strata in a currently disease-free patient are calculated using traditional epidemiological risk factors, such as older age, male sex, the presence of cigarette smoking, diabetes, hypertension, dyslipidemia and, in some models, a family history of early CAD. Quantification of the patient's CAD risk using these variables guides the intensity of evidence-based drug treatment of modifiable risk factors, such as hypertension and dyslipidemia. It certainly seems feasible that reliable molecular genetic information can be included in future risk stratification models, improving precision over simply documenting a family history of CAD. Furthermore, combinations of specific genetic variables in the context of specific environmental variables, reflecting both gene-gene and gene-environment interactions, could help to re-stratify an individual between risk strata derived using non-molecular data. Also, given that such environmental factors as diet, activity level, stress, smoking and air quality are known to be important determinants of CAD risk, the first line of cost-effective and safe intervention for an individual with a high genetic risk burden would include modulation of such environmental factors instead of more costly, high-tech approaches, such as gene-based biological therapies.

Conclusions
In the context of GWAS datasets, gene-gene and gene-environment interactions are a new frontier for CAD association studies. GWASs have been extremely successful in identifying individual loci for CAD susceptibility, but the practical limits of sample size and array resolution for the identification of biologically valid loci will soon be reached. As a result of the high prevalence of CAD and the presence of large, multi-center prospective cohort initiatives with genotyping on high-density DNA genotyping arrays, gene-gene and gene-environment interaction studies of CAD will be possible in the future. Rigorous testing for gene-gene and gene-environment interactions should be built into the experimental study design. To ensure that testing for interactions enjoys the same success as GWASs of CAD, precise standards, including suitable sample sizes, reliable methods for measurement of environmental exposures, phenomic characterization and statistical analyses, will be required to minimize both false negative and false positive findings and to allow findings to be compared across samples and reports. The increment in the understanding of CAD susceptibility provided through systematic study and replication of gene-gene and gene-environment interactions will permit a more complete set of tools for diagnosis, disease prediction and prognosis and tailored therapy, perhaps using appropriate environment-based interventions.
Abbreviations
BMI, body mass index; CAD, coronary artery disease; FGB, fibrinogen beta chain; FH, familial hypercholesterolemia; F13A1, blood coagulation factor XIII subunit A1; GWAS, genome-wide association study; HMGCR, 3-hydroxy-3-methylglutaryl-CoA reductase; LDL, low-density lipoprotein; LDLR, low-density lipoprotein receptor; SNP, single nucleotide polymorphism; T2DM, type 2 diabetes.
Competing interests
The authors report no competing interests.
Author contributions
Both authors contributed to the conception and production of the manuscript and approved the final version.
References