Genome-wide association studies are coming for human infectious diseases

A genetic contribution to infectious disease in human populations has long been suspected and is now supported by more than 50 years of epidemiological evidence showing, for example, infection rates to be much higher than disease rates. In successful family studies of high-penetrance effects, single gene mutations have been identified that reveal a molecular mechanism leading to increased risk of a specific infectious disease. However, in population-based studies, genetic variants conferring host susceptibility to various infectious diseases have been difficult to uncover. Although mutations such as that in the CCR5 gene, which confers protection against HIV infection, have been reliably discovered, polymorphisms affecting larger proportions of a population have been hard to prove definitively. The recent arrival of the genome-wide association study format, currently being applied to Kawasaki disease, tuberculosis, malaria, HIV, dengue and others, gives us hope that these challenges can finally be met, with implications for population-based treatment and prognosis strategies.

Infectious agents continue to have a major influence on human evolution as a result of the widespread nature of these agents in the environment and the predominance of infectious disease in children, resulting in a clear impact on allele transmission depending on the outcome of this early encounter. For example, in areas with endemic dengue disease, such as Vietnam, seroprevalence studies show that infection has occurred in over 88% of children by the age of 15 and yet disease that results in hospitalization occurs in less than 0.1% [1]. Is this infectious disease distribution predominantly due to widespread general resistance over multiple genes or is susceptibility due to rare mutations in a few specific genes? This question on the nature of the genetic change, rare or common, has remained, despite numerous population studies of infectious disease.
Candidate gene studies
Although family-specific mutations have been shown to have indisputable consequences in infectious disease [2-5], their effects in the general population are thought to be small [6]. There are also some very well established population-based mutations with clear effects on disease, from sickle cell trait protection against malaria [7] to CCR5, a known receptor of HIV, defective alleles in which have been associated with resistance to HIV infection [8]. Indeed, rare mutations leading to increased susceptibility to infectious disease can be identified in the general population, when looked for carefully [9]. Interestingly, the evidence from this field of work is currently pointing towards the concept of 'one gene, one infection' [10]. The strong evidence that a mutation in a single gene frequently leads to increased susceptibility to only one infection is also supported by animal studies, which have used forward genetics to show that, although many genes may be involved in recognizing a single pathogen, mutations in only one or a few lead to increased susceptibility to disease [11].
Despite the clear evidence identifying these so-called 'major genes', it is unlikely that mutations in single genes are the only causes of genetic susceptibility in a population. So far, the work done to extend this approach to a population basis has used candidate gene studies and, in general, when they harmonize with the major-gene findings, significant insights can be gained from them. For example, in meningococcal disease the 'major genes' lie in the complement pathway, with mutations in many of these genes (each restricted to just a few individuals) leading to greatly increased risk of disease [12]. Using one of these genes (that encoding mannose binding lectin) as a candidate gene in a population-based study has supported and extended these findings to the general population, revealing a polymorphism that can alter the initiation of this pathway, and thus the susceptibility to disease, in a large number of individuals [13]. However, candidate gene approaches are ultimately restricted to those few genes that can be strongly implicated by current knowledge, and there is little understanding of their overall contribution to disease in relation to other genes.
Moving towards genome-wide studies
Two of the first efforts to apply a genome-wide approach to multifactorial infectious diseases addressed schistosomiasis and tuberculosis, in Brazilian and African families, respectively [14,15]. The tools available at the time only allowed linkage studies using a few hundred genetic markers spread across the whole genome. Nowadays, data generated after the sequencing of the human genome and the HapMap project have identified millions of single nucleotide polymorphisms, facilitating the implementation of genome-wide association studies (GWASs). The GWAS approach, although new, is already well described and offers significant methodological improvements over previous approaches; GWAS analysis has already enabled the successful identification and replication of novel genes for several complex disorders such as Crohn's disease, rheumatoid arthritis, type 1 and type 2 diabetes, macular degeneration and prostate cancer [16-18].
The first GWAS in infectious disease was done in 2007, when Fellay et al. [19] determined genetic components influencing the viral load of HIV-positive patients during the asymptomatic phase of the disease. However, this study did not investigate host genetic susceptibility. The first GWAS of infectious disease susceptibility was carried out in a group of patients with Kawasaki disease [20], a self-limiting acute vasculitis mostly affecting children below 5 years of age [21]. This study started by genotyping a small group of patients and controls of European origin with an Affymetrix 250K Nsp chip; a follow-up group comprising affected children and parents was then used to confirm the initial findings; and finally a third stage, including fine-mapping of eight associated genes, further validated the genetic associations. Interestingly, five of the eight fine-mapped genes formed a connected network that also showed significant differences in transcript levels between acute and convalescent stages of the disease, suggesting that multiple genes in a pathway may be involved in susceptibility to disease, something that may turn out to be important for other infectious diseases. However, in comparison with disorders such as inflammatory bowel disease or macular degeneration, for which the small number of genetic variants detected were shown to confer a highly significant disease risk to carriers [17,22], the study on Kawasaki disease [20] has not found such a profound effect, hinting that infectious disease may not yield easily to the GWAS approach. It is also interesting to see that the HLA locus, such a clear candidate region for infectious disease because its role in discriminating self from non-self was thought to have evolved to protect from infection, did not show the dominant association seen in many autoimmune diseases [23,24], and this may also turn out to be true for many infectious disease studies.
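To make concrete why a modest effect may not stand out in a genome-wide scan, the following minimal sketch runs a generic allelic case-control test on one marker and compares the result with a Bonferroni threshold. It is not the analysis used in the studies cited above: the allele counts are invented for illustration, the marker count of 250,000 is an assumed round number for a chip of that size, and the function names are hypothetical.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the statistic and its p-value. For 1 degree of freedom the
    upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cases = case_alt + case_ref
    ctrls = ctrl_alt + ctrl_ref
    alt = case_alt + ctrl_alt
    ref = case_ref + ctrl_ref
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        cases * ctrls * alt * ref
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio for the alternative allele."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

# Hypothetical data: 1,000 cases and 1,000 controls (2,000 alleles each).
counts = dict(case_alt=500, case_ref=1500, ctrl_alt=400, ctrl_ref=1600)

chi2, p_value = allelic_chi2(**counts)
effect = odds_ratio(**counts)

# Assumed number of markers tested; Bonferroni keeps the family-wise
# error rate at 5% across all of them.
n_markers = 250_000
threshold = 0.05 / n_markers

print(f"odds ratio = {effect:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
print(f"genome-wide threshold = {threshold:.1e}, passes = {p_value < threshold}")
```

With these invented counts the marker would look convincing in a single-gene study but falls short of the genome-wide threshold, which is one reason multi-stage designs with replication cohorts are used.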

Facing new challenges
There are several challenges for the application of GWAS to determine genetic variants that confer susceptibility to infectious diseases. First, the current design of commercially available single nucleotide polymorphism chips is skewed towards common variants, with at least 1% presence in the population, following the 'common variant, common disease' hypothesis [25,26]. However, if susceptibility to an infectious disease is caused by one or a combination of rare variants, as previously reported for other disorders such as autism [27], we would not be able to detect them using the current technology. In that scenario, sequencing of the candidate genes would be the most direct way to tackle the problem. Second, structural variants, such as copy number variants that were barely analyzed in the past, are currently being investigated in more detail and have been successfully linked to susceptibility to schizophrenia [28,29]. Recent development of new technologies as well as analysis tools will facilitate the study of copy number variants, which might potentially be involved in host susceptibility to infectious diseases. Third, a landmark paper by the Wellcome Trust Case Control Consortium [16] introduced the concept of using the same set of population controls, without detailed phenotype information for different diseases, in case-control studies. Although there is the possibility of misclassification, meaning that a proportion of controls might have the disease or might develop it in the future, this could be a very useful approach for the many infectious diseases for which the incidence rates in the general population are low (less than 1%), such as meningococcal disease, severe dengue and others. 
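The effect of control misclassification described above can be illustrated with a short calculation. This is a sketch under simple assumptions, not a result from the cited studies: unscreened population controls are modelled as a mixture of truly unaffected individuals and a fraction of cases equal to the disease prevalence, and the allele frequencies (0.30 in cases, 0.25 in unaffected individuals) are hypothetical.

```python
def allelic_odds_ratio(p_case, p_ctrl):
    """Odds ratio computed from risk-allele frequencies in cases and controls."""
    return (p_case / (1.0 - p_case)) / (p_ctrl / (1.0 - p_ctrl))

def mixed_control_freq(p_case, p_unaffected, prevalence):
    """Risk-allele frequency in unscreened population controls.

    Assumes a fraction `prevalence` of the controls are actually cases
    (or will become cases) and carry the case allele frequency.
    """
    return (1.0 - prevalence) * p_unaffected + prevalence * p_case

# Hypothetical allele frequencies chosen only for illustration.
p_case, p_unaffected = 0.30, 0.25

true_or = allelic_odds_ratio(p_case, p_unaffected)
print(f"screened controls: OR = {true_or:.3f}")

for prevalence in (0.001, 0.01, 0.20):
    p_ctrl = mixed_control_freq(p_case, p_unaffected, prevalence)
    observed = allelic_odds_ratio(p_case, p_ctrl)
    print(f"prevalence {prevalence:>5.1%}: observed OR = {observed:.3f}")
```

Under these assumptions the observed odds ratio is barely attenuated when prevalence is 1% or lower, whereas a prevalence of 20% shrinks it noticeably, which is consistent with shared controls being most attractive for low-incidence diseases.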
It is worth noting, however, that for infectious disease the environmental agent (the pathogen) that triggers the disease is usually known and past exposure to the agent is often easily measured (using serology), criteria that are frequently not available in diseases such as cancer or autoimmunity. This means that it is possible in infectious disease to select controls with known (antibody detected) exposure, but without disease. Finally, another important aspect to take into account is the role of the pathogen's genetic variability and its interaction with host genetics. An interesting study by Caws et al. [30] has hinted at a relationship between host and pathogen genotypes in the development of tuberculosis.
When these challenges are successfully addressed, a GWAS will offer substantial insights into the role of common variation in an infectious disease and, with its comprehensive nature, will hopefully enable us to elucidate the involvement of multiple genes or previously unsuspected pathways in disease.

Clinical significance
Moving from genetic discovery to direct clinical relevance has not been as difficult for infectious disease as it has been for many other diseases, perhaps because there is a relatively detailed understanding of the immune system. For instance, patients with Mendelian diseases characterized by diminished production of interferon or interferon pathway activation could be treated with recombinant forms of interferon, helping them to mount an immune response against various pathogens. Although there is still no treatment using the CCR5 gene, it has been a tempting target for therapies in the HIV field and may yet prove to be effective. In another example, a polymorphism associated with increased serum concentrations of complement Factor H was revealed to be a susceptibility factor for meningococcal disease, as Factor H bound to and protected the causative bacteria from complement attack [31]. This finding has implications for the vaccine community, which showed that antibodies that prevent this binding enable complement-mediated killing of the bacteria; incorporating this approach into a vaccine has been the first successful strategy to prevent group B meningococcal disease [32].
With the arrival of the GWAS approach to infectious disease, we can expect many more genetic loci to be revealed, molecular pathways to disease described and thus therapeutic targets identified, with the hope that these can be quickly translated into treatments.
In conclusion, the first GWASs of infectious diseases have been published [19,20], and more will come in the near future, with important implications for our understanding of these diseases. Although initial hints suggest that big hits will not lead the way, the unraveling of disease mechanisms through networks of genes should reveal novel ways of tackling infectious diseases in the clinic.
Abbreviations
GWAS, genome-wide association study.
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
SD and MLH are funded by the Agency for Science, Technology and Research (A*STAR), Singapore.