Genomic prediction of celiac disease targeting HLA-positive individuals

Background: Genomic prediction aims to leverage genome-wide genetic data towards better disease diagnostics and risk scores. We have previously published a genomic risk score (GRS) for celiac disease (CD), a common and highly heritable autoimmune disease, which differentiates between CD cases and population-based controls at a clinically relevant predictive level, improving upon other gene-based approaches. HLA risk haplotypes, particularly HLA-DQ2.5, are necessary but not sufficient for CD, with at least one HLA risk haplotype present in up to half of most Caucasian populations. Here, we assess a genomic prediction strategy that specifically targets this common genetic susceptibility subtype, utilizing a supervised learning procedure for CD that leverages known HLA-DQ2.5 risk.

Methods: Using L1/L2-regularized support-vector machines trained on large European case-control datasets, we constructed novel CD GRSs specific to individuals with HLA-DQ2.5 risk haplotypes (GRS-DQ2.5) and compared them with the predictive power of the existing CD GRS (GRS14) as well as two haplotype-based approaches, externally validating the results in a North American case-control study.

Results: Consistent with previous observations, both the existing GRS14 and the GRS-DQ2.5 had better predictive performance than the HLA haplotype approaches. GRS-DQ2.5 models, based on directly genotyped or imputed markers, achieved similar levels of predictive performance (AUC = 0.718-0.73), substantially higher than those obtained from DQ2.5 zygosity alone (AUC = 0.558), the HLA risk haplotype method (AUC = 0.634), or the generic GRS14 (AUC = 0.679). In a screening model of at-risk individuals, the GRS-DQ2.5 lowered the number of unnecessary follow-up tests for CD across most sensitivity levels. Relative to a baseline implicating all DQ2.5-positive individuals for follow-up, the GRS-DQ2.5 resulted in a net saving of 2.2 unnecessary follow-up tests for each justified test while still capturing 90 % of DQ2.5-positive CD cases.

Conclusions: Genomic risk scores for CD that target genetically at-risk sub-groups improve predictive performance beyond traditional approaches and may represent a useful strategy for prioritizing individuals at increased risk of disease, thus potentially reducing unnecessary follow-up diagnostic tests.

Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0196-5) contains supplementary material, which is available to authorized users.


Background
Genome-wide association studies (GWAS) have identified large numbers of genetic loci associated with complex human disease, particularly for many autoimmune diseases, where disease susceptibility is typically strongly linked to the human leukocyte antigen (HLA) region as well as loci outside HLA [1][2][3][4][5][6]. The strong disease associations of specific single nucleotide polymorphisms (SNPs) have enabled genomic-based prediction models to be developed with substantial predictive power [7][8][9][10][11][12]. As new light continues to be shed on the fundamental role of these genetic links in disease pathogenesis, it is becoming increasingly likely that genomic-based tools to predict disease development and risk, as well as prognosis and clinical course, may be harnessed and applied with direct relevance to patient care.
Despite these advances, suitable clinical tools that quantify genomic risk for complex disease remain largely unrealized. This is mainly due to a lack of understanding regarding how such tools might be utilized in clinical settings, hampered by the complexity of integrating such data into a risk model that often incorporates many other variables, such as clinical information and laboratory investigations. The role of genomic risk prediction in existing risk models and diagnostic pathways is still being determined, including whether genomic prediction is optimal as a complement to or a replacement for existing assays. For autoimmune diseases in particular, genomic prediction needs to demonstrate improved risk stratification beyond known HLA risk haplotypes and have sufficient predictive power, particularly given the low disease prevalence in the general population (typically 1 % or less). Here, we have sought to identify specific scenarios where there is a major clinical need to improve risk prediction in a highly heritable autoimmune disease, celiac disease (CD). We first show that genomic risk prediction methods have clear advantages over existing approaches for CD risk prediction, and next assess the clinical implications of genomic risk prediction for disease management.
Current diagnosis of CD relies on the presence of CD-specific autoantibodies and a confirmatory small-bowel biopsy to demonstrate the characteristic intestinal villous atrophy [13]. Importantly, while both methods are useful for detecting current CD, they do not provide predictive information on the future risk of developing CD in a person without active disease. Early detection of CD is a clinical priority in order to reduce the long-term risk of disease complications, especially for individuals at higher risk of CD, such as those who are first-degree family members of an affected individual or those who have a related autoimmune disease. However, the follow-up care of people without active CD, particularly the question of whether and how often to perform follow-up testing, remains unresolved. This challenge stems from the desire for early disease detection conflicting with the need to minimize repeated testing that is inconvenient, costly, anxiety-inducing, and entirely unnecessary for those individuals who will never develop disease.
Since the development of CD depends so strongly on several HLA risk haplotypes, HLA typing is able to achieve close to 100 % sensitivity in detecting at-risk individuals while also excluding those at very low risk. As a result, HLA typing for CD has been widely embraced in clinical practice [23]. Indeed, recent consensus guidelines recommend HLA typing as a first-line investigation in asymptomatic children at risk of CD (for instance, if they have a family history of CD) [13]. Further, combining HLA typing with CD-serology may also provide a more cost-effective diagnostic approach in some situations, such as population screening, by identifying false-positive serology in non-genetically susceptible individuals, thus reducing the number of unnecessary and expensive confirmatory small bowel biopsies [24].
The presence of at least one of the CD-related HLA haplotypes, while necessary for the development of disease, carries little predictive value for eventual CD development and therefore has no role as a sole diagnostic for CD. The CD-related genotypes are highly prevalent in the community, with 30-40 % of Europeans and up to 56 % of Australians carrying at least one [24]. Despite the fact that different HLA haplotypes impart varying levels of risk for CD, specifically HLA-DQ2.5 > DQ8 > DQ2.2, these differences in relative risk have not translated meaningfully into clinical practice. Thus, HLA typing results are typically interpreted as simply 'susceptibility present' or 'susceptibility absent' regardless of the particular HLA haplotype detected, and are therefore used primarily to exclude CD but not to detect high-risk individuals nor to predict future risk of disease [13].
Notwithstanding the limited clinical role of HLA typing, there has been renewed interest in the role of specific HLA alleles in the development of CD, with the risk of CD being far higher in children who are HLA-DQ2.5 homozygous (or have two copies of DQB1*02) than among those who are HLA-DQ2.5 heterozygous or are positive for HLA-DQ8 [25,26]. A gene-dose effect has been reported, with HLA-DQ2.5 homozygosity associated in some studies with a more severe clinical presentation of CD, refractory disease, and a slower rate of intestinal healing on treatment [27,28]. As a result, HLA-DQ2.5 zygosity status is a major variable in a recently proposed CD prognostic modeling tool [29]. Collectively, these studies highlight the important role of the HLA risk haplotypes, especially HLA-DQ2.5, in CD development, prognosis, and clinical behavior, but also highlight the current limitations in the clinical utility of HLA typing. There is a major need to develop tools that are more informative than HLA typing, particularly for those who have already been shown to possess at least one HLA-DQ2.5 allele (DQ2.5+).
We have recently performed a proof-of-principle study demonstrating that genomic data, derived from multiple case/control GWAS datasets, can be used to improve upon current genetic testing based on HLA typing [7]. A CD genomic risk score (GRS) based on genome-wide single nucleotide polymorphisms (SNPs), denoted here the 'GRS14' [8], was induced by supervised learning models trained on a British case-control GWAS study [30,31]. We have established the robustness of GRS14 to discriminate CD patients from population-based controls in UK, Dutch, Finnish, and Italian studies [3,4,32], achieving predictive performance (area under the receiver-operating characteristic curve (AUC)) substantially higher than other methods which attribute risk based on 57 non-HLA SNPs together with HLA alleles [33].
A recent large-scale CD Immunochip GWAS [34] has shown that statistical imputation of non-SNP genetic variants via SNP2HLA [35], including HLA alleles and amino-acid polymorphisms in HLA genes, enabled fine-mapping of CD-associated variation to a greater resolution than that afforded by SNPs alone, and subsequently substantially increased the explained heritability of CD. However, the implications for CD risk prediction had not been assessed.
Here, we first externally validate the predictive power of the existing GRS14 and novel variants thereof in a North American CD case-control study [22,36], comparing their performance to a previous HLA haplotype risk approach. Next, we focus on the DQ2.5+ subset of individuals, and develop HLA-DQ2.5-specific genomic risk scores, one based on SNPs and others based on SNPs together with SNP2HLA-imputed markers [35]. Finally, we assess the HLA-DQ2.5-specific genomic risk scores in screening scenarios to determine the number of unnecessary follow-up tests saved relative to other approaches.

Genotype and phenotype data
We obtained the North American dataset (cases and controls) from NCBI dbGaP (accession phs000274.v1.p1). Samples were genotyped on the Illumina 660W Quad v1A platform, assaying 2,246 individuals in total (1,716 cases, 530 controls; 723 male, 1,523 female). Individuals were considered to have confirmed CD based on either: (1) characteristic findings on small bowel biopsy according to ESPGHAN criteria; (2) biopsy-proven dermatitis herpetiformis; or (3) a positive celiac serology panel (transglutaminase (tTG) and endomysial (EMA) antibodies). Controls were originally from the Illumina iControl database, matched by the original study authors for age, sex, and ethnicity [36]. To minimize the possibility of artificially inflating the apparent predictive ability of the models [37], we performed several stages of quality control on the genotype data using plink 1.9 [38-40]: removing non-autosomal SNPs, filtering SNPs by minor allele frequency (MAF) <1 %, missingness >10 %, and deviation from Hardy-Weinberg equilibrium in controls P < 5 × 10^-6, and filtering individuals with missingness >10 %. Next, we removed 2,473 SNPs with case/control differential missingness P < 10^-3. We iteratively used principal component analysis (PCA), implemented in flashpca 1.2 [41], to identify outlier individuals, defined here as individuals with PC coordinates more than 3 standard deviations from the median on any of PCs 1-50. After removing those individuals, PCA was repeated to verify the results. We used two iterations of this procedure, resulting in 1,697 individuals remaining. Finally, from each pair of individuals with identity-by-descent π̂ > 0.05 we removed one individual (one individual was removed). The final post-QC dataset consisted of 1,696 individuals (1,259 cases, 437 controls; 546 male, 1,150 female) over 518,770 autosomal SNPs. The available clinical characteristics of the post-QC data are shown in Table 1.
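The PCA outlier rule described above can be sketched as a short function. This is a minimal illustration only (not flashpca itself); the function name and the use of per-PC sample standard deviations are our own assumptions:

```python
import numpy as np

def pca_outliers(pcs, n_sd=3.0):
    """Flag individuals whose coordinate on any principal component lies
    more than n_sd standard deviations from that PC's median.

    pcs: (n_individuals, n_pcs) array of PC scores (e.g. PCs 1-50).
    Returns a boolean mask, True for outliers.
    """
    med = np.median(pcs, axis=0)          # per-PC median
    sd = pcs.std(axis=0)                  # per-PC standard deviation
    return (np.abs(pcs - med) > n_sd * sd).any(axis=1)

# In the procedure described in the text, outliers would be removed,
# PCA recomputed, and the rule applied again (two iterations in total).
```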
To check that the putatively European descent of the North American individuals was indeed the case, we also combined an LD-thinned (plink --indep-pairwise) version of the North American data with an LD-thinned version of the UK, Finnish, Dutch, and Italian GWA data and performed PCA on the combined data (Additional file 1: Figure S1). Finally, we computed the fixation index F_st [42] (plink --fst) on a combined dataset (European + North American) consisting of 224 of the 228 SNPs in the GRS14, with the European and North American samples as two clusters.

Ethics statement
All participants gave informed consent and the study protocols were approved by the relevant institutional or national ethics committees. Details for the ethics protocols for the European GWA and Immunochip datasets are given in [3,4]. The North American NIDDK-CIDR dataset was obtained from the NCBI Database of Genotypes and Phenotypes (dbGaP), accession phs000274.v1.p1, following their respective access protocols.

Imputation of HLA alleles and other markers
While SNP arrays produce only unphased genotypes, statistical methods can exploit patterns of linkage disequilibrium to impute haplotypes (phased alleles) and other forms of genetic variation, based on a reference panel in which both the SNPs and the imputed variation are known (a training dataset). This procedure is conceptually the same as that used when imputing, for example, millions of 1000 Genomes genotypes based on a smaller set of hundreds of thousands of assayed genotypes.
As the majority of CD-related variation is found within the HLA, we imputed HLA variation, employing two complementary methods. First, we imputed HLA-DQA1 and HLA-DQB1 alleles from the SNPs using the R package HIBAG 1.2.3 [43], with the European hg18 HLA4 reference dataset. Based on the imputed HLA alleles, we inferred each individual's heterodimer type as one of DQ2.5 heterozygous, DQ2.5 homozygous, DQ2.2, or DQ8, according to the mapping in [8]. Following [33], low HLA risk was assigned to individuals who did not have any of the CD risk heterodimers (DQ2.2, DQ8, DQ2.5-heterozygous, and DQ2.5-homozygous). High risk was assigned to DQ2.5-homozygous individuals and to those carrying both DQ2.5-heterozygous and DQ2.2. Medium risk was assigned to all remaining individuals. The HLA risk profiles were coded as 0 for low, 1 for medium, and 2 for high risk. We did not examine the 57 non-HLA SNPs used in [33], as these are present only on Immunochip arrays and not on the genome-wide arrays, and were not well tagged by the existing SNPs on the genome-wide arrays.
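The three-level risk coding above can be written as a small function. This is a sketch of the stated rules only; the heterodimer labels are illustrative, not an API of HIBAG or any other tool:

```python
def hla_risk_code(heterodimers):
    """Map an individual's set of CD risk heterodimers to the
    three-level HLA risk code used in the text: 0 = low, 1 = medium,
    2 = high. `heterodimers` is a set of strings drawn from
    {"DQ2.5-hom", "DQ2.5-het", "DQ2.2", "DQ8"} (labels are illustrative).
    """
    risk = {"DQ2.5-hom", "DQ2.5-het", "DQ2.2", "DQ8"}
    present = heterodimers & risk
    if not present:
        return 0  # low: no CD risk heterodimer
    if "DQ2.5-hom" in present or {"DQ2.5-het", "DQ2.2"} <= present:
        return 2  # high: DQ2.5 homozygous, or DQ2.5-het together with DQ2.2
    return 1      # medium: all remaining carriers
```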
Another imputation approach, SNP2HLA [35], has proven to be particularly useful for fine-mapping of the HLA region with regards to association with several autoimmune diseases, including CD [34], and for explaining more of the heritability of disease. SNP2HLA employs the assayed SNPs, together with a reference panel, to impute HLA SNPs (if they were not already on the genotyping array), HLA alleles (HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, and HLA-DRB1), and known amino acid polymorphisms within those HLA genes. These imputed variants can then be analyzed in the same way as assayed genotypes.
Hence, in addition to using HIBAG, we employed SNP2HLA v1.0.2 [35] to impute 8,961 HLA SNPs, eight HLA alleles, and amino acid polymorphisms, based on the T1DGC reference panel, in the European GWA, Immunochip, and North American datasets. The non-SNP imputed markers were coded as present/absent. Quality control for the combined SNP + imputed marker data included: (1) removal of imputed SNPs that were already assayed on the array; (2) within each dataset (UK2, NL, Finn, IT), removal of SNPs/markers with MAF <1 %, missingness >10 %, or deviation from Hardy-Weinberg equilibrium (HWE) in controls P < 5 × 10^-6, removal of SNPs/markers with differential case/control missingness P < 10^-3, and removal of individuals with >10 % missingness; (3) removal of SNPs/markers that were not present in all four European datasets (UK2, NL, Finn, IT); (4) removal of SNPs/markers with differential case/control missingness P < 10^-3 across the combined data; and (5) removal of SNPs/markers not in the North American imputed dataset. For the Immunochip data, QC included: (1) removal of imputed SNPs already assayed; (2) SNP/marker filtering by MAF <0.5 %, missingness >10 %, and deviation from HWE in controls P < 10^-3; and (3) removal of SNPs/markers with differential case/control missingness P < 10^-3, and of individuals with missingness >10 %. We verified that the imputed markers included in genomic risk scores had high imputation accuracy (r^2 > 0.8) in the training data. The final SNP + imputed marker European data consisted of 507,321 markers (500,821 assayed SNPs and 6,500 imputed markers) over 11,912 individuals (5,552 of whom were DQ2.5+); after removal of individuals assayed in the UK GWA data, the Immunochip dataset had 7,803 individuals, of whom 4,732 were DQ2.5+, with 24,555 SNPs/markers (approximately 17,800 assayed SNPs and approximately 6,700 imputed markers) in common with the other datasets.
For the DQ2.5-specific GRS (GRS-DQ2.5), only SNPs present in the North American dataset were used in cross-validation, so that all SNPs in the final model would also be available for external validation.

Validation of the published risk score

We used the previously published GRS14 risk score (comprising 228 SNPs, available at [44]), which is given in terms of rs IDs, reference alleles, and weights, to produce a per-individual score (using plink --score). The final score for each individual is the sum of the minor allele dosages of each SNP, weighted by the published weights. Four SNPs in this score were not found in the North American post-QC genotype data and were excluded; these four SNPs had relatively low weight (ranked 59th or lower out of 228), and thus their absence is unlikely to have substantially affected the predictive power of the final model.
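The scoring step amounts to a weighted sum of allele dosages, which can be sketched as below. This is not plink itself (plink --score by default reports a per-allele average rather than a raw sum); the sum shown here follows the description in the text:

```python
import numpy as np

def genomic_risk_score(dosages, weights):
    """Per-individual genomic risk score: the sum of minor allele
    dosages weighted by the published per-SNP weights.

    dosages: (n_individuals, n_snps) array with entries in {0, 1, 2}.
    weights: (n_snps,) array of per-SNP weights.
    Returns an (n_individuals,) array of scores.
    """
    return dosages @ weights
```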

Cross-validation and novel genomic risk scores
We used the tool SparSNP [30], which fits L1/L2-penalized support-vector machine (SVM) models to SNP data, in 10 × 10-fold cross-validation on the European GWA dataset. Briefly, these models are additive in the minor allele dosage {0, 1, 2} and take into account all SNPs (or other markers) in the data; however, only a proportion of the SNPs/markers receive a non-zero weight, tuned by the L1 penalty (higher penalties lead to fewer SNPs/markers with non-zero weight), together with an L2 (ridge) penalty in the range 10^-6 to 10^3. The optimal penalties were determined via cross-validated AUC.
We have previously shown that such penalized models produce superior predictive ability for CD compared with several widely-used alternatives [12]. We evaluated a range of L1/L2-penalized models over a grid of penalties, with the optimal model selected by the best average AUC. For cross-validation, the reported AUC is a LOESS-smoothed average over the 10 × 10 = 100 test sets. For independent validation, we derived a final consensus model consisting of the SNPs selected in >60 % of the replications, with corresponding weights being the average weights over the replications (the final risk scores are given in Additional files 2 and 3). The consensus model was taken and tested without further modification on the North American dataset. Improvement in case/control discrimination (AUC) was tested using Harrell's two-sided test for paired concordance (rcorrp.cens in R package Hmisc) [45], and 95 % confidence intervals for the AUC were computed using DeLong's method (R package pROC) [46].
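The consensus-model construction described above can be sketched as follows. This is an illustration only, not SparSNP code; in particular, whether the final weight averages over all replications or only over those that selected the SNP is our reading of the text, so treat the averaging choice as an assumption:

```python
import numpy as np

def consensus_model(weight_matrix, threshold=0.6):
    """Build a consensus risk score from cross-validation replications.

    weight_matrix: (n_replications, n_snps) array of fitted SNP weights,
    zero where a SNP was not selected in that replication.
    A SNP enters the consensus if it was selected (non-zero weight) in
    more than `threshold` of replications; its consensus weight is the
    average weight across all replications (assumption: zeros included).
    Returns (selected_mask, consensus_weights).
    """
    selected = (weight_matrix != 0).mean(axis=0) > threshold
    avg_w = weight_matrix.mean(axis=0)
    return selected, np.where(selected, avg_w, 0.0)
```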

The ratio of non-CD individuals incorrectly implicated per CD case correctly implicated
We calculated the ratio r of non-CD individuals incorrectly implicated per CD case correctly implicated as r = (1 − PPV)/PPV, where PPV is the positive predictive value, calculated as

PPV = (sens × prev) / (sens × prev + (1 − spec) × (1 − prev)),

where sens, spec, and prev are the sensitivity, specificity, and prevalence (as a proportion of the population under consideration), respectively, computed for each possible risk cutoff (forming the ROC curve). This ratio is also the reciprocal of the post-test odds of disease, that is, 1 / (likelihood ratio × pre-test odds). In estimating the PPV for the DQ2.5+ subgroup we assumed a prevalence of 10 %, equivalent to the prevalence among individuals with a high background risk of CD, such as those with a family history of the disease. To evaluate the difference in the ratios between two risk scores, we used a stratified bootstrap procedure in the test data, whereby B = 10,000 replications were drawn (sampled with replacement), the rank statistics were estimated within each replicate for each risk score separately, and the average of the differences in the ratios was reported as the final bootstrap estimate, with approximate 95 % confidence intervals for the difference derived from the 0.025 and 0.975 quantiles of the bootstrapped differences.
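The PPV and ratio computations above amount to a few lines; a minimal sketch (the function name is ours):

```python
def implicated_ratio(sens, spec, prev):
    """Ratio r of non-CD individuals incorrectly implicated per CD case
    correctly implicated, r = (1 - PPV) / PPV, where
    PPV = sens*prev / (sens*prev + (1 - spec)*(1 - prev)).
    Equivalently, r is the reciprocal of the post-test odds of disease.
    """
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    return (1 - ppv) / ppv

# Baseline within DQ2.5+ at an assumed 10 % prevalence: implicating
# everyone (sens = 1, spec = 0) gives r = 0.9 / 0.1 = 9, i.e. 9:1.
```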

Independent validation of genomic risk scores for CD
We applied the previously published GRS14 to the North American dataset and evaluated its predictive power using receiver-operating characteristic (ROC) curves (Fig. 1). The GRS14 model achieved AUC = 0.831 (95 % CI 0.808-0.854), indicating that the majority of the predictive power of the GRS14 model, previously estimated at AUC = 0.86-0.9 [8] on the Italian, Dutch, and Finnish datasets, was maintained in the North American dataset. For comparison, we also trained MultiBLUP [11] on the same European data and tested it on the North American dataset with identical results (AUC = 0.831, 95 % CI 0.808-0.85; Additional file 1: Figure S2), and in addition employed the same L1/L2-regularized SVMs in cross-validation within the North American data, yielding similar results (maximum average cross-validated AUC = 0.823, Additional file 1: Figure S3). Further, there were no substantial differences in the genomic scores between CD cases diagnosed via different methods (Additional file 1: Supplementary Results and Figure S4).

Celiac disease risk prediction within the HLA-DQ2.5+ subgroup
While risk scores that discriminate CD cases from controls in the general population are useful, a more pressing clinical question is whether discrimination is possible within the HLA-DQ2.5+ subgroup of individuals, who are at the highest risk for CD among all HLA+ individuals. It is estimated that approximately 90 % of HLA+ individuals are DQ2.5+, with those that are DQ2.5-homozygous being at greater risk for CD than those that are DQ2.5-heterozygous [26,[47][48][49].
The GRS-DQ2.5 and GRS-DQ2.5-imputed models were evaluated using the average AUC over 10 × 10-fold cross-validation on the DQ2.5+ European GWA samples. For GRS-DQ2.5, the maximum AUC achieved was 0.727 at 2,513 SNPs with non-zero weight, together with an L2 penalty of 1. For GRS-DQ2.5-imputed, the best AUC was 0.74 at 3,317 non-zero-weight SNPs/markers, also using an L2 penalty of 1 (Fig. 2) (for results of GRS-DQ2.5 using other L2 penalties, see Additional file 1: Figure S5).
To externally validate these models, we utilized the North American DQ2.5+ individuals (n = 1,237; 1,094 cases and 143 controls) (Fig. 3a). The highest performance was observed for GRS-DQ2.5-imputed, achieving an AUC of 0.73 (95 % CI 0.687-…).

In a clinical setting, it is desirable to maintain high sensitivity, that is, to capture most CD cases while incurring the cost of some false positives (reduced specificity). In the corresponding region of the ROC curve with sensitivity >0.9, while both the haplotype risk model and the GRS14 model had overall slightly higher specificity than zygosity status, the greatest increase in specificity was observed for the GRS-DQ2.5 and GRS-DQ2.5-imputed models (specificity of 0.29 and 0.32, respectively, compared with specificity = 0.15 for the GRS14).
Utility of genomic risk scores in reducing unnecessary follow-up tests in the HLA-DQ2.5+ subgroup

In a clinical setting, a reduction in the number of unnecessary tests to screen for a disease or secure a diagnosis is desirable. The utility of a GRS for reducing unnecessary tests can be measured using the ratio of non-CD individuals incorrectly implicated as CD to CD cases correctly implicated (given that neither the proposed GRS nor HLA typing can act as a sole diagnostic for CD, we use the term 'implicated' as distinct from 'diagnosed'). This ratio should further be assessed relative to the sensitivity, as one should seek to minimize the former while maximizing the latter. A lower ratio (ideally <1) at high sensitivity indicates a better ability to avoid falsely implicating non-CD individuals as being at high CD risk, while still capturing a substantial number of CD cases. Unlike the sensitivity and specificity, this ratio depends on the true prevalence of CD in the population being tested (here, all DQ2.5+ individuals). People at high risk of CD, for instance due to a family history of the illness, have a 10 % prevalence of disease [50]. We therefore assumed a prevalence of 10 % for this modeling, leading to a baseline ratio of 9:1, that is, all DQ2.5+ individuals implicated as having CD (and recommended for follow-up testing) at 100 % sensitivity.
Overall, both the GRS-DQ2.5 and GRS-DQ2.5-imputed models implicated fewer non-CD individuals per CD case correctly implicated than the GRS14, the DQ2.5-zygosity, or the HLA-haplotype-risk score (Fig. 3b). Note that since DQ2.5 zygosity within the DQ2.5+ individuals is limited to heterozygous (one risk allele) and homozygous (two risk alleles), the main point of interest for the curve is the one indicating heterozygotes, leading to approximately 3.5:1 incorrect implications but achieving only 20 % sensitivity. By comparison, both the GRS-DQ2.5 and GRS-DQ2.5-imputed models achieved a ratio of approximately 2:1 at the same sensitivity as DQ2.5 zygosity. Similarly, the main point of interest for the HLA-haplotype-risk score separates high from medium risk, with a sensitivity of 50 % and approximately 4:1 incorrect implications, a level improved upon by the GRS14, GRS-DQ2.5, and GRS-DQ2.5-imputed.

Discussion
More widespread use of genomic risk prediction in autoimmune disease has been hampered by the inability to identify compelling advantages over existing approaches, mainly HLA haplotyping. Here, we have focused our analysis on individuals carrying the HLA-DQ2.5 heterodimer, which is the most common risk heterodimer and also imparts the highest risk for CD. Existing diagnostic tests are not useful in the absence of active disease and cannot predict risk of future disease. While approaches based on HLA haplotypes, including DQ2.5 zygosity and a HLA haplotype risk score [33], provide some predictive power, we have demonstrated that genomic risk scores focused on DQ2.5+ individuals have substantially higher predictive power than either approach, extending our previous findings [8]. For the DQ2.5+ subset of individuals in particular, our results highlight that while particular HLA alleles are a necessary condition for CD, they are far from a sufficient condition, and that genomic analysis can better uncover further sources of risk not adequately captured by DQ2.5 zygosity alone.
Our genomic risk scores were based on direct modeling of SNPs, both within and outside of HLA. Further, combining the genome-wide SNP data with imputed HLA markers, including HLA alleles outside the well-known HLA-DQA1 and HLA-DQB1 genes, imputed HLA SNPs, and HLA amino acid substitutions [35], led to an increase in predictive power on the North American dataset, thus suggesting that tools for imputing non-traditional risk factors have an important role in future predictive modeling.
The increased precision of both GRS-DQ2.5 and GRS-DQ2.5-imputed translated to an average increased saving of approximately two unnecessary follow-ups per justified follow-up (a 22-24 % average reduction), compared with the alternative strategy of considering all DQ2.5+ individuals for follow-up testing. Importantly, at this level, GRS-DQ2.5 and GRS-DQ2.5-imputed still captured 90 % of DQ2.5+ CD cases. These results suggest that a GRS specific to HLA-DQ2.5+ individuals can achieve substantial cost savings while incurring only a small loss in sensitivity, relative to implicating all DQ2.5+ individuals for more intensive screening and follow-up. In real-life clinical practice, measures could be put in place to overcome the slight loss of sensitivity or any negative consequences. For instance, combining the GRS with CD serology would enable the high sensitivity of the latter test to minimize the chance of true CD cases being overlooked but still leverage the discriminatory power of the GRS. Patients would still be advised to seek medical review if they developed symptoms suggestive of CD. Further, the top 15-20 % of CD cases in terms of GRS-DQ2.5 were estimated to be detectable at a level of 1 unnecessary follow-up per true CD case, suggesting that relatively high confidence of CD can be conferred upon individuals with the highest GRS-DQ2.5/GRS-DQ2.5-imputed scores, thus potentially only requiring CD serology for the subset of individuals with moderate risk, who cannot be confidently predicted by the GRS to have CD or be healthy.
The ultimate clinical role, utility, and cost-effectiveness of genomic risk scores for CD remain to be determined in prospective clinical studies where genomic profiles are undertaken from the outset. If prospective studies can establish the clinical value of the GRS and link it to CD risk, we envisage that clinicians ordering this test will be provided with a weighted GRS value that links to a scale of CD risk. They would then use this score to complement clinical risk stratification and guide the most suitable advice on long-term follow-up and/or further investigations.
Several potential roles for the GRS can be proposed. First, guiding ongoing care in people at risk of CD who are also HLA-DQ2.5+. Several recent large prospective studies have reinforced the importance of HLA-DQ2.5 for CD development in childhood, with environmental factors such as age of gluten introduction or breastfeeding having little, if any, impact on the proportion of children who ultimately develop CD [25,26,51]. These findings have prompted some experts to recommend increased surveillance for CD in children with a positive family history of CD who are HLA-DQ2.5+, especially if DQ2.5 homozygous. The increased predictive power of our GRS-DQ2.5 beyond the information afforded by DQ2.5 zygosity status alone is likely to provide greater clinical utility for predicting risk in this important subgroup of individuals.
Second, genomic data may also be able to inform CD prognosis, such as the likelihood of complicated (refractory) disease or the natural history of latent CD (positive CD-serology and HLA susceptibility but a normal small bowel), or other aspects of care such as response to the gluten-free diet. Finally, another approach may be to combine a GRS with serology to optimize risk stratification for CD and determine who will benefit most from definitive small bowel biopsy. Such a strategy could leverage the strengths of each test: the high sensitivity for active CD using CD serology and the fine-grained CD risk quantification of the GRS, including its ability to provide predictive information and exclude CD. A major benefit of genetic testing in the diagnostic work-up of CD is that, unlike serology and small bowel histology, accurate results are not dependent on active gluten intake. This is particularly relevant, as the gluten-free diet has been adopted by 10 % or more of the population in many Western countries, rendering traditional tests inaccurate. Establishing the clinical utility of genomic testing in CD will also support the feasibility of a genomics-based platform for a range of other autoimmune diseases, where both HLA and non-HLA genetic contributions are important as well, and which commonly overlap with CD [14].