Fine-mapping studies distinguish genetic risks for childhood- and adult-onset asthma in the HLA region

Genome-wide association studies of asthma have revealed robust associations with variation across the human leukocyte antigen (HLA) complex with independent associations in the HLA class I and class II regions for both childhood-onset asthma (COA) and adult-onset asthma (AOA). However, the specific variants and genes contributing to risk are unknown. We used Bayesian approaches to perform genetic fine-mapping for COA and AOA (n=9432 and 21,556, respectively; n=318,167 shared controls) in White British individuals from the UK Biobank and to perform expression quantitative trait locus (eQTL) fine-mapping in immune (lymphoblastoid cell lines, n=398; peripheral blood mononuclear cells, n=132) and airway (nasal epithelial cells, n=188) cells from ethnically diverse individuals. We also examined putatively causal protein coding variation from protein crystal structures and conducted replication studies in independent multi-ethnic cohorts from the UK Biobank (COA n=1686; AOA n=3666; controls n=56,063). Genetic fine-mapping revealed both shared and distinct causal variation between COA and AOA in the class I region but only distinct causal variation in the class II region. Both gene expression levels and amino acid variation contributed to risk. Our results from eQTL fine-mapping and amino acid visualization suggested that the HLA-DQA1*03:01 allele and variation associated with expression of the nonclassical HLA-DQA2 and HLA-DQB2 genes accounted entirely for the most significant association with AOA in GWAS. Our studies also suggested a potentially prominent role for HLA-C protein coding variation in the class I region in COA. We replicated putatively causal variant associations in a multi-ethnic cohort. We highlight roles for both gene expression and protein coding variation in asthma risk and identified putatively causal variation and genes in the HLA region. A convergence of genomic, transcriptional, and protein coding evidence implicates the HLA-DQA2 and HLA-DQB2 genes and HLA-DQA1*03:01 allele in AOA.

including highly replicated, significant associations at the human leukocyte antigen (HLA) region on chromosome 6p21. Recently, we performed a GWAS for childhoodonset asthma (COA) and adult-onset asthma (AOA) [3] in individuals from the UK Biobank [7]. Each revealed independent, broad regions of association spanning HLA class I (HLA-C/B) and class II (HLA-DR/DQ) genes. While the HLA class II region was the most significant locus for AOA, variants in this region had greater effect sizes for COA compared to AOA [3]. In contrast, associations with variants in the class I region were similar and had similar effect sizes in both.
Overall, the HLA region is the most frequently associated locus with asthma and allergic diseases [8]. Whereas its central role in adaptive immunity has been extensively characterized [9][10][11], determining the causal variants and their putative functions has been particularly challenging due to the remarkably high gene density, extraordinary levels of genetic polymorphism, and striking linkage disequilibrium (LD) that characterize this region [12,13]. These features make identification of causal disease-associated variants and the genes that underlie these associations particularly challenging. As a result, the specific HLA region variants and genes contributing to asthma risk are not known.
In this study, we hypothesized that the causal variants at the HLA locus include those that are both shared and distinct to COA and AOA and that some causal variants exert their effects on asthma risk by modifying the expression of HLA genes while others alter protein properties by changing amino acid sequences in functional domains. Using Bayesian approaches for fine-mapping GWAS loci (genetic fine-mapping) and for fine-mapping expression quantitative trait loci (eQTLs; expression finemapping) of HLA region genes in cell types relevant to asthma, we identified putatively causal COA and AOA variants and genes in the HLA class I and class II regions and replicated a subset of causal variants in an ethnically diverse sample.

Study subjects in the discovery and replication samples
We examined COA and AOA HLA loci in a discovery cohort comprised of the same adult individuals from the UK Biobank data release July 19, 2017, and using the same inclusion/exclusion criteria and phenotype definitions reported in Pividori et al. [3]. Briefly, we filtered out individuals with poor-quality genotypes, ambiguous sex assignments, and related individuals [3]. Subjects included in the discovery analysis were restricted to White British ancestry (UK Biobank Data-Field 22006). COA cases were subjects with self-reported doctordiagnosed asthma before 12 years of age (n=9432), AOA cases were subjects with self-reported doctor-diagnosed asthma between 26-65 years of age (n=21,556), and controls were subjects with no reported asthma up to the latest age of study participation (n=318,167). Individuals with chronic obstructive pulmonary disease, chronic bronchitis, and emphysema were excluded from the AOA and control groups.
The replication cohort consisted of individuals from the UK Biobank who were not in the White British ancestry set. We used the same definitions as above for COA (n=1686), AOA (n=3666), and non-asthmatic controls in the replication cohort (n=56,063). The self-reported ancestry for the replication cohort is 70.7% White, 12.4% Black or Black British, and 16.8% Asian or Asian British (Additional file 1: Fig. S1).

Genotypes and HLA alleles
Allele dosages were extracted for genotyped and imputed SNPs from UK Biobank v3 within the boundaries of the asthma HLA loci in the COA and AOA GWAS as defined by Pividori et al. [3] using the rbgen 0.1 package in R 3.6.1 [14]. All SNPs that passed the following genotype quality control filters: call rate > 95% or information score > 80%, Hardy-Weinberg equilibrium test p-value > 1 × 10 -10 , and minor allele frequency (MAF) > 0.1% were included, as previously described [3]. A total of 8624 and 10,006 SNPs passed the genotype filters at the HLA class I and class II locus, respectively. Four-digit resolution (also known as two-field resolution) of classical HLA allele dosages were imputed from SNP data by the UK Biobank using HLA*IMP:02 [7,15]. We excluded alleles with low frequencies (<1%, specific to each self-reported race). We translated the imputed HLA allele dosages to their corresponding amino acid dosages using publicly available data from SNP2HLA [16] (http:// softw are. broad insti tute. org/ mpg/ snp2h la/) that map HLA alleles to their amino acid sequences. All amino acid polymorphisms were encoded as biallelic, as previously described [16][17][18]. Amino acid polymorphisms with low frequencies (<1%) were excluded. Base-pair positions for variants, genes, and other genomic features are based on Human Genome Assembly hg19 [19,20]. We use the term "HLA allele" when referring to the four-digit nomenclature, "amino acid polymorphism" when the amino acid is the target of analysis, and "SNP" when the SNP is the target of analysis. We note that SNPs determine amino acid polymorphisms and amino acid polymorphisms determine HLA alleles.

Variant associations
Associations between variants (SNPs, HLA alleles, and amino acid polymorphisms) and COA and AOA were tested using logistic regression. Sex and the first 10 ancestral principal components (PCs) were included as covariates in the analyses in White British subjects. We used the test for heterogeneity from METAL [21] to determine if variant effects were shared between COA and AOA. We focused on significantly associated variants and amino acid polymorphisms and applied a Bonferroni correction to the results from the heterogeneity test.
To determine whether the associations with variants of interest were driven in part by allergy, we extracted allergy status (any eczema, hay fever, and/or food allergy) and performed regressions where we (1) included allergy status as a covariate in the regression, (2) used an interaction term between allergy status and the variant of interest, and (3) examined associations excluding all individuals with allergy. To determine whether there were any sex differences in HLA-associated risks, we (1) performed associations for the candidate variants separately in males and females and (2) include an interaction term between sex and the variant in the regression model.

Fine-mapping the HLA region
Sum of Single Effects (SuSiE) [22] (susieR R package version 0.9.0) was used to fine-map the asthma-associated HLA loci and determine putatively causal variants for COA and AOA, separately in the class I and class II regions. SuSiE is a Bayesian analog of stepwise conditional analysis that improves on previous methods by taking account of the uncertainty of the selection of associated SNPs. Additionally, SuSiE can handle individual-level data (allowing us to examine SNPs, HLA alleles, and amino acid polymorphisms), detect multiple causal signals, and identify causes when the variable with the lowest p-value is not the causal variable [22]. SuSiE reports credible sets, which are sets of variants that include at least one causal variant with high probability. To assess whether SNPs, amino acid polymorphisms, and/or HLA alleles were causal for asthma risk, genotype dosages for these three types of polymorphisms were included in a single genotype matrix: class I SNPs, HLA alleles, and HLA amino acid polymorphisms were examined together for the fine-mapping studies in the class I region; class II SNPs, HLA alleles, and HLA amino acid polymorphisms were examined together for the class II region. See Additional file 1: Supplementary Methods for more details.

Gene expression and eQTL studies
To test whether SNPs identified in the fine-mapping results using SuSiE are eQTLs in asthma-relevant cell types, RNA-seq data from three cell sources from ethnically diverse individuals was evaluated (Additional file 1: Table S1). Lymphoblastoid cell lines (LCLs), peripheral blood mononuclear cells (PBMCs), and upper airway (nasal) epithelial cells (NECs) were considered as surrogates for immune, lung, and epithelial tissues that were implicated as relevant cells/tissues in asthma risk in the COA and AOA GWAS [3]. The LCLs were derived from blood previously collected from 398 Hutterites [23], a founder population of European descent. Unstimulated PBMCs were derived from blood (n=132) [24] and NECs from nasal swabs (n=188) [25], which were previously collected from children from the URban Environment and Childhood Asthma (URECA) birth cohort [26]. In both studies, written informed consent was obtained from the parents of children, and assent from children age 6 or older. Both studies were approved by institutional review boards, and all study subjects were assigned randomly generated ID codes.
RNA-sequencing reads from these studies were remapped. For the polymorphic HLA genes, we aligned RNAseq reads to reference sequences from the International ImMunoGeneTics (IMGT) database [27] for each individual's known HLA type [28] (see Additional file 1: Supplementary Methods for more details). For eQTL mapping in the PBMCs and NECs, we performed linear regressions with QTLtools [29] using a nominal pass and cis-window size of ±1 Mb from the transcription start site (TSS). We performed eQTL mapping in the LCLs using Genomewide Efficient Mixed Model Association (GEMMA) [30], including a kinship matrix as a random effect to account for relatedness between the Hutterites. Relevant covariates were included for all tests (see Additional file 1: Supplementary Methods for additional details). In the LCLs, some SNPs in the AOA or COA credible sets had high missingness. In these instances, we ran the eQTL mapping in the subset of individuals with high-quality genotypes. In the PBMCs and NECs, some SNPs in the AOA or COA credible sets failed QC in genotypes extracted from whole genome sequence data. In those instances, we extracted high-quality genotypes from the Illumina MEGA array in the subset of individuals that had previous array-based genotyping and used those genotypes in the eQTL studies.

Fine-mapping eQTLs
We performed eQTL fine-mapping in three asthma-relevant cell types, LCLs, PBMCs, and NECs, using SuSiE for the expression of genes in which SNPs in any of the COA or AOA credible sets were significantly associated with their expression at a false discovery rate (FDR) [31] threshold of 0.05. The same covariates in each of the eQTL studies were regressed from the dataset (see Additional file 1: Supplementary Methods).
SuSiE was used to fine-map eQTLs using a window of ± 1 Mb around the TSS of each gene. SNPs in any of the eQTL credible sets were then compared to the SNPs in the COA and AOA GWAS credible sets to assess overlap.

Replication of fine-mapping results
Because variants within a credible set are highly correlated, we selected candidate variants (tag SNPs) from each credible set for replication. These included rs2428494 (the shared class I COA and AOA credible set 1 [CS1] SNP), HLA-C p.11 (class I COA credible set 2 [CS2] amino acid polymorphism), rs28407950 (class II COA CS1 lead SNP), rs35571244 (lead class II COA CS2 SNP), rs9272346, rs1063355, rs3828789, and rs9274660 (class II AOA CS1 SNPs that were also in the eQTL CSs and overlapped with LCL enhancer marks), and HLA-DQA1*03:01 (class II AOA CS2 HLA allele). We performed logistic regressions for COA and AOA separately in self-reported White (excluded from White British Ancestry cohort), Asian British, and Black British individuals from the UK Biobank and then performed a meta-analysis as implemented in METAL [21]. Logistic regressions were performed using sex and the top 20 ancestry PCs as covariates in a one-sided test that required the direction of effect to be the same as in the discovery sample.
After filtering out low-frequency amino acid polymorphisms (<1%), we tested the 741 amino acid polymorphisms at the six HLA loci for associations with COA and AOA. Of these, 188 amino acid polymorphisms were associated with COA and 152 were associated with AOA (p<5 × 10 −8 , Additional file 2: Table S5). P-values were overall more significant, and estimated ORs were larger for class II compared to class I HLA alleles and amino acid polymorphisms (Additional file 1: Fig. S3). The magnitude of the ORs was also generally larger for COA compared to AOA. Both of these observations are consistent with the GWAS results [3]. Using a test for heterogeneity, eight HLA-DQA1, seven HLA-DQB1, and 11 HLA-DRB1 amino acid polymorphisms were significant, suggesting age of onset-specific effects (Additional file 2: Table S4).

Fine-mapping the HLA class I region
To fully capture genetic variation at the HLA region, we combined the genotypes for 19,499 HLA alleles, amino acid polymorphisms and SNPs, and performed genetic fine-mapping on the combined set of variants separately in the class I and class II regions, with the goal of identifying causal variation for COA and AOA. We used the Bayesian regression method SuSiE [22] to perform fine-mapping of the 9021 combined variants at the class I locus using the locus boundaries defined previously [3]. Fine-mapping the COA class I locus identified two credible sets, indicating the presence of two independent associations with COA in the class I region. Credible set 1 (CS1) consisted of a single variant and CS2 consisted of two variants. The CS1 SNP (rs2428494) (red point, Fig. 1A) is in an intron of HLA-B, and its posterior inclusion probability (PIP) is 0.97. This was the lead SNP in both the COA and AOA GWASs at the HLA class I locus [3]. In CS2 (blue points, Fig. 1A), the probability was nearly equally divided between two highly correlated variants (LD r 2 = 0.99, calculated from our data), with PIP values of 0.43 and 0.57 for rs28481932 and HLA-C p.11, respectively. SNP rs28481932 is upstream of HLA-C, and HLA-C p.11 is an amino acid polymorphism in HLA-C (p.11 Ala/Ser). The risk amino acid (alanine) is on HLA-C*02, *03, *05, *06, *07, *08, *12, *15, *16, *17, and *18 alleles in this sample (including rare alleles), and the amino acid (serine) that is associated with decreased risk of asthma is on HLA-C*01:02, *04:01, *04:07, *14:02, and *14:03 alleles. The HLA class I locus in AOA included one credible set with rs2428494 having a PIP of 1.00 (Fig. 1A). Therefore, in the class I region, rs2428494 is a shared causal SNP for COA and AOA and HLA-C p.11 or rs28481932 is a causal variant for COA. Alternatively, one or both may tag untyped or rare causal variation in LD with the candidate variants identified by SuSiE.
The odds ratios (ORs) remained largely unchanged when accounting for allergy in the associations, and no sex differences in risk were observed for any of these variants, suggesting that the associations were not reflecting confounding due to the presence of allergic diseases in either COA or AOA (Additional file 1: Table S6, S7).

Fine-mapping the HLA class II region
We used 10,428 combined variants at the class II region for fine-mapping. Two credible sets were identified for COA (Fig. 1B). CS1 (orange point) included one SNP (rs28407950; PIP 1.00) located downstream of HLA-DQB1. This SNP was the lead GWAS SNP in the HLA class II region for COA [3]. CS2 (cyan points) contained five SNPs spanning 152 kb. The SNP (rs35571244) with the highest PIP (0.50) was located at the proximal end of the class II region upstream of TAP1. The minimum r 2 between all variants in CS2 was 0.79 (median r 2 =0.99).
No HLA alleles or amino acid polymorphisms were included in either of the COA credible sets.

Fine-mapping simulations in the HLA region
Most previous fine-mapping studies excluded the HLA region due to its genomic complexities [38][39][40]. To validate that SuSiE accurately detects multiple independent causal signals in this region, we conducted simulations in each of the HLA class I and class II regions for both binary (e.g. case-control status) and quantitative (e.g. gene expression) outcomes (see Additional file 1: Supplementary Methods for additional details). In the null simulations (with zero causal variants), SuSiE accurately reported no credible sets (Additional file 1: Fig. S4). In the simulations ranging between one to three causal variants, SuSiE correctly detected the correct number of causal signals, each containing the true causal variant in the class I and class II regions for both the binary and quantitative outcomes. The true causal variant had the highest PIP 61% of the time. These simulations demonstrated that fine-mapping studies in the HLA region with SuSiE can accurately identify credible sets containing the true causal variants despite the complexities of this region.

Fine-mapping eQTLs and functional annotations in the HLA region
Genetic variation can influence disease risk by altering protein function or by influencing expression levels of disease-associated genes. Earlier studies demonstrated different functional properties of HLA alleles defined by amino acid polymorphisms [9,41], while more recent studies have shown regulatory effects of SNPs on HLA gene expression [42][43][44][45][46]. Our results suggested that both types of mechanisms may mediate the effects of HLA genes on asthma risk. We first assessed the potential regulatory effects of asthma-associated SNPs using gene expression data from three asthma-relevant cell types (Additional file 1: Table S1) from individuals with HLA types that were known or could be determined. This allowed us to map the RNA-sequencing (RNA-seq) data against sequences corresponding to each person's known HLA type (Additional file 2: Table S9) and avoid mapping biases that arise from the large number of sequence differences between HLA alleles and reference genome (28).
We first performed eQTL studies for all genes whose transcription start site (TSS) was Tables S10, S11). We then used SuSiE to perform fine-mapping for each gene with at least one eQTL in that cell type, resulting in expression fine-mapping of 15 genes in LCLs, three in PBMCs, and six in NECs (Additional file 1: Table S12). Five of the eQTL fine-mapping studies identified credible sets with SNPs that were also in the class II AOA CS1 (Additional file 2: Table S13). These included credible sets with eQTLs for HLA-DQB2 in LCLs, PBMCs, and NECs ( Fig. 2A) and HLA-DQA2 in LCLs and PBMCs (Fig. 2B). The AOA CS1 risk alleles were associated with increased expression of both HLA-DQA2 and HLA-DQB2 in these cells (Fig. 2C, Additional file 1: Fig. S5). None of the other SNPs in eQTL finemapping credible sets overlapped with SNPs in the class II AOA CS2, the class II COA CS1 or CS2, or the AOA or COA class I credible sets.
To prioritize among the 20 SNPs that were in both the class II AOA CS1 and eQTL credible sets for HLA-DQA2 or HLA-DQB2, we overlapped these SNPs with functional annotations from nine cell lines (GM12878 (LCLs), H1-hESC, K562, HepG2, HUVEC, HMEC, HSMM, NHEK, NHLF) available from ENCODE [48]. Four of the 20 SNPs overlapped an enhancer region in LCLs (Fig. 2D) and also resided in or near (approximately 70 to 700 bp) weak enhancer marks in three epithelial cell-derived lines (NHEK, keratinocytes; HMEC, mammary epithelial cells; HEPG2; liver hepatocellular cancer) (Additional file 1: Fig. S6). None of the SNPs overlapped with or were near enhancer marks in the other ENCODE cell lines. rs9272346, located upstream of HLA-DQA1, was in four of five of the eQTL credible sets. The other three SNPs were located near or within HLA-DQB1, with rs1063355, rs3828789, and rs9274660 in one, three, and three of the five eQTL credible sets, respectively. Surprisingly, the AOA GWAS lead SNP (rs17843580), which had the highest PIP in the AOA CS1, and the remaining 15 SNPs in CS1 did not overlap an enhancer region in any of the cell types (Additional file 1: Fig. S6).

Structural visualization of amino acid variants
HLA class I and class II molecules bind and present peptides to T cell receptors (TCRs). The polymorphic features of HLA class I and class II molecules, particularly in the peptide-binding domain, serve the crucial functions of diversifying antigen presentation and restricting pathogenevasion of immune recognition. Therefore, we next explored the possibility that the amino acid variants with the highest PIPs in some of the credible sets affect peptide presentation or interactions with the TCR (Additional file 1: Table S14). The amino acid in the class I region with the highest PIP in the COA CS2, HLA-C p.11, is located within the peptidebinding pocket of the HLA-C protein (Fig. 3A). The serine allele was associated with protection from COA and is polar and uncharged; the alanine was associated with risk and is aliphatic and hydrophobic. Thus, both the location and structure of this polymorphic site suggest that peptide presentation by or other functional properties of HLA-C may be different between these alleles. The amino acid with the highest PIP in the AOA class II CS1 was at position 55 in the HLA-DQB1 protein (Fig. 3B). Arginine contains a positively charged side chain and was associated with protection from AOA (p=4.50 × 10 −49 ; OR=0.86, CI 0.84-0.88). This is a multiallelic position, with leucine and proline as alternate alleles; both were associated with asthma risk (Leucine: p=3.36 × 10 −6 ; OR=1.06, CI 1.03-1.08; Proline: p=1.83 × 10 −28 ; OR=1.12, CI 1.10-1.15). The risk variants were both hydrophobic whereas the protective variant was positively charged and polar. This location may be in a region that interacts with the TCR [49]. Because this variant is in strong LD with SNPs that are eQTLs in the class II AOA CS1 (median r 2 =0.99), it is unclear if this variant, the eQTLs, or both are causal for AOA.
Among the 10 amino acids in class II AOA CS2, HLA-DQA1 Ser26, Gln47, Arg56, and Val76 were in perfect LD with each other, had the highest PIPs of the amino acid polymorphisms in CS2, and were associated with AOA risk. These amino acids are present exclusively on the HLA-DQA1*03:01, 03:02, and 03:03 alleles (Additional file 2: Table S15). The HLA-DQA1*03:02 and HLA-DQA1*03:03 alleles occurred at low frequency (<1%) in this sample and were not included in the GWAS or finemapping studies. Of these four amino acids, Ser26 is in the peptide-binding pocket and Val76 is in a region that may interact with the TCR (Fig. 3C). The other amino acid polymorphisms were not in regions with obvious functional effects. At position 26, both the risk-associated serine and the protection-associated threonine have polar uncharged side chains, although the serine sidechain is smaller. Position 76 was multiallelic, with all three amino acids (valine, leucine, and methionine) Fig. 3 Localization of asthma-associated amino acid variants. Ribbon-figure representations of the peptide-binding pocket are shown for each HLA protein, and the amino acid variant in focus is highlighted. A HLA-C p.11, shown in blue, is located within the peptide-binding pocket of the HLA-C molecule [33] (forest green). B HLA-DQB1 p.55, shown in magenta, lies in the region that may interact with the TCR on the HLA-DQB1 [34] protein (gray) in complex with HLA-DQA1 (blue). C HLA-DQA1 p.26, p.47, p.56, and p.76 (green) shown on the HLA-DQA1 protein [35] (blue). p.26 lies in the peptide-binding pocket, p.76 in the region that may interact with the TCR, and p.56 and p.47 in regions outside of the peptide-binding pocket having hydrophobic side chains, and valine having the smallest molecular weight. Position 187 was also perfectly correlated with these amino acids and captured in CS2 but was not part of the crystal structure. Our data support HLA-DQA1*03 alleles as risk alleles for AOA.

Conditional analyses to assess independent effects
To determine whether the candidate eQTL SNP for HLA-DQA2 and HLA-DQB2 (rs9272346) in CS1 and the HLA-DQA1*03:01 allele in CS2 accounts for all the association signal at the class II AOA locus, we performed three conditional analyses in which we included the number of alleles for rs9272346, for HLA-DQA1*03:01, and for both as covariates in association tests of variants at this locus with AOA (Fig. 4A). In the first conditional analysis including number of rs9272346 (CS1) alleles as a covariate, the AOA association signal for HLA-DQA1*03:01 remained genome-wide significant; in the second conditional analysis including the number of HLA-DQA1*03:01 alleles (CS2) as a covariate, the AOA association signal for rs9272346 also remained genome-wide significant. When both rs9272346 and HLA-DQA1*03:01 were included as covariates, the significance across the locus was reduced. These analyses indicate that the most significant asthma locus in AOA is due to variation in two credible sets, whose effects are likely attributable to their impact on expression of the HLA-DQA2 and HLA-DQB2 genes and of the HLA-DQA1*03:01 allele.
To confirm that the two class II AOA putatively causal variants do not contribute to class II COA risk, we repeated the analysis above for SNPs at the COA class II GWAS locus (Fig. 4B). When conditioning on either the AOA CS1 SNP rs9272346, the AOA CS2 HLA-DQA1*03:01 allele, or both, the COA associations remain genome-wide significant, although the magnitudes of the associations are reduced, likely due to including additional covariates in the model and the LD in the region (Additional file 1: Table S16). These results further  Fig. 1 for comparison support the argument that risk for COA and AOA are due to different causal variants in the HLA class II region.
Finally, we performed conditional analyses to assess the independence between the class I and class II signals. For each of the COA class I variants, we tested for their association with COA after conditioning on the tag class II SNPs and vice versa for the class II region. We similarly did this for AOA. For all results, the odds ratios (ORs) are largely similar and the 95% confidence intervals (CIs) overlap between the marginal and conditional associations, suggesting that the class I and class II signals are indeed independent (Additional file 1: Table S17).

Replication of fine-mapping results
To replicate the COA and AOA putatively causal SNPs identified in the UK Biobank White British ancestry individuals, we examined a replication cohort of UK Biobank multi-ethnic individuals who were initially excluded from our studies. Asthma and allergy phenotypes were defined using the same criteria as in the discovery sample. The prevalence of both were similar in the discovery and replication samples (Additional file 1: Table S18, S19). To allow for allele frequency and effect size heterogeneity between the replication cohorts (n=43,449 White British, n=10,327 Asian or Asian British, n=7637 Black or Black British), we tested each variant for association with COA and AOA within each cohort and then performed a meta-analysis of the results. We required that the same allele is associated with asthma with the same direction of effect as in the discovery cohort. All of the variants, except the class II COA CS2 and the class I AOA CS1 SNPs, replicated at a significance threshold adjusted for multiple testing of 5.0 × 10 −3 ( Table 2, Additional file 1: Fig. S7, Additional file 2: Table S20). Additionally, the HLA-DQA1*03:01 allele had the most significant association for AOA compared to the other HLA alleles tested.
Overall, all but two of the candidate variants (AOA rs2428494 and COA rs35571244) from the discovery cohort were significantly associated with COA or AOA in the replication cohort with the same direction of effect. Both variants were nominally associated with COA or AOA, with the same directions of effect, but were not significant after multiple test correction.

Discussion
The HLA region is associated with more diseases than any other region of the genome [50], and variation in this region has been consistently associated with asthma risk in GWASs [2][3][4][5][51][52][53]. However, the causal variants and genes have been unknown. Most previous large studies focused only on SNPs, which do not fully capture the extensive protein polymorphism at this locus. A recent study reported colocalizations between eQTLs for HLA-B, HLA-DQB1, HLA-DQA1, HLA-DRA, TAP1, and RNF5 in induced pluripotent stem cells with asthma GWAS SNPs [45]. However, they did not separate COA and AOA, examine associations with HLA alleles or amino acids, or study gene expression in asthma-relevant cell types. Our study addressed these limitations and used relevant cell types and eQTL fine-mapping to identify putatively causal eQTLs. These studies extended our earlier observations of COA and AOA having both shared and distinct genetic risk to the HLA region. We also prioritized putatively causal variation by examining the location of associated amino acid variants within the functional domains of HLA proteins and of associated SNPs to functional annotations of gene regulation. In the class I region, we identified both a COA-specific association that may be mediated by HLA-C protein coding Table 2 Results of replication meta-analysis P-values, odds ratios (ORs), and 95% confidence intervals (95% CI) in the replication cohorts are reported for each of the candidate variants from the discovery COA and AOA credible sets variation and a shared causal variant with unknown function. We did not find any shared causal variation between COA and AOA in the class II region, which was supported by replication studies. Nevertheless, our data strongly suggest that HLA class II-associated risk for AOA is mediated by both protein coding variation associated with the HLA-DQA1*03 alleles and differential expression of the nonclassical HLA-DQA2 and HLA-DQB2 genes.
Our fine-mapping studies in White British individuals from the UK Biobank revealed a lead GWAS SNP in an intron of HLA-B in the class I region that was putatively causal for both COA and AOA (rs2428494). However, rs2428494 was not in any eQTL credible sets. This SNP was also not an eQTL for any genes in GTEx tissues [54] or in immune cells in the Database of Immune Cell Expression, eQTLs, and Epigenomics (DICE) [55] and did not reside in ENCODE [48] cis regulatory elements (Additional file 1: Fig. S6). However, rs2428494 was predicted to be in flanking active TSSs for several T cell subsets in Roadmap [56]. Additionally, this association was replicated in a multi-ethnic cohort and was the lead class I region SNP in a meta-analysis of GWASs for allergic rhinitis, a common co-morbidity with asthma, with the same allele associated with risk [17]. Otherwise, little is known about this variant.
A second credible set in the class I region included a SNP (rs28481932) and an HLA-C amino acid polymorphism at position 11 and was specific to COA. We did not accrue any functional evidence for the SNP to mediate its effects through gene expression based on our eQTL studies and colocalizations with ENCODE annotations (Additional file 1: Fig. S6). However, the HLA-C amino acid polymorphism had a higher PIP and its location in the HLA-C protein makes it a promising functional candidate. This amino acid polymorphism was also associated with COA in the replication cohort. Along with other class I classical HLA genes, HLA-C is expressed on the cell surface of nearly all nucleated cells, where it presents intracellular peptides to the TCR of CD8 T cells. TCR recognition of foreign (e.g. viral) peptides presented by class I molecules activates these T cells, leading to the cytotoxic killing of infected cells by CD8 T cells and promoting inflammation. Position 11 lies in a β-sheet in the peptide-binding pocket, and the amino acid substitution may change the peptide-binding properties and alter antigen presentation and recognition by CD8 T cells. HLA-C is also a ligand for killer immunoglobulin receptors (KIRs) expressed on natural killer (NK) cells [57], which survey MHC class I levels on cell surfaces and can induce cell death when levels decrease during cell stress or viral infection [58]. Modulation of NK cell activity may be another potential role for this variant in COA [59,60].
In the class II region, two credible sets were identified for COA, neither of which overlapped with the class II AOA credible sets. One of the credible sets (CS1, rs28407950) was replicated in the multi-ethnic cohort providing strong evidence that this SNP, or an untyped or rare variant in LD with it, is indeed a causal variant for COA. Additionally, we observed a nominally significant interaction between sex and rs28407950, with the risk variant having a stronger effect in females compared to males. Similarly, none of the class II COA CS2 SNPs were in eQTL credible sets and therefore not likely causal for resting gene expression in these cells. The fact that no HLA alleles or amino acids were included in either COA credible set largely rules out protein variation mediating risk at this locus for COA.
In contrast, fine-mapping studies identified the same causal variation underlying both AOA risk and expression of HLA-DQA2 and HLA-DQB2 genes at the class II region; these associations with AOA were replicated by fine-mapping studies in the multi-population cohort. While previous asthma GWAS have implicated the classical and highly polymorphic HLA-DQA1 and HLA-DQB1 class II genes [51,53,61], our study revealed associations with HLA-DQA2 and HLA-DQB2 genes in risk for AOA. The HLA-DQA2 and HLA-DQB2 genes are highly conserved paralogues of the HLA-DQA1 and HLA-DQB1 genes [62,63], respectively, but are virtually devoid of amino acid polymorphisms in the peptidebinding pocket [62]. Little is known about the functions of the HLA-DQα2/HLA-DQβ2 protein, although it has been shown to form heterodimers in Langerhans cells and can present antigens and activate T cells [64]. It is notable that SNPs in the AOA CS1 were associated with increased HLA-DQB2 and HLA-DQA2 expression in different cell types, pointing to their potentially broad effects in both immune and airway epithelial tissues. The finding that asthma-associated SNPs in the class II region were associated with increased HLA-DQA2 and HLA-DQB2 expression is consistent with our earlier results showing that predicted increased expression of HLA-DQA2 and HLA-DQB2 [65] was among the most significant gene-based associations with asthma risk [3]. Our current study confirms this prediction and further implicates increased expression of these highly conserved and poorly characterized genes as potential mediators of risk for AOA. Moreover, a recent GWAS of asthma hospitalizations in White British adults from the UK Biobank also reported that the HLA class II region was the most significant association with hospitalizations [66]. Some of the associated SNPs were also eQTLs for HLA genes, including HLA-DQA2. This raises the possibility that the AOA class II CS1 variants in our study may also be associated with asthma hospitalizations and severity due to their effects on HLA-DQA2 expression.
The HLA-DQA1*03 alleles may also play an important role in AOA risk. The class II AOA CS2 contained the HLA-DQA1*03:01 allele and five amino acids that define the HLA-DQA1*03 alleles; some of these polymorphisms are in regions with potential impact on peptide presentation and TCR interactions (Fig. 3C). We suggest therefore that HLA-DQA1*03:01 may be the causal variant for AOA in CS2. T cell activation and proliferation is driven by both differential expression levels and protein coding variation in the HLA genes which may affect binding affinities to different peptides [67,68]. The HLA-DQA1*03:01 allele association was replicated, and it was the strongest HLA allele association in the replication cohort. Taken together, our studies indicate that increased expression of the HLA-DQA2 and HLA-DQB2 genes and coding variation in the HLA-DQA1*03 protein mediate the risks conferred by variation for AOA at the most significant GWAS locus [3].
Despite our delineation of specific HLA class I and class II variants and genes associated with risk for COA and AOA, our study had limitations. First, the fine-mapping method we used (SuSiE) has not yet been extended for logistic models or previously used in the HLA region. However, the use of linear methods to binary data can be justified (see Supplementary Methods) [22,[69][70][71], and our simulations indicate that SuSiE can accurately detect causal signals for binary traits (e.g. case-control status) as well as for quantitative traits (e.g. gene expression) in the genetically complex HLA region, at least in large sample sizes. Second, we did not identify potential mechanisms for all putatively causal signals for COA and AOA. In those cases, these variants may be eQTLs in other cell types not profiled in this study [72], in response to specific stimuli [73], or at specific developmental stages [74]. Further experimental evidence is needed to elucidate the relationships between the putatively causal variants identified in this study and their impacts on gene function and/or expression, and ultimately on risk for asthma. Third, not all of the variants in the credible sets replicated, which may be due to smaller sample size or may reflect true ethnic differences in risk alleles. The extensive diversity of HLA genes and haplotypes between worldwide populations and the known effects of local selection pressures on these genes makes multiancestry replication particularly challenging [75,76], and large cohorts are needed to replicate the fine-mapping results. However, the single-variant association tests replicated nearly all representative variants in COA and AOA for class I and class II credible sets and the direction of effect was the same across discovery and replication cohorts. Given the importance of the HLA locus in asthma, additional studies need to be performed in larger cohorts from other racial and ethnic groups to gain a full picture of the roles of HLA genes in asthma. Finally, we discovered that HLA-DQA1*03:01 and a set of amino acids that are coinherited on the HLA-DQA1*03 alleles were putatively causal for AOA. However, because the allele frequencies of the other HLA-DQA1*03 alleles (*03:02 and *03:03) were too infrequent in this sample to examine individually, we cannot determine if the effect is due solely to the HLA-DQA1*03:01 allele or a general effect of all *03 alleles.
In addition to age of asthma onset, many other known epidemiological factors distinguish asthma with onset in childhood compared to onset after puberty, including sex ratios, the importance of respiratory viral infections, and comorbidities with allergic diseases or obesity, as examples [77]. Future studies can determine whether these well-established differences in COA and AOA may also be explained, at least in part, by differences in causal variation in the HLA region identified in this study.

Conclusions
Overall, our study highlights roles for both expression and protein coding variation in asthma risk. We suggest a prominent role for HLA-C protein coding variation in COA and both gene expression levels and protein coding changes in the HLA-DQ genes in AOA. We further propose that the HLA-DQA1*03:01 allele and SNPs that regulate the expression of the under-characterized HLA-DQA2 and HLA-DQB2 genes explain the class II HLA risk at the most significant AOA locus. Our study identified potential therapeutic targets for asthma and utilized a strategy that can serve as a model for fine-mapping other HLA-associated diseases by integrating approaches to narrow putatively causal variants and genes in the HLA region. the HLA Region. Fig. S5. Expression of HLA-DQB2 and HLA-DQA2. Fig.  S6. ENCODE ChromHMM Results for SNPs in the Childhood-Onset and Adult-Onset Credible Sets. Fig. S7. Replication Results. Table S1. RNA-seq Sample Composition.