Genetic and functional interaction network analysis reveals global enrichment of regulatory T cell genes influencing basal cell carcinoma susceptibility

Background Basal cell carcinoma (BCC) of the skin is the most common form of human cancer, with more than 90% of tumours presenting with clear genetic activation of the Hedgehog pathway. However, polygenic risk factors affecting mechanisms such as DNA repair and cell cycle checkpoints or which modulate the tumour microenvironment or host immune system play significant roles in determining whether genetic mutations culminate in BCC development. We set out to define background genetic factors that play a role in influencing BCC susceptibility via promoting or suppressing the effects of oncogenic drivers of BCC. Methods We performed genome-wide association studies (GWAS) on 17,416 cases and 375,455 controls. We subsequently performed statistical analysis by integrating data from population-based genetic studies of multi-omics data, including blood- and skin-specific expression quantitative trait loci and methylation quantitative trait loci, thereby defining a list of functionally relevant candidate BCC susceptibility genes from our GWAS loci. We also constructed a local GWAS functional interaction network (consisting of GWAS nearest genes) and another functional interaction network, consisting specifically of candidate BCC susceptibility genes. Results A total of 71 GWAS loci and 46 functional candidate BCC susceptibility genes were identified. Increased risk of BCC was associated with the decreased expression of 26 susceptibility genes and increased expression of 20 susceptibility genes. Pathway analysis of the functional candidate gene regulatory network revealed strong enrichment for cell cycle, cell death, and immune regulation processes, with a global enrichment of genes and proteins linked to TReg cell biology. Conclusions Our genome-wide association analyses and functional interaction network analysis reveal an enrichment of risk variants that function in an immunosuppressive regulatory network, likely hindering cancer immune surveillance and effective antitumour immunity. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-021-00827-9.


Background
Basal cell carcinoma (BCC) is the most common form of human cancer, with more than 90% of tumours presenting with genetic activation of the Hedgehog (HH) pathway [1]. The current model of BCC development is that cumulative sun exposure induces characteristic ultraviolet (UV) signature mutations, resulting in DNA damage within basal cells of the skin [2]. Individuals at highest risk of developing BCC are those with fair skin, blonde hair, red hair, and pale coloured eyes [3,4], predominantly due to decreased photoprotection (the absorption of UV photons and reactive oxygen species provided by melanin pigment) [5]. Greater than 99% of BCC cases arise sporadically, without a clear inheritable diseasecausing mutation, highlighting the impact that both environmental factors and the sum of an individuals' genetic variation play in determining whether driver mutations, such as the presence of HH pathway activating mutations, culminate in BCC development. This is most clearly evidenced in the many "BCC-prone" individuals who have no evidence of a monogenic germline predisposition.
Genome-wide association studies (GWAS) have played a key role in identifying the polygenic effects that confer susceptibility to BCC. Loci have been attributed to a wide variety of biological processes including photoprotection, cellular trafficking, cytoskeletal organisation, cell motility/migration, skin biology, ectoderm/ mesoderm differentiation, cell death, telomere biology, immune, tumour progression, DNA repair, and cell cycle regulation [6][7][8][9][10][11][12]. Although GWAS provide a framework for identifying putative susceptibility loci, they rarely identify causal genes, predominantly due to the complicated linkage disequilibrium (LD) structure of the genome, in addition to the fact that genetic variants can affect phenotype via distant regulation of gene expression. To circumvent this problem, several statistical methods have been developed to prioritise functionally relevant genes from GWAS loci [13][14][15][16][17], including the Summary-data-based Mendelian Randomisation (SMR) and HEterogeneity In Dependent Instruments (HEIDI) tests. The SMR and HEIDI methodology [16] combines summary-level GWAS data and expression quantitative trait locus (eQTL) studies to identify whether a transcript and phenotype are associated because of a single and/or set of shared causal variant(s), thereby identifying functionally relevant candidate genes. An emerging area expanding on current methods of GWAS data analyses involves production of network annotations that represent functional interactions among genes and their products. Networkassisted analysis allows advanced analyses of the associated loci and/or candidate genes by assessing the combined effects of multiple genes participating in a network, thereby providing a global view of the genetics underlying a particular human disease or trait.
Here, we describe an integrative analysis of summary statistics from GWAS, eQTL, and methylation quantitative trait locus (mQTL) studies culminating in the construction of two functional interaction (FI) networks underlying BCC susceptibility. We have been able to identify previously reported GWAS hits as functional candidate genes by demonstrating a direct correlation between GWAS SNP association and changes in gene expression. Subsequent network analysis revealed a strong enrichment of immune regulatory genes, revealing genetic susceptibility to BCC is profoundly influenced by inherited background immune traits.

Genome-wide association study
Initial quality control (QC) and imputation of the genotype data on Haplotype Reference Consortium (HRC) [18] panel were carried out by the UK Biobank [19]. We performed further QC (excluding SNPs with minor allele count < 5, Hardy-Weinberg equilibrium test P value < 1 × 10 −6 , missing genotype rate > 5%, or imputation info score < 0.3) using PLINK2 [20]. BCC cases consisted of (1) BCC (UK Biobank data field ID: 1061) from selfreported cancers (UK Biobank data field ID: 20001) and (2) BCC defined by the histology of cancer tumour (UK Biobank data field ID: 40011) within cancer registry records (field ID: 40006). Controls were individuals without any self-reported cancer or cancer registry record. Detailed gender and age demographics of the 17,416 cases and 375,455 controls are represented in Additional file 1: Figure S1. GWAS analysis was performed using BOLT-LMM [21] with fitting gender, age, and first ten principal components (PCs) as the covariates. We included~700,000 SNPs obtained by LD pruning (r 2 < 0.9) from HapMap3 SNPs as "model SNPs" in the BOLT-LMM analysis to adjust for relatedness, population structure, and polygenicity. The beta estimates from BOLT-LMM were transformed on the binary phenotype to the odds ratio (OR) scale by LMOR [22]. The index SNPs are clumped based on P < 5 × 10 −8 , a 1-Mb window, and a LD r 2 threshold of 0.01. The SNP-based heritability was estimated by LD score regression (LDSC) [23]. When estimating the heritability in liability scale, the sample prevalence and population prevalence were both set as 4.43%. Conditional and joint association analysis (COJO) [24] was conducted based on a stepwise selection model to identify a set of jointly associated (and nearindependent) SNPs. Loci were classified as novel if located outside a 1-Mb window of previously reported GWAS hits (GWAS Catalog database [25]).

Summary-data-based Mendelian Randomisation analysis
SMR and HEIDI analyses were conducted as previo u s l y d e s c r i b e d [ 1 6 ] ( S M R s o f t w a r e : h t t p : / / cnsgenomics.com/software/smr/). In brief, the SMR method selects the top eQTL SNP as an instrumental variable to estimate the effect of the gene expression on the trait of interest (in the Mendelian randomisation framework). Selection criteria include cis-eQTL SNPs located within a 2-Mb window of the target gene probe with P eQTL < 5 × 10 −8 . The HEIDI filtering test adopts multiple SNPs in a cis-eQTL region to reject the significant SMR associations due to LD between disease-associated SNPs and eQTL SNPs. The eQTL summary statistics were obtained from the eQTLGen Consortium (n = 31,684 blood samples) and GTEx dataset (GTEx Portal: https://gtexportal.org/ home/index.html; n = 369 whole blood sample, n = 605 sun exposed lower leg, n = 517 sun not exposed suprapubic). The gene expression levels were measured using Illumina gene expression arrays, and the genotype was imputed to 1KGP [26]. The mQTL summary data were generated from a genetic analysis of DNA methylation measured on Illumina Human-Methylation450 arrays (n = 1980 in peripheral blood) [27]. The statistical power of SMR analysis has been demonstrated by simulation in a previous study [28] implementing the SMR workflow used in this study.

Functional interaction networks
Functional interaction networks were constructed using the ReactomeFIViz App (ReactomeFIViz app and Reactome FI Network, Wu and Haw 2017 PMID: 28150241) in Cytoscape (v3.6.1) [29]. GWAS-FI network was constructed using nearest genes to each of the 71 GWAS loci. SMR FI network was constructed using the 46 eQTLGen derived SMR genes. Pathway-enrichment analysis was performed within the ReactomeFIViz app. ReactomeFIViz utilises a comprehensive protein functional interaction network construed from the integration of multiple external data resources including protein-protein interaction networks of several organisms (including human and mouse) in addition to biological pathway databases such as KEGG and Reactome [30]. The information gathered from these resources is served as training data for a Naïve Bayes Classifier, which is ultimately used to predict and annotate functional interaction network for a given gene set [31].

GWAS identifies 3 previously undescribed BCC susceptibility loci
We performed GWAS on 7,288,4213 autosomal SNPs with minor allele frequency (MAF) ≥ 0.01 in 17,416 BCC cases and 375,455 controls from the UK Biobank (UKB) (Fig. 1). The estimated SNP-based heritability is 0.170 Those SNPs with P > 1 × 10 -3 have been omitted. The red line denotes the genome-wide significant threshold of P < 5 × 10 -8 (s.e. = 0.018) on the liability scale, as estimated by LDSC. A total of 71 near-independent SNPs, culminating in 65 loci, significantly associated with BCC (P < 5 × 10 −8 ) (Additional file 2: Table S1), including 3 new loci not yet described, PIK3R1, RHOBTB2, and MYO15A (Additional file 2: Table S1, bold). In order to identify any potential SNPs masked in GWAS due to LD, we performed conditional and joint association analysis (COJO) and identified a total of 73 jointly significant signals, including 9 additional SNPs which did not reach genome-wide significance in the original GWAS analysis (Additional file 2: Table S2). In particular, 6 COJO signals are located between 89.7~90.1 Mb in chromosome 16 (MC1R locus), indicating multiple genetic variants underlying this genomic region (Additional file 1: Figure S2).

Gene expression analyses reveal 46 functional candidate genes for BCC
In order to identify background genetic factors with the ability to influence or modify the effects of epidermal acquired BCC driver mutations, we chose to interrogate eQTL data from the eQTLGen consortium (n = 31,684 blood samples). Biologically, use of a non-epidermal sample source provides optimal opportunity to detect background genetic (and potentially germline) traits and factors that increase susceptibility to BCC. Statistically, we have previously shown that analysing eQTLGen blood eQTL data is more powerful at identifying functional genes than using tissue-specific eQTL data [28], partly due to the significant boost in power that the large eQTL data sample size provides. However, in order to validate that analysis of blood tissue would not affect the validity of the data, we set out to identify the correlation of eQTL effects (r ) [32]. Ther b between two independent blood cohorts (eQTLGen and GTEx V7 whole blood) is 0.8344 (s.e. = 0.0051) (Additional file 2: Table  S3). Similarly, ther b between blood and two GTEx V7 skin samples (skin non-sun exposed and skin sun exposed) are very high, thereby revealing a positive correlation between blood eQTL data and skin tissue.
SMR analysis using our GWAS summary data and eQTL blood data revealed a total of 46 SMR candidate genes whose expression levels were significantly associated with BCC risk (P SMR < 3.19 × 10 −6 , i.e. 0.05/m SMR , with m SMR = 15,628 being the total number of SMR tests in eQTLGen dataset) ( Table 1). Positive b SMR estimates were obtained for 20 SMR genes (Table 1) and negative b SMR estimates for 26 SMR genes (Table 1, bold text), linking BCC risk with increased gene expression and decreased gene expression, respectively. HEIDI analysis was performed to filter out the SMR associations (with P HEIDI < 0.01) due to LD between the BCC-associated SNPs and the eQTL SNPs, culminating in a refined set of 13 putative causal genes (referred to as SMR-HEIDI genes) ( Table 1, asterisk).

DNA methylation analyses define 5 loci that exhibit both genetic and methylation regulatory mechanisms linked to BCC susceptibility
In order to identify epigenetic regulatory signals associated with BCC susceptibility, we focused on methylation QTL (mQTL) data in blood sample (referred to as mSMR analysis) and identified 54 DNA methylation (DNAm) probes (located in 18 independent loci) that were significantly associated with BCC (P SMR < 5.40 × 10 −7 [m SMR = 92,557] and P HEIDI ≥ 0.01) (Additional file 2: Table S4). By performing an SMR analysis that genetically links DNAm to gene expression (m2eSMR analysis), we identified 41 DNAm sites associated with gene expression. Twenty-seven of the DNAm sites, all located within chromosome 16, were found to associate with seven functionally relevant genes (Additional file 2: Table S5). However, only SPATA2L and RP11-104N10.1 passed the eSMR HEIDI test (P HEIDI ≥ 0.01) ( Table 1). These data, in addition to the COJO analysis findings (Additional file 2: Table S2), indicate that multiple causal variants reside in this region of the genome. A total of 5 loci (BACH2, VDR, STRADB, SPG7, and HLA-DRB1/ DQA2) were identified to exhibit genome-wide significance in both eQTL and DNAm analyses and significant association between DNAm and eQTL (P DNAm->eQTL < 1 × 10 −5 , Additional file 2: Table S5), indicating genetic and methylation regulatory mechanisms driving BCC susceptibility. The combined GWAS, eQTL, and mQTL locus plots of the BACH2 and VDR loci (Fig. 2) and assembly of all the omics level estimates for both genes (Fig. 3, Table 1, Additional file 2: Table S4-S5) are all congruent, revealing the strength of our methodology. In particular, one DNAm site (cg25204543) located in the promoter region of BACH2 (Fig. 2) passed the most stringent thresholds in SMR and HEIDI, indicating a potential regulatory mechanism driving BCC risk. The A allele of variant rs72928038 showed decreasing effect on the expression level of BACH2 via upregulating the methylation level of cg25204543 (located in the promoter region of BACH2), and the increased expression of BACH2 was associated with higher BCC risk.
Protein interaction networks of BCC susceptibility associations reveal a highly connected system Local FI networks were constructed by inputting each of the 71 GWAS hits (using nearest genes) and the 46 SMR candidate genes into the Reactome database. The resulting FI networks represent a global overview of the protein-protein interactions, representing biological functions such as binding, activation, translocation, degradation, classical biochemical events, and catalyst  reactions. The generated GWAS-FI network consists of 42 GWAS nearest gene (proteins) and 29 protein interactors (Additional file 1: Figure S3). Remarkably, it presents as one large interconnected protein network with UBC and P1K3R1 acting as the most highly connected nodes within the network (Additional file 1: Figure S3, red hubs). Pathway analysis of the GWAS-FI network revealed cell cycle and cell death processes, with particularly strong enrichment of immune regulation processes, with 11 of the top 20 pathways (Additional file 2: Table   S6) and 4 of the top 10 GO-Biological processes (Additional file 2: Table S7) linked to immune system function.
The SMR-FI network also formed an interconnected multidimensional network, consisting of 32 SMR genes and 20 protein interactors (Fig. 4). Similarly, pathway analysis of the SMR-FI network revealed cell cycle and cell death processes, with particularly strong enrichment of immune regulation processes, with 9 of the top 20 pathways (Additional file 2: Table S8) and 14 of the top 50 GO-Biological processes linked to immune system activity (Additional file 2: Table S9). These data indicate strong enrichment for immune regulation genes within both the GWAS and SMR-FI networks. We subsequently queried PubMed databases for each of the 46 SMR candidate genes used to create the FI network and confirmed enrichment of three predominant biological processes: cell cycle regulation, cell death, and immune regulation (Additional file 2: Table S10). Interestingly, 11/46 of the SMR and 5/13 SMR-HEIDI genes are associated with regulatory T cell (T Reg ) activity (Additional file 2: Table S10).

Blood and skin gene expression analyses reveal common functional candidate genes
Although blood remains the most accessible source for large-scale transcript profiling, thus ensuring adequate power to detect eQTL, it is equally important to investigate the potential of tissue-specific changes in gene expression. We therefore explored the degree of tissuespecific eQTL overlap between blood and skin samples. We performed SMR analysis using our GWAS summary data and eQTL data from sun exposed skin (sun exposed lower leg, n = 605), non-sun exposed skin (sun not exposed suprapubic, n = 517), and a smaller cohort of whole blood (n = 369) from the GTEx V7 dataset. A total of 25 significant SMR hits ( P SMR < 7.63 × 10 −6 , i.e. 0.05/ 6557) were identified in sun exposed skin and 21 significant SMR hits (P SMR < 9.52 × 10 −6 , i.e. 0.05/5252) in non-sun exposed skin (Table 2 and Additional file 2:  Table S11) culminating in a total of 12 unique skinspecific SMR genes (Table 2). Whole blood revealed 20 (See figure on previous page.) Fig. 2 Integration of GWAS, eQTL, and mQTL data for VDR and BACH2 genes. a -log 10 (P value) of SNPs from BCC GWAS analysis. Gene expression and methylation probes are annotated by red diamonds and blue circles, respectively. Solid diamonds and circles denote probes that passed the HEIDI filtering test (P HEIDI > 0.01). Yellow star highlights the top cis-eQTL SNP (rs7975232). b -log 10 (P value) of SNP association with gene expression (probe ENSG00000111424 tagging VDR). c -log 10 (P value) of SNP association with methylation (DNAm probe cg14854850). d The upper panel shows 25 chromatin state annotations under the genomic region (e.g. promoters and enhancers, annotated by colours on the right bar) from the Roadmap Epigenomics Mapping Consortium. Each row denotes one of the 127 samples with different tissue and cell types (each type annotated by colours on the left bar). The lower panel shows the genes underlying this region and their genomic positions. e -log 10 (P value) of SNPs from BCC GWAS analysis, as described in a. Yellow star highlights the top cis-eQTL SNP (rs7298038). f -log 10 (P value) of SNP association with gene expression (probe ENSG00000112182 tagging BACH2). g -log 10 (P value) of the SNP association with methylation (DNAm probe cg25204543). h The upper panel shows 25 chromatin state annotations as described in d. The lower panel shows the genes underlying this region and their genomic positions  (Table 2 and Additional file 2: Table S11). Although the sample size of blood tissue in GTEx dataset (n = 369) is much smaller than that of eQTLGen dataset (n = 31, 684), the b SMR estimates for the 15 overlapping SMR hits show very high consistency (Pearson's correlation r is 0.92, s.e. = 0.05) (Additional file 2: Table S12), indicating the b SMR estimates are robust for the same tissue from different datasets. Only 7 SMR genes are common among the four datasets analysed ( Table 2). This is likely attributable to sample size (SMR only selects probes with a P eQTL < 5 × 10 −8 ), the different number of probes used for SMR analysis across the datasets (eQTLGen =  Fig. 4 Functional interaction network of the protein-coding genes identified via SMR analysis. Genes listed in black indicate SMR proteins. Genes listed in red indicate protein interactors. In this network, "→" indicates activating/catalysing, "-|" inhibition, "---" predicted FIs, and "-" FIs extracted from complexes or inputs 15,652, GTEx = 4459~6557) and sampling variations. For example, of the 46 genes significant in eQTLGen SMR, only 22 of them had a P eQTL < 5 × 10 −8 in the GTEx dataset. The seven common SMR genes span the three predominant biological processes identified in the eQTLGen-SMR gene list: immune regulation, cell cycle regulation, and cell death (Additional file 2: Table S10). The identification of 24 SMR genes unique to the eQTL-Gen dataset analysis highlights the statistical power gained by performing SMR analyses on tissues with very large sample size and is consistent with our previous studies [28,32].

Discussion
GWAS have provided a powerful approach to the dissection of the genetic components of complex traits. However, the complicated linkage disequilibrium structure of the human genome and the observation that genetic variants can affect phenotype via distant regulation of gene expression recue the power of GWAS alone to identify the specific genes that underlie these complex traits. Here, we used the power of the UK Biobank to perform a GWAS of genetic susceptibility to BCC as the platform, upon which we built an integrated approach of SMR analysis focused on both eQTL and mQTL, Table 2 Overview of blood and skin SMR analyses detailing tissue-specific and common BCC susceptibility genes   Given that control of cell cycle and cell death is well characterised in the biology of BCC formation, we interrogated our SMR functional susceptibility gene list and FI networks to dissect how genetic susceptibility to BCC is influenced by inherited background immune traits.
Reduced expression of HLA is one key mechanism by which tumours escape host immune surveillance [33], and our SMR analyses identified decreased expression of HLA-DQA2 is linked to increased BCC risk. TAP2, another SMR gene, localises to the MHC class II region and plays a pivotal role in immune surveillance, with polymorphisms linked to the susceptibility of various autoimmune disorders [34][35][36] and various neoplasms [37][38][39].
Of particular interest, our integrative approach revealed a strong enrichment of BCC susceptibility genes (24% of SMR and 38% of SMR-HEIDI candidate genes) involved in regulatory T cell (T Reg ) activity. These include previously identified GWAS loci including CTLA4 [10], IRF4 [12], VDR [8], and SMC2 [10]. T Regs are essential for maintaining immune homeostasis by limiting effector T cell activity against foreign antigens. A particularly interesting T Reg -BCC susceptibility gene identified here as a GWAS locus and an SMR-HEIDI eQTL and mQTL gene is BACH2. BACH2 has been linked to B cell lymphoma, CML, and stomach cancer [40][41][42] and has also been shown to be required for efficient formation of T Regs [43]. Consequently, Bach2-deficient mice exhibit markedly impaired tumour growth due to increased effector T cell activation and a reduction in T Regs [44]. Our discovery that increased BACH2 expression correlates with increased risk of BCC (+b SMR ), alongside identification of a BCC-associated methylation site in the promoter of BACH2 which increases gene expression, suggests a molecular mechanism whereby elevated levels of BACH2 promote tumour immunosuppression by attenuating effector T cells. In support of this, BACH2 was recently shown to specifically restrain TCR-driven T Reg activation and actively drive T Reg quiescence [45], thereby indicating BACH2 functions to promote tumour immunosuppression by both upholding a durable T Reg precursor pool and also maintaining T Regs functionally quiescent. Similarly, interrogation of our GWAS and SMR FI networks also revealed strong enrichment of protein hubs linked to T Regs . Conditional deletion of EP300, a highly connected protein interactor in both the GWAS-FI (exhibiting 16 interactions) and SMR-FI network (exhibiting 9 interactions), results in impaired T Reg suppressive function and reduced tumour growth [46]. PTPN11 acts as a protein hub within the SMR-FI network, exhibiting a total of 7 neighbouring nodes, 4 of which are T Reg -associated SMR genes. Myc, another protein hub in the SMR-FI network whose role in cancer has been the focus of intense study over many years, is directly connected to 2 of the 9 T Reg genes represented on the network and can be connected to a total of 8 T Reg genes via one linker protein. Taken together, these data reveal that background genetic factors regulating T Regs immune function act to predispose an individual to BCC.
The mechanism by which BCC susceptibility genes involved in T Reg activity likely function to predispose an individual to BCC is via regulating the tumour microenvironment (TME). The TME is a complex system consisting of tumour cells, endothelial/vascular cells, stroma, and immune cells, and evidence indicates that the interplay between immune cells and other components of the TME largely determines tumour cell survival and disease progression [47]. Recent studies have shown consistency in the TME of immune cells across BCC patients [48], whereas other studies have reported a high proportion of BCCs (82%) present with expression of immune checkpoint proteins on the tumourinfiltrating lymphocytes located in the TME [49]. All these data support how changes in gene expression, as defined by innate genetic predisposition, that produce an immune evasive TME can contribute to the susceptibility of an individual to BCC tumour formation.
Another principle finding of our analyses is the identification of functional candidate genes that were previously reported GWAS hits. Using SMR and HEIDI analysis [16], we demonstrated a direct correlation between GWAS SNP association and changes in gene expression. Importantly, the directions of gene expression change (whereby positive b SMR estimates represent increased gene expression linked increased risk of disease and negative b SMR values represent decreased gene expression linked increased risk of disease) are all consistent with their biological function as reported in the literature. Interrogation of our GWAS-FI and SMR-FI network revealed a host of previously described processes linked to BCC susceptibility including "cellular response to UV", "apoptotic process", and "DNA damage response, signal transduction by p53". Skin-specific processes, however, such as "melanin biosynthetic process", "keratinisation", "positive regulation of hair cycle", and "hair follicle placode formation" were only present in the GWAS-FI network. Given we have shown a high degree of correlationr b between blood and skin eQTL effects, it is unlikely that the GWAS loci contributing to these skin-specific processes failed to progress to SMR genes as a direct consequence of interrogating blood eQTLGen data. We did, however, identify both blood-specific and skin-specific SMR genes when analysing the degree of tissue-specific eQTL overlap using smaller eQTL cohorts. Hence, it remains to be determined whether skinspecific GWAS loci could be identified as functional candidate SMR genes upon access to a large-scale transcript profiling skin dataset, providing adequate power to detect eQTL.

Conclusions
Our data provide important insights into the relationship between disease and host genotype in the most common form of human cancer. Additionally, given the high prevalence of HH pathway activity in BCC, the discovery of genes contributing significantly to polygenic risk illustrates a conceptual framework whereby host genotype is critical for the development of cancer even in the presence of clear somatic oncogenic drivers. Clinically, our data suggest that maintenance of strong cutaneous immunity be incorporated into current BCC prevention strategies/guidelines, thereby strengthening the likelihood of mounting an immune response to tumour antigens in the early stages of cancer formation. Taken together, our association and candidate gene studies have unearthed risk variants that function in a highly interconnected regulatory network and identify potential avenues for intervention.
Additional file 1: Supplementary Figures. Figure S1. Gender and age demographics/distribution of UK Biobank derived BCC cases and controls. Figure S2. Regional plots of MC1R. Figure S3. FI network for the protein-coding 'nearest-genes' identified by GWAS analysis.
Additional file 2: Supplementary Tables. Table S1. Independent common variants associated with BCC in GWAS analysis at P-value < 5E-8. Table S2. Common variants identified by GCTA-COJO analysis of BCC at P < 5E-8 . Table S3. Quantification of the correlation of eQTL effects (r ) between blood and skin samples. Table S4. BCC-associated CpG methylation sites via SMR analysis of the GWAS meta-analysis and mQTL data. Table S5. Mapping the BCC-associated CpG methylation sites to the BCC-associated genes via SMR analysis using the eQTLGen eQTL data. Table S6. BCC GWAS-FI network Pathway Analysis. Table S7. BCC GWAS-FI network GO-Process Analysis. Table S8. BCC SMR-FI network Pathway Analysis. Table S9. BCC SMR-FI network GO-Process Analysis. Table S10. Pubmed search of SMR-HEIDI candidate genes biological processes. Table S11. BCC-associated genes identified via SMR analysis using GTEx and eQTLGen eQTL data. Table S12. Pearson's correlation of GTEx and eQTLGen eQTL data. This excel file contains 11 supplementary tables, related to the main text.