Identification of specific susceptibility loci for the early-onset colorectal cancer
Genome Medicine volume 15, Article number: 13 (2023)
The incidence of early-onset colorectal cancer (EOCRC; patients < 50 years old) has been rising rapidly, whereas genetic susceptibility to EOCRC remains incompletely investigated. Here, we aimed to systematically identify genetic variants specifically associated with EOCRC susceptibility.
Two parallel GWASs were conducted in 17,789 CRC cases (including 1490 EOCRC cases) and 19,951 healthy controls. A polygenic risk score (PRS) model was built on the identified EOCRC-specific susceptibility variants and validated in the UK Biobank (UKB) cohort. We also investigated the potential biological mechanisms of the prioritized risk variant.
We identified 49 independent susceptibility loci significantly associated with both EOCRC susceptibility and age at CRC diagnosis (P < 5.0 × 10⁻⁴ in each analysis), including 3 previously reported CRC GWAS loci. The 88 genes assigned to these loci were involved in chromatin assembly and DNA replication pathways and were mainly associated with precancerous polyps. We then assessed the genetic effect of the identified variants by developing a PRS model. Individuals in the high genetic risk group had increased EOCRC risk compared with those in the low genetic risk group, and this result was replicated in the UKB cohort (1.63-fold risk; 95% CI: 1.32–2.02, P = 7.67 × 10⁻⁶). Adding the identified EOCRC risk loci significantly increased the prediction accuracy of the PRS model compared with a PRS model derived from previously GWAS-identified loci. Mechanistically, we showed that rs12794623 may contribute to the early stage of CRC carcinogenesis by allele-specifically regulating the expression of POLA2.
These findings broaden the understanding of EOCRC etiology and may facilitate early screening and individualized prevention.
With approximately 1.9 million new cases and 935,000 deaths in 2020, colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer death worldwide. Over the last several decades, CRC mortality has declined steadily in many countries, mainly owing to healthier lifestyles, early detection, and surveillance. However, the incidence of early-onset CRC (EOCRC; CRC diagnosed before 50 years of age) has increased by about 2% annually, and EOCRC accounts for 2% to 8% of all CRC cases [3, 4]. By 2030, an estimated 15% of all CRC cases will be diagnosed in patients younger than 50 years. Routine CRC screening, such as invasive colonoscopy, is not standard for adults younger than 50 years. Precision screening at an earlier age would improve risk discrimination for high-risk individuals. Therefore, efforts have focused on investigating the risk factors of EOCRC in order to develop more targeted screening approaches and reduce the disease burden.
EOCRC differs genetically, pathologically, and molecularly from late-onset CRC; for example, EOCRC tends to show a higher pathologic grade and a greater tendency toward recurrence and metastasis. EOCRC can be classified into three subgroups: familial, hereditary, and sporadic EOCRC. Current genetic studies of EOCRC have mainly focused on rare monogenic diseases and hereditary EOCRC. First, familial adenomatous polyposis (FAP), the typical familial form of EOCRC, is characterized by numerous colorectal adenomas, and individuals with pathogenic germline APC mutations usually develop CRC before 40 years of age. Second, Lynch syndrome, also known as hereditary nonpolyposis colorectal cancer (HNPCC) syndrome, is caused by germline mutations in genes encoding the DNA mismatch repair system (MLH1, MSH2, MSH6, and PMS2), which can increase the cellular mutational burden and drive cancer development. For patients with familial and hereditary EOCRC, routine molecular screening for germline mutations in patients and their family members, as well as prophylactic colectomy, should be considered [11, 12]. For sporadic EOCRC, both environmental factors and genetic susceptibility contribute to its occurrence, although the germline variants involved remain unclear.
Genome-wide association studies (GWASs) identify genetic variants associated with complex human traits and diseases. More than 100 CRC risk loci have been identified by GWASs. A recent study showed that a polygenic risk score (PRS) based on 95 common CRC risk variants was also strongly associated with EOCRC risk. These results indicate the need for a GWAS that systematically investigates genetic susceptibility specific to sporadic EOCRC. Hence, we aimed to systematically identify susceptibility variants and elucidate the etiology of sporadic EOCRC, which may promote targeted early screening and treatment strategies for high-risk individuals.
We evaluated associations between germline variants and EOCRC risk in 2 complementary GWASs in a large cohort of European ancestry and identified 49 independent genetic loci significantly associated with EOCRC risk. We then developed a PRS model to assess the genetic effect of these variants and tested its predictive value in the UK Biobank cohort. Finally, we investigated the biological plausibility of the top EOCRC-specific susceptibility locus: rs12794623 allele-specifically increased the expression of POLA2, a gene involved in DNA replication and repair, and POLA2 overexpression enhanced CRC cell proliferation. These findings expand our understanding of EOCRC risk and may help inform surveillance strategies for EOCRC.
In the GWAS stage, all participants were drawn from a large-scale cohort of the Genetics and Epidemiology of Colorectal Cancer Consortium and Colon Cancer Family Registry (GECCO), comprising three datasets (phs001078.v1.p1, phs001315.v1.p1, and phs001415.v1.p1). Cases were diagnosed according to medical records, pathology reports, cancer registries, or death certificates. Demographic characteristics were obtained from a previously published study and are shown in Additional file 2: Tables S1 and S2. Reference age was defined as age at diagnosis of the first primary CRC for cases and as age at sample selection for healthy controls. A total of 17,863 CRC cases, including 1490 EOCRC cases (241 of them diagnosed under 40 years of age), and 20,037 controls of European ancestry were enrolled for subsequent analysis.
UK Biobank cohort
We also selected participants from the UK Biobank (UKBB) cohort for validation. Data were accessed under Application No. 51539. All participants signed an informed consent form, and the UKBB study was approved by the North West Multicenter Research Ethics Committee. Demographic characteristics are shown in Additional file 2: Tables S3 and S4. CRC cases were defined as participants with a diagnosis of primary invasive CRC (1020–1023) or with CRC death according to ICD-9 (1530–1534, 1536–1541) or ICD-10 (C180, C182–C189, C19, C20) codes. For each case, we selected 4 eligible controls from participants without invasive CRC by nearest-neighbor matching with the R package MatchIt, using enrollment age, enrollment year, ethnicity, and sex as matching criteria. A total of 6,296 CRC cases and 25,184 matched controls were enrolled. After excluding participants without genotype data, we included 723 EOCRC cases (185 of them under 40 years of age) and 24,427 matched controls. We also collected demographic, social, and behavioral factors, including ethnicity, drinking frequency, smoking status, and family history of bowel cancer. Ethnicity was self-reported by questionnaire and categorized as White, Mixed, Asian or Asian British, Black or Black British, or other ethnic groups (as defined in the original UK Biobank data). CRC family history was derived from the bowel cancer history of the father, mother, and siblings. Smoking status was categorized as current, former, or never. Alcohol intake frequency was categorized as heavy (>3 times/week), moderate (less often than heavy consumption), or never.
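To make the 1:4 control selection concrete, the sketch below implements a simplified greedy nearest-neighbor matcher in Python. It is not MatchIt's algorithm: MatchIt matches on a distance measure (by default a propensity score), whereas this sketch assumes exact matching on sex and ethnicity and unscaled Euclidean distance on enrollment age and enrollment year, without replacement. The `Person` record and the `match_controls` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str        # matched exactly
    ethnicity: str  # matched exactly
    age: float      # enrollment age in years
    year: int       # enrollment year


def match_controls(cases, pool, k=4):
    """Greedy 1:k nearest-neighbour matching without replacement.

    A control is eligible only if it has the same sex and ethnicity as the
    case; eligible controls are ranked by unscaled Euclidean distance on
    (age, year), with ties broken by ID. Cases are processed in input order,
    so earlier cases get first pick of the control pool.
    """
    available = list(pool)
    matched = {}
    for case in cases:
        eligible = [c for c in available
                    if c.sex == case.sex and c.ethnicity == case.ethnicity]
        eligible.sort(key=lambda c: (hypot(c.age - case.age, c.year - case.year), c.pid))
        chosen_ids = [c.pid for c in eligible[:k]]
        matched[case.pid] = chosen_ids
        # Remove chosen controls so they cannot be reused (no replacement).
        available = [c for c in available if c.pid not in chosen_ids]
    return matched
```

In practice the numeric covariates would usually be standardized, or a propensity score used, so that one year of age and one calendar year are not implicitly treated as equivalent.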
Genotype data collection, imputation, and quality control
GECCO genotype data were obtained from the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs001078.v1.p1, phs001315.v1.p1, and phs001415.v1.p1. Imputation was performed with the Michigan Imputation Server, using the Haplotype Reference Consortium r1.1.2016 (HRC) as the reference panel. All batches were merged into a single dataset after imputation. The following filters were applied with PLINK 1.9 to remove low-quality variants: (1) imputation quality < 0.4; (2) minor allele frequency < 1%; (3) deviation from Hardy-Weinberg equilibrium (P < 10⁻⁶); (4) missing call rate > 0.02; and (5) location on a sex chromosome. We also removed samples without age information. A total of 37,740 individuals and 2,446,560 SNPs remained. To control for population stratification and structure, we performed principal component analysis (PCA) with SmartPCA in EIGENSOFT v6.1.4 and used the top ten principal components (PCs) in subsequent analyses. No significant population stratification was observed for cases or controls in either stage (Additional file 1: Fig. S1).
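The SNP-level filters above can be expressed as a single predicate. This is a minimal sketch using the thresholds stated in the text; the `Snp` record, its field names, and the choice to keep only chromosomes 1–22 (which also drops, for example, mitochondrial variants) are assumptions, and the actual filtering was done in PLINK 1.9.

```python
from dataclasses import dataclass

# Chromosomes 1-22; anything else (X, Y, MT, ...) is excluded.
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    info: float     # imputation quality score
    maf: float      # minor allele frequency
    hwe_p: float    # Hardy-Weinberg equilibrium test P value
    missing: float  # fraction of samples with a missing call


def passes_qc(s: Snp) -> bool:
    """True if the SNP survives every filter listed in the text."""
    return (
        s.info >= 0.4          # drop imputation quality < 0.4
        and s.maf >= 0.01      # drop MAF < 1%
        and s.hwe_p >= 1e-6    # drop HWE P < 1e-6
        and s.missing <= 0.02  # drop missing call rate > 0.02
        and s.chrom in AUTOSOMES
    )


def filter_snps(snps):
    """Keep only the SNPs that pass QC, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```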
We used two complementary regression approaches to identify EOCRC-specific susceptibility loci: an EOCRC risk association analysis and a CRC onset-age association analysis. In the first approach, each SNP was tested for association with EOCRC in the 1490 early-onset cases and all healthy controls using logistic regression adjusted for sex, recruitment center, and 10 principal components. In the second approach, to account for residual confounding by CRC onset age, we tested each variant in all 17,789 CRC cases using a linear regression model with age at diagnosis as the dependent variable, adjusted for sex, recruitment center, and 10 principal components. To identify susceptibility variants specific to CRC diagnosed under 40 years of age, we repeated the analysis using 40 years as the age cut-off. Controls were selected with the R package MatchIt, matched on sex and recruitment center, with a matching ratio based on the incidence of early-onset CRC in the GECCO cohort. A total of 241 EOCRC cases under 40 years of age and 3374 matched controls were included. Variant associations were tested by logistic regression adjusted for sex, recruitment center, and 10 principal components.
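The onset-age analysis treats age at diagnosis as the outcome and allele dosage as the predictor, so a negative slope indicates earlier onset per additional allele. The sketch below is a deliberately simplified, unadjusted version (it omits the sex, recruitment center, and PC covariates used in the study) and uses a normal approximation for the P value, which is reasonable only for large samples. `onset_age_association` is a hypothetical helper, not part of any GWAS package.

```python
from math import erfc, sqrt


def onset_age_association(dosage, age):
    """Unadjusted OLS of age at diagnosis (outcome) on allele dosage.

    Returns (beta, se, p). beta is the change in diagnosis age (years) per
    allele; a negative beta means earlier onset. The two-sided P value uses
    a normal approximation to the t distribution.
    """
    n = len(dosage)
    if n != len(age) or n < 3:
        raise ValueError("need >= 3 paired observations")
    mx = sum(dosage) / n
    my = sum(age) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, age))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, age))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; standard error is zero")
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return beta, se, p
```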
Polygenic risk score model building
A PRS is a quantitative measure of inherited risk calculated from effect sizes estimated in a GWAS. Calculating a PRS requires choosing the weight for each SNP, the number of SNPs to include, and the threshold for correlation between SNPs (linkage disequilibrium, LD). We calculated the PRS by clumping and thresholding, which selects a subset of risk-associated SNPs that are approximately independent.
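In its simplest weighted form, a PRS is the sum over SNPs of the per-allele log odds ratio multiplied by the individual's risk-allele dosage. The sketch below illustrates that arithmetic only; the function names and the rule of skipping SNPs missing for an individual are assumptions, and some tools (PRSice among them) can report an average over SNPs instead of a sum.

```python
from math import log


def weights_from_odds_ratios(odds_ratios):
    """Convert per-allele odds ratios into log-OR weights."""
    return {rsid: log(or_) for rsid, or_ in odds_ratios.items()}


def weighted_prs(dosages, weights):
    """Weighted PRS: sum of (log-OR weight x risk-allele dosage) over SNPs.

    dosages maps rsid -> risk-allele dosage in [0, 2] (fractional if imputed).
    SNPs absent from an individual's dosages are skipped, which is an
    assumption of this sketch rather than a rule from the text.
    """
    return sum(w * dosages[rsid] for rsid, w in weights.items() if rsid in dosages)
```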
PRSice identifies the P-value threshold that gives the best-fit PRS for the phenotype; it removes ambiguous SNPs and performs clumping to remove SNPs in LD (r² ≥ 0.1 within 250 kb) with a more significant SNP. Effect sizes for all SNPs were derived from the EOCRC association results in the GECCO cohort. To develop the weighted PRSCRC, we combined 87 SNPs reported in the literature with 40 SNPs previously identified within the GECCO cohort at genome-wide significance (P < 5 × 10⁻⁸). After removing variants in LD (r² > 0.6), 86 CRC GWAS SNPs remained (Additional file 2: Table S5), and the weighted PRS was calculated using log-odds ratios published in previous GWASs.
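Clumping keeps the most significant SNP in each LD neighborhood and discards nearby SNPs correlated with it. The sketch below is a simplified greedy version using the r² ≥ 0.1 and 250 kb parameters from the text. It assumes an `r2` lookup function is supplied (in practice LD would be estimated from a reference panel), ignores the removal of strand-ambiguous SNPs, and is not PRSice's implementation; `clump` is a hypothetical name.

```python
def clump(snps, r2, p_threshold=1.0, r2_threshold=0.1, window_bp=250_000):
    """Greedy LD clumping.

    snps: list of (rsid, chrom, pos, p) tuples
    r2:   function (rsid_a, rsid_b) -> LD r^2 between two SNPs
    Returns the index SNP rsids, most significant first.
    """
    # Only SNPs passing the P-value threshold are considered, best first.
    remaining = sorted((s for s in snps if s[3] <= p_threshold), key=lambda s: s[3])
    index = []
    while remaining:
        lead = remaining.pop(0)
        index.append(lead[0])
        # Drop SNPs on the same chromosome, within the window, and in LD with the lead.
        remaining = [
            s for s in remaining
            if not (s[1] == lead[1]
                    and abs(s[2] - lead[2]) <= window_bp
                    and r2(lead[0], s[0]) >= r2_threshold)
        ]
    return index
```

Because SNPs removed by an earlier index SNP can no longer remove others, the result depends on processing SNPs in order of significance, which is why the list is sorted first.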
Prediction model and external validation
We built logistic regression risk-prediction models in 1490 EOCRC cases and 19,951 controls, with sex and the weighted PRS as predictors. Tuning parameters were optimized by 10-fold cross-validation with the R package caret. We further evaluated the prediction model in the UK Biobank cohort (723 EOCRC cases and 24,427 matched controls), additionally adjusting for several sociodemographic and exposure factors (ethnicity, drinking frequency, smoking status, and family history of bowel cancer). The area under the receiver operating characteristic curve (AUC) was calculated with the R package pROC to evaluate the discriminatory accuracy of the risk prediction model.
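For a binary outcome, the empirical AUC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it from ranks; it illustrates the quantity being reported and is not the pROC code. `auc` is a hypothetical name.

```python
def auc(case_scores, control_scores):
    """Empirical AUC via the rank-sum (Mann-Whitney) formula; ties count 1/2."""
    n1, n0 = len(case_scores), len(control_scores)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    # Label cases 1 and controls 0, then sort by score.
    labelled = sorted([(s, 1) for s in case_scores] + [(s, 0) for s in control_scores])
    rank_sum = 0.0
    i = 0
    while i < len(labelled):
        # Find the block of tied scores at positions i..j-1.
        j = i
        while j < len(labelled) and labelled[j][0] == labelled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        rank_sum += avg_rank * sum(lab for _, lab in labelled[i:j])
        i = j
    # Subtract the minimum possible case rank sum, then scale to [0, 1].
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)
```

The scores passed in could be, for instance, the predicted probabilities from the fitted logistic model.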
Functional annotation for variants
We next functionally annotated the risk SNPs using multiple bioinformatic tools and databases, including HaploReg (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php), the Roadmap Epigenomics Consortium (https://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/), ENCODE (http://compbio.mit.edu/encode-motifs/), RegulomeDB (https://www.regulomedb.org/regulome-search), CADD (https://cadd.gs.washington.edu/score), and 3DSNP (https://omic.tech/3dsnpv2/), which integrate histone modification chromatin immunoprecipitation (ChIP)-seq peaks and transcription factor (TF) ChIP-seq peaks. Specifically, (1) regulatory chromatin states were characterized using histone ChIP-seq data from the Roadmap Epigenomics Consortium (epigenome IDs: E106, sigmoid colon; E075, colonic mucosa; E101 and E102, rectal mucosa); (2) regulatory motifs were predicted from the bound sequences in ENCODE TF ChIP-seq datasets; and (3) variants were prioritized by regulatory scores from the RegulomeDB, CADD, and 3DSNP databases. The total functional score was the mean of the Z-scores of the RegulomeDB, CADD, and 3DSNP scores.
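The total functional score is an average of standardized scores. The sketch below assumes that each input list holds one score per variant in the same variant order, and that all three scores are already oriented so that larger values mean stronger functional evidence (RegulomeDB's categorical ranks, where 1a denotes the strongest evidence, would need to be converted first). The use of population standard deviations and the function names are also assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def functional_scores(regulomedb, cadd, threedsnp):
    """Per-variant mean of the three database Z-scores."""
    zs = [z_scores(regulomedb), z_scores(cadd), z_scores(threedsnp)]
    return [mean(triple) for triple in zip(*zs)]
```

Variants would then be ranked by this score, with the highest value taken as the most likely functional.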
HCT116 (RRID:CVCL_0291) and SW480 (RRID:CVCL_0546) cell lines were obtained from the China Center for Type Culture Collection (Wuhan, China). Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM; Gibco, USA) supplemented with 10% fetal bovine serum (FBS; Gibco, USA) and 1% antibiotics (100 U/ml penicillin and 0.1 mg/ml streptomycin) at 37°C in a humidified atmosphere of 5% CO2. All cell lines that we used in this study were tested and authenticated by DNA sequencing using the AmpF/STR method (Applied Biosystems, USA) and tested for the absence of mycoplasma contamination (MycoAlert, USA) within the last 3 years.
Dual-luciferase reporter assay
Reporter plasmids were constructed by cloning a 1-kb DNA sequence surrounding rs12794623 (alleles: C/A) into pGL3-Basic (Promega, Madison, USA). HCT116 and SW480 cells were plated in 96-well plates (1 × 10⁴ cells per well) 24 h before transfection. The reporter plasmids and the pRL-SV40 Renilla luciferase control plasmid (Promega, Madison, USA) were transfected into cells with Lipofectamine 3000 (Invitrogen, Waltham, USA). After 48 h of incubation at 37°C with 5% CO2, cells were lysed with 1× Passive Lysis Buffer (Promega, Madison, USA). Luciferase activity was measured with the Dual-Luciferase Reporter Assay System (Promega, Madison, USA). Experiments were performed independently in triplicate, and groups were compared by unpaired t-test.
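In a dual-luciferase assay, the firefly signal from the pGL3 construct is normalized to the Renilla signal from the co-transfected pRL-SV40 control, and the allele groups are then compared. The sketch below computes the per-well ratios and the pooled-variance Student's t statistic; the P value would come from a t distribution with `df` degrees of freedom (for example via `scipy.stats.ttest_ind`), which is omitted to keep the example dependency-free. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def relative_activity(firefly, renilla):
    """Per-well firefly/Renilla ratio (reporter normalized to transfection control)."""
    return [f / r for f, r in zip(firefly, renilla)]


def student_t(a, b):
    """Unpaired two-sample Student's t statistic (pooled variance) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```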
Quantitative real-time (qRT)-PCR
Total RNA was extracted from CRC patient tissues and CRC cell lines using TRIzol reagent (Invitrogen, USA). A total of 154 CRC patients were recruited from Tongji Hospital of Huazhong University of Science and Technology (HUST) and Zhongnan Hospital of Wuhan University, Wuhan, China. One hundred fifty-four matched colorectal tumor and normal colon mucosa biopsy specimens were obtained endoscopically. This study was approved by the Biomedical Ethics Committee of Wuhan University, and all participants provided written informed consent before data collection. Reverse transcription was performed with PrimeScript™ RT Master Mix (TaKaRa Bio, Tokyo, Japan). The relative expression levels of POLA2 and GR were measured by qRT-PCR using SYBR™ Green Master Mix (Applied Biosystems, California, USA) on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA). Using the 2^(−ΔΔCt) method, POLA2 and GR expression levels were normalized to GAPDH as an endogenous control (Additional file 1: Fig. S2). All primers are listed in Additional file 2: Table S6.
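The 2^(−ΔΔCt) calculation normalizes the target gene to the reference gene (here GAPDH) and then to a calibrator sample. The sketch below assumes that replicate Ct values are averaged first and that the calibrator is, for example, matched normal mucosa; the text does not specify the calibrator. The method itself assumes near-100% and similar amplification efficiency for both genes. `relative_expression` is a hypothetical name.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_calib, ct_ref_calib):
    """Fold change of a target gene by the 2^(-ΔΔCt) method.

    ΔCt  = mean Ct(target) - mean Ct(reference gene, e.g. GAPDH)
    ΔΔCt = ΔCt(sample) - ΔCt(calibrator)
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_calib = mean(ct_target_calib) - mean(ct_ref_calib)
    return 2 ** -(d_ct_sample - d_ct_calib)
```

For example, a tumor ΔCt of 6 against a calibrator ΔCt of 8 gives ΔΔCt = −2 and a fourfold higher expression.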
Electrophoretic mobility shift assays (EMSA)
Double-stranded DNA oligonucleotides centered on the rs12794623 alleles were synthesized with a biotin label at the 3′ end (TaKaRa Bio, Tokyo, Japan) (Additional file 2: Table S6). Nuclear extracts of HCT116 and SW480 cells were prepared using a Nuclear and Cytoplasmic Protein Extraction Kit (Beyotime, Shanghai, China). Protein binding activity was detected using an EMSA/Gel-Shift Kit (Beyotime, Shanghai, China). For competition experiments, a 100-fold excess of unlabeled identical oligonucleotides was added. After 20 min of incubation, the reaction mixtures were separated on a native 8% polyacrylamide gel, and signals were detected with the SuperSignal West Femto Trial Kit (Thermo, Rockford, USA).
Cell proliferation determination
The POLA2 pcDNA3.1 plasmid (RIBOBIO, Guangzhou, China) was transfected into HCT116 and SW480 cells using Lipofectamine 3000 (Invitrogen, Waltham, USA). After 24 h of culture, cells were seeded in 96-well plates at 2000 cells per well. Cell viability was measured with CCK-8 assays (Dojindo, Japan) according to the manufacturer's instructions at 24, 48, 72, and 96 h. Absorbance at 450 nm was recorded with six technical replicates, and each experiment was repeated three times.
Colony formation assay
After transfection with the POLA2 pcDNA3.1 plasmid, HCT116 or SW480 cells were cultured in 6-well plates (1000 cells per well). The medium was replaced once a week with fresh DMEM containing 10% FBS. After 2–3 weeks of incubation, colonies were washed twice with PBS and fixed with 100% methanol. Colonies were then stained with crystal violet solution (Solarbio, Beijing, China) at room temperature for 30 min, rinsed with deionized water, and photographed. Each experiment was repeated three times.
Differences in demographic characteristics between cases and controls were assessed by Student's t-test or Pearson's χ² test. Quantile-quantile (Q-Q) plots were used to assess whether the distribution of P values was consistent with the null distribution (except for the extreme tail). Genomic inflation factors (λ) for each approach are presented in Additional file 1: Fig. S3. Candidate SNPs were included if they (1) reached P < 5.0 × 10⁻⁴ in both the EOCRC risk association and the CRC onset-age association analyses and (2) had an odds ratio (OR) > 1 in the EOCRC risk association or a beta < 0 in the CRC onset-age association. ORs and 95% confidence intervals (CIs) were also estimated comparing quartiles of PRS. Tumor mutational burden (TMB) and gene expression data for COAD and READ tissues were obtained from the TCGA database (https://portal.gdc.cancer.gov). TMB per megabase was calculated by dividing the total number of mutations by the size of the targeted coding region in megabases. Samples were divided into two groups by median TMB. P < 0.05 from unpaired two-sided Student's t-tests was considered significant. All statistical analyses were performed with R (4.0.3) or PLINK (1.9). Graphs were created with GraphPad Prism v6.0.
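The candidate-SNP rule and the TMB definition can each be written as a small function. This sketch follows the inclusion criteria as worded (P < 5.0 × 10⁻⁴ in both analyses, and OR > 1 or beta < 0); if both direction conditions were intended, the `or` should become `and`. The `SnpResult` record, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass

P_THRESHOLD = 5.0e-4


@dataclass(frozen=True)
class SnpResult:
    rsid: str
    p_risk: float      # P from the EOCRC-vs-control logistic regression
    odds_ratio: float  # OR from the same model
    p_age: float       # P from the onset-age linear regression in CRC cases
    beta_age: float    # years per allele; negative = earlier diagnosis


def is_candidate(r: SnpResult) -> bool:
    """Significant in both analyses, with OR > 1 or beta < 0 (as worded in the text)."""
    both_significant = r.p_risk < P_THRESHOLD and r.p_age < P_THRESHOLD
    risk_direction = r.odds_ratio > 1 or r.beta_age < 0
    return both_significant and risk_direction


def tmb_per_mb(n_mutations: int, target_size_bp: int) -> float:
    """Tumor mutational burden: mutations per megabase of targeted coding sequence."""
    return n_mutations / (target_size_bp / 1e6)
```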
The study overview and characteristics of the participants
We conducted a GWAS in a large-scale population to identify genetic variants significantly associated with EOCRC risk. The study workflow is shown in Fig. 1. After imputation and quality control, 2,446,560 SNPs in 17,789 CRC cases and 19,951 controls were retained. The 1490 EOCRC cases (under 50 years of age) accounted for approximately 8% of all CRC cases and about 4% of all participants, and the 241 EOCRC cases under 40 years of age accounted for around 1% of all CRC cases and about 0.6% of all participants. The mean age at diagnosis was 44.1 years for EOCRC cases and 65.6 years for late-onset CRC cases. Men and women were represented in similar proportions among cases and controls (Additional file 2: Tables S1 and S2).
Identification of EOCRC-specific susceptibility loci
We performed two complementary regression analyses to identify EOCRC-specific susceptibility loci, as described in the Methods. In the EOCRC risk association analysis, 2118 loci were significantly associated with EOCRC susceptibility (P < 5.0 × 10⁻⁴; Fig. 2A). In the linear association analysis, 5071 variants were significantly associated with CRC onset age (Fig. 2B). Combining the two analyses, we identified 211 variants (49 independent signals with r² < 0.6) significantly associated with EOCRC risk (Table 1, Additional file 1: Fig. S4 and Additional file 2: Table S7).
We also replicated three reported CRC GWAS loci (Additional file 2: Table S5): 1q41 (rs12137323, P = 3.68 × 10⁻⁶, OR = 1.25, 95% CI = 1.15–1.34), 8q24.21 (rs10505506, P = 2.01 × 10⁻⁴, OR = 1.16, 95% CI = 1.08–1.23), and 14q23.1 (rs2358662, P = 3.09 × 10⁻⁴, OR = 1.16, 95% CI = 1.08–1.25). rs12137323, the strongest signal in the 1q41 region (ranked by P value in the first analysis), is in strong LD (r² = 0.81) in East Asian populations with rs6687758, a tagSNP identified in previous GWASs [22, 23]. rs6687758 is located 125 kb upstream of DUSP10, which encodes a dual-specificity phosphatase that inactivates the p38 and SAPK/JNK pathways. In addition, DUSP10 inhibition has been reported to be associated with gut inflammation, which might lead to the early development of intestinal cancer.
For EOCRC under 40 years of age, we identified 1,519 susceptibility variants with P < 5.0 × 10⁻⁴ (Additional file 1: Fig. S5). After combining these with the age association results, 84 variants (16 independent signals with r² < 0.6) were significantly associated with the risk of EOCRC under 40 years of age (Additional file 2: Tables S8 and S9). Several risk variants were associated with EOCRC risk under both age cut-offs (Additional file 2: Table S10).
The identified EOCRC risk variants point to susceptibility genes
We then mapped the 211 risk variants to genes with VarioWatch, yielding 88 genes potentially related to EOCRC risk, including 57 protein-coding and 31 non-coding genes (Fig. 2C and Additional file 2: Table S11). For instance, hypermethylation of transcription elongation regulator 1-like (TCERG1L) is observed in precancerous colon polyps and has been regarded as a risk marker of CRC in patients with ulcerative colitis. Pathway and process enrichment analysis highlighted pathways involved in chromatin assembly and DNA replication (chromatin silencing and nucleosome assembly), including cell cycle-dependent histone H4 family genes (Fig. 2D). Gene-disease association analysis with the DisGeNET database further showed that these genes were significantly associated with precancerous polyps, which are known precursors of CRC (Fig. 2E). Collectively, these results suggest that EOCRC-specific risk loci may alter gene expression and dysregulate signaling pathways involved in EOCRC progression.
EOCRC-specific susceptibility loci improve the prediction of EOCRC risk
To quantify the predictive value of the identified EOCRC-specific susceptibility loci, we generated PRSEOCRC using effect sizes of the 49 identified tagSNPs estimated by unconditional logistic regression. EOCRC cases were markedly skewed toward higher PRS quartiles compared with controls in the GECCO cohort (Additional file 2: Table S12). Individuals in the highest PRSEOCRC quartile had a 3.8-fold higher risk than those in the lowest (OR = 3.79; 95% CI: 3.21–4.47) (Fig. 3A). These associations were replicated in the UKB cohort, with a 1.6-fold risk for individuals in the highest PRSEOCRC quartile (OR: 1.63; 95% CI: 1.32–2.02) (Fig. 3B and Additional file 2: Table S13). We then tested whether the 49 EOCRC-specific risk variants could improve EOCRC risk prediction by comparing PRSCRC, derived from 86 previously identified CRC risk SNPs, with PRSEOCRC+CRC, derived from combining both sets of SNPs. PRSEOCRC+CRC significantly improved EOCRC risk prediction (Fig. 3C, D and Additional file 2: Tables S12 and S13). To quantify the predictive value of the susceptibility loci for EOCRC under 40 years of age, we examined a PRS derived from the 16 identified variants in 241 EOCRC cases under 40 years of age and 3374 healthy controls in the GECCO cohort and found similar results (Additional file 1: Fig. S6). Collectively, the predictive power of PRS increased after including the genetic effects of EOCRC-specific susceptibility loci.
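A crude version of the top-versus-bottom quartile comparison is sketched below. Quartile cut-points are set on the controls' PRS distribution (an assumption; the text does not state where the cut-points were set), and the odds ratio and Woolf 95% CI are computed from the resulting 2 × 2 table. The reported ORs came from regression models, so this unadjusted calculation is illustrative only, and `quartile_or` is a hypothetical name.

```python
from math import exp, log, sqrt
from statistics import quantiles


def quartile_or(case_prs, control_prs):
    """Crude OR (Woolf 95% CI) for the top vs bottom PRS quartile."""
    q1, _, q3 = quantiles(control_prs, n=4)  # cut-points from controls
    a = sum(s > q3 for s in case_prs)        # cases in top quartile
    b = sum(s > q3 for s in control_prs)     # controls in top quartile
    c = sum(s <= q1 for s in case_prs)       # cases in bottom quartile
    d = sum(s <= q1 for s in control_prs)    # controls in bottom quartile
    if 0 in (a, b, c, d):
        raise ValueError("empty cell; OR undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se)
```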
We next evaluated the predictive accuracy of the models by estimating the area under the ROC curve (AUC). Compared with PRSCRC, adding the identified EOCRC risk loci significantly increased the AUC from 0.585 to 0.652 in the GECCO cohort and from 0.589 to 0.604 in the UK Biobank cohort (Fig. 4A and B). The better discrimination of PRSEOCRC+CRC is reflected in less overlap between the score distributions of incident cases and controls compared with PRSCRC (Fig. 4C and D). To account for other risk factors, we further added several important sociodemographic factors to the prediction model in the UK Biobank cohort. The model with sociodemographic factors had an AUC of 0.566 (95% CI: 0.544–0.587). Compared with the model including sex and PRSEOCRC+CRC, adding the sociodemographic factors significantly increased the AUC from 0.604 to 0.626 (Additional file 1: Fig. S7 and Additional file 2: Table S14). The PRSs derived and validated here highlight the potential of genomic screening and personalized risk assessment for EOCRC.
The top EOCRC-specific risk variant rs12794623 acts as an active promoter of POLA2 in an allele-specific manner
To gain more insight into the genetic basis of EOCRC, we functionally annotated the risk variants using multiple bioinformatic tools and databases, including HaploReg, the Roadmap Epigenomics Consortium, ENCODE, RegulomeDB, CADD, and 3DSNP, which integrate histone modification ChIP-seq peaks and TF ChIP-seq peaks (Additional file 2: Table S15). The variant rs12794623 emerged as the most likely to be functional; apart from the previously reported locus in the 1q41 region, it was also the strongest EOCRC-specific signal identified (P = 4.41 × 10⁻⁶, OR = 1.33, Fig. 5A). rs12794623, located in the 5′UTR of POLA2, is a significant eQTL for this gene: in colon sigmoid tissues from GTEx, individuals carrying the rs12794623-A allele had higher POLA2 expression than rs12794623-C allele carriers (P = 1.22 × 10⁻¹³, Fig. 5B). We then conducted several experiments to evaluate its function. After transfecting HCT116 and SW480 cells with plasmids containing either allele of the POLA2 promoter region (Fig. 5C), we found that the construct containing the rs12794623-A allele had significantly higher luciferase activity than that containing the rs12794623-C allele. Electrophoretic mobility shift assays (EMSA) further showed that rs12794623 might alter transcription factor binding in an allele-specific manner (Fig. 5D and E). Based on TF motif prediction in HaploReg, we hypothesized that the rs12794623-C allele might regulate POLA2 expression by affecting GR binding (Fig. 5F), and POLA2 expression was negatively correlated with GR expression in both TCGA CRC tissues (P = 2.50 × 10⁻⁸, r = −0.27) and our own CRC tissues (P = 1.89 × 10⁻¹⁰, r = −0.50) (Fig. 5G and H). Collectively, these results suggest that rs12794623 may allele-specifically influence POLA2 expression through transcriptional regulation by GR.
Overexpression of POLA2 enhances CRC cell proliferation
Previous studies have shown that POLA2 plays an important role in DNA replication. To investigate the role of POLA2 in EOCRC tumorigenesis, we first evaluated POLA2 expression in tumor and adjacent normal tissues from multiple sources, including TCGA/GTEx samples, GEO datasets, and our own CRC patients. POLA2 was significantly overexpressed in CRC tissues compared with peritumoral tissues in our CRC samples (Fig. 6A), consistent with the results from other databases (Fig. 6A and B). We also investigated the clinical significance of POLA2 and found that higher TMB was associated with higher POLA2 expression in TCGA CRC tissues (Additional file 1: Fig. S8). In addition, CRISPR-Cas9 loss-of-function screen data suggested that POLA2 may be one of the key genes for CRC cell survival (Fig. 6C). Data from the Oncomine database also suggested that POLA2 amplification occurs frequently across cancer types (Fig. 6D). CCK-8 and colony formation assays showed that POLA2 overexpression substantially increased the proliferation rate of CRC cells (Fig. 6E and F). Protein interaction network and co-expression analyses showed that POLA2-interacting genes play essential roles in DNA replication (Fig. 6G). Moreover, POLA2 was co-expressed with the DNA replication genes CDC45, MCM2, MCM4, and PRIM2 in CRC tissues (Fig. 6H). Thus, these findings suggest that POLA2 might influence CRC carcinogenesis by affecting DNA replication.
In this study, we performed a large-scale GWAS of sporadic EOCRC and identified 49 EOCRC-specific risk loci. Candidate gene mapping and pathway analysis highlighted 88 potentially functional genes and the pathways they are involved in. A PRS based on these variants significantly improved risk prediction in both the GECCO and UKB cohorts. We also showed that the top EOCRC-specific variant, rs12794623, regulates POLA2 expression, which may contribute to EOCRC tumorigenesis. Our work provides new clues for further investigation of EOCRC etiology and may inform prevention and screening strategies.
Although EOCRC incidence has been increasing worldwide, the genetic basis of EOCRC remains under-investigated [36, 37]. We performed two complementary GWASs focused on variants associated with both EOCRC risk and the onset age of CRC. As a result, 211 variants in 49 independent loci were identified as EOCRC-specific risk loci, most of which had not previously been reported for CRC. We also found that 16 variants were in LD with previously identified tagSNPs, indicating that EOCRC and late-onset CRC partly share genetic susceptibility. In the GECCO cohort, 142 of the 211 variants were associated with EOCRC risk under both the 50-year and the 40-year cut-offs. Furthermore, 84 variants (16 independent signals with r² < 0.6) were significantly associated with the risk of EOCRC in 241 cases under 40 years of age and 3,374 healthy controls. However, the number of EOCRC cases under 40 years of age was limited, and future studies should focus specifically on this group. We further mapped potential susceptibility genes for the 211 risk variants and found them enriched in chromatin assembly and DNA replication pathways (e.g., H3C11, H2AC13, and H1-5). This suggests that disrupted DNA replication might contribute to the initial stage of EOCRC, consistent with previous studies of early-onset cancers [38, 39]. In addition, precancerous polyps, the disease most strongly associated with the EOCRC susceptibility genes, might be precursor lesions of EOCRC [40, 41], and patients with colorectal polyps are at particularly high risk of EOCRC.
PRSs have previously been constructed to evaluate the overall predictive power of common risk loci for CRC. Although a recent study explored the predictive value of GWAS-identified genetic variants for EOCRC, little was known about how adding EOCRC-specific susceptibility loci affects prediction performance for sporadic EOCRC. By generating PRSs and constructing EOCRC risk prediction models, we found that adding EOCRC-specific risk variants to previously GWAS-identified genetic variants significantly improved the prediction accuracy for EOCRC in two large cohorts. These findings highlight the potential utility of PRS in identifying individuals at high risk of EOCRC. Meanwhile, the differences in performance between the two cohorts might be owing to population heterogeneity, given the documented racial disparities in EOCRC incidence and in colon cancer survival [45, 46]; this suggests that ethnically targeted early-detection strategies should be considered when screening individuals at high risk of EOCRC.
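The model comparison described above, scoring individuals with a PRS and asking whether adding EOCRC-specific variants raises the area under the ROC curve (AUC), can be sketched in plain Python. The sketch assumes the common weighted-sum PRS definition (per-allele log odds ratio times effect-allele dosage, summed over SNPs) and computes the AUC through its equivalence with the Mann-Whitney statistic. The SNP names (`known1`, `known2`, `eo1`), weights, and genotypes are hypothetical; the study's actual models, fitted in the GECCO and UKB cohorts and possibly adjusted for covariates, are not reproduced here.

```python
"""Toy sketch: weighted PRS and a pairwise (Mann-Whitney) AUC comparison.

Hypothetical inputs: SNP names, per-allele log odds ratios and genotype
dosages are invented for illustration; they are not the study's variants.
"""


def prs(dosages, weights):
    """PRS = sum_i beta_i * G_i, with G_i the effect-allele dosage (0-2) and
    beta_i the per-allele log odds ratio; SNPs missing from `dosages` add 0."""
    return sum(beta * dosages.get(snp, 0.0) for snp, beta in weights.items())


def auc(case_scores, control_scores):
    """AUC = P(case score > control score), ties counted as 1/2, over all
    case-control pairs (equals Mann-Whitney U / (n_cases * n_controls))."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical weights: previously reported loci only, vs. the same loci plus
# one EOCRC-specific variant ("eo1").
KNOWN = {"known1": 0.25, "known2": 0.125}
FULL = {**KNOWN, "eo1": 0.5}

CASES = [{"known1": 1, "known2": 1, "eo1": 2},
         {"known1": 2, "known2": 0, "eo1": 1},
         {"known1": 0, "known2": 1, "eo1": 1}]
CONTROLS = [{"known1": 1, "known2": 0, "eo1": 0},
            {"known1": 0, "known2": 2, "eo1": 1},
            {"known1": 1, "known2": 1, "eo1": 0}]


def model_auc(weights):
    """AUC of a PRS built from `weights` on the toy cases and controls."""
    return auc([prs(g, weights) for g in CASES],
               [prs(g, weights) for g in CONTROLS])


if __name__ == "__main__":
    print(round(model_auc(KNOWN), 3), round(model_auc(FULL), 3))
    # 0.611 0.889
```

The pairwise AUC is quadratic in sample size and is used only for transparency; with cohort-scale data a rank-based implementation would be preferable, and a real analysis would also test whether the AUC difference is significant (for example, with DeLong's test) rather than comparing point estimates.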
We further investigated the function of one identified variant, rs12794623, located in the 5’UTR of POLA2. We found that it regulates POLA2 expression in an allele-specific manner by affecting the binding affinity of the glucocorticoid receptor (GR), which has been recognized as an important tumor suppressor. GR expression contributes to recovery from intestinal inflammation by inducing tissue repair mechanisms after intestinal tissue damage. As a subunit of DNA polymerase alpha, POLA2 plays a crucial role in the G1 phase and influences DNA double-strand break repair through interactions with other DNA replication effectors. Co-expression analysis in CRC tissues also suggested that POLA2 may interact with several DNA damage repair genes, including MCM2, MCM10, CDC45, and PRIM2. These findings suggest an important role of DNA repair in EOCRC development, in line with a previous retrospective review. Furthermore, molecular pathologic analysis has been shown to be an important post-GWAS approach that can optimize individual prevention and therapy by focusing on the pathogenic process. In the future, integrating susceptibility gene expression with molecular alterations in EOCRC, such as MSI status [35, 36], the CpG island methylator phenotype (CIMP), and chromosomal instability, may help elucidate the functional mechanisms of causal susceptibility variants and the plausible etiologic factors in the carcinogenic process.
The long-term benefits of the fecal immunochemical test (FIT) and colonoscopy, the most widely used CRC screening strategies, have been demonstrated. Over the past decades, widespread CRC screening among people over the age of 55 years has contributed substantially to the decline in CRC incidence and mortality [56, 57]. In response to the rising incidence of early-onset CRC, the US Preventive Services Task Force (USPSTF), the US Multi-Society Task Force, and the American Cancer Society (ACS) have successively recommended starting CRC screening at age 45–50 years. These recommendations have prompted intense discussion of their potential benefits, harms, and economic consequences. The cost-effectiveness of early-onset CRC screening programs and the capacity of healthcare systems are expected to face major challenges [62, 63]. Here, we identified several genetic risk variants and constructed prediction models for early-onset CRC, which might help in developing personalized screening strategies. Our aim was to identify individuals at high risk of early-onset CRC more precisely and to support the most cost-effective strategy of targeted screening for them. In the future, large-scale longitudinal studies that capture exposures from early life to adulthood, combined with prospective biospecimen collection, multi-omics integration, and molecular pathological epidemiology, immunity, and tumor microenvironment analyses, may advance precision prevention.
This study had several strengths. It is the first GWAS of sporadic EOCRC with a multi-stage design in a large population. The construction and evaluation of the PRS model in the UKB cohort may help with early screening and individualized prevention of EOCRC. We also revealed a potential regulatory mechanism by which a risk variant may contribute to EOCRC development. However, there were some limitations. First, and most importantly, the analyses were not stratified by family history (Lynch syndrome and other rare hereditary CRC syndromes) or by molecular pathologic classification (MSI status and CIMP). Second, although we analyzed the potential effects of several important sociodemographic factors on EOCRC risk, other potential environmental risk factors need to be considered in the GWAS stages. Third, the functional roles of the other identified EOCRC risk variants warrant investigation by high-throughput experimental methods. Finally, rare alleles, which might have a large impact on EOCRC risk, were excluded; specific analyses of the genetic burden of rare mutations are needed.
We systematically investigated the specific genetic susceptibility to EOCRC and assessed the predictive value of the identified loci in diverse populations. Moreover, leveraging functional assays, we elucidated the biological plausibility of a top EOCRC-specific signal. These findings shed light on the mechanisms underlying EOCRC tumorigenesis and might help empower early prevention and detection strategies.
Availability of data and materials
Publicly available datasets were used in this study. GECCO genotype data were obtained from the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs001078.v1.p1, phs001315.v1.p1, and phs001415.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001078.v1.p1; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001315.v1.p1; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001415.v1.p1). Access to UK Biobank data was obtained through application No. 51539 (https://biobank.ndph.ox.ac.uk/showcase/). All data generated or analyzed during this study are included in this published article and its supplementary information files. Example code for principal component analysis (PCA), association analysis, polygenic risk score calculation, and prediction model validation is given in Supplementary Information: Text S1. Other web resources used in this article are as follows: Michigan Imputation Server, https://imputationserver.sph.umich.edu/index.html#!; TCGA database, https://portal.gdc.cancer.gov; LocusZoom, http://locuszoom.sph.umich.edu/; RegulomeDB Score: https://www.regulomedb.org/regulome-search; CADD Score: https://cadd.gs.washington.edu/score; 3DSNP Score: https://omic.tech/3dsnpv2/; HaploReg v4.1, https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php; JASPAR database, https://jaspar.genereg.net/; GTEx V8 database, (dbGaP Accession phs000424.v8.p2), https://www.gtexportal.org/home/.
Abbreviations
EOCRC: Early-onset colorectal cancer
GWAS: Genome-wide association study
PRS: Polygenic risk score
PCA: Principal component analysis
AUC: Area under the ROC curve
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Keum N, Giovannucci E. Global burden of colorectal cancer: emerging trends, risk factors and prevention strategies. Nat Rev Gastroenterol Hepatol. 2019;16(12):713–32.
Siegel RL, Torre LA, Soerjomataram I, Hayes RB, Bray F, Weber TK, et al. Global patterns and trends in colorectal cancer incidence in young adults. Gut. 2019;68(12):2179–85.
Vuik FE, Nieuwenburg SA, Bardou M, Lansdorp-Vogelaar I, Dinis-Ribeiro M, Bento MJ, et al. Increasing incidence of colorectal cancer in young adults in Europe over the last 25 years. Gut. 2019;68(10):1820–6.
Bailey CE, Hu CY, You YN, Bednarski BK, Rodriguez-Bigas MA, Skibber JM, et al. Increasing disparities in the age-related incidences of colon and rectal cancers in the United States, 1975-2010. JAMA Surg. 2015;150(1):17–22.
Corley DA, Peek RM Jr. When should guidelines change? A clarion call for evidence regarding the benefits and risks of screening for colorectal cancer at earlier ages. Gastroenterology. 2018;155(4):947–9.
Akimoto N, Ugai T, Zhong R, Hamada T, Fujiyoshi K, Giannakis M, et al. Rising incidence of early-onset colorectal cancer - a call to action. Nat Rev Clin Oncol. 2021;18(4):230–43.
Yeo H, Betel D, Abelson JS, Zheng XE, Yantiss R, Shah MA. Early-onset colorectal cancer is distinct from traditional colorectal cancer. Clin Colorectal Cancer. 2017;16(4):293–9.e6.
Archambault AN, Su YR, Jeon J, Thomas M, Lin Y, Conti DV, et al. Cumulative burden of colorectal cancer-associated genetic variants is more strongly associated with early-onset vs late-onset cancer. Gastroenterology. 2020;158(5):1274–86.e12.
Hampel H, Frankel WL, Martin E, Arnold M, Khanduja K, Kuebler P, et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal cancer). N Engl J Med. 2005;352(18):1851–60.
Boland PM, Yurgelun MB, Boland CR. Recent progress in Lynch syndrome and other familial colorectal cancer syndromes. CA Cancer J Clin. 2018;68(3):217–31.
Koskenvuo L, Ryynanen H, Lepisto A. Timing of prophylactic colectomy in familial adenomatous polyposis. Colorectal Dis. 2020;22(11):1553–9.
O’Sullivan DE, Sutherland RL, Town S, Chow K, Fan J, Forbes N, et al. Risk factors for early-onset colorectal cancer: a systematic review and meta-analysis. Clin Gastroenterol Hepatol. 2022;20(6):1229–40.e5.
Burnett-Hartman AN, Lee JK, Demb J, Gupta S. An update on the epidemiology, molecular characterization, diagnosis, and screening strategies for early-onset colorectal cancer. Gastroenterology. 2021;160(4):1041–9.
Tam V, Patel N, Turcotte M, Bosse Y, Pare G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019;20(8):467–84.
Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet. 2019;51(1):76–87. dbGaP accessions: phs001078.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001078.v1.p1); phs001315.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001315.v1.p1); and phs001415.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001415.v1.p1).
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv. 2017:166298.
Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–7.
McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279–83.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Euesden J, Lewis CM, O'Reilly PF. PRSice: Polygenic Risk Score software. Bioinformatics. 2015;31(9):1466–8.
Zhang B, Jia WH, Matsuda K, Kweon SS, Matsuo K, Xiang YB, et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet. 2014;46(6):533–42.
Houlston RS, Cheadle J, Dobbins SE, Tenesa A, Jones AM, Howarth K, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet. 2010;42(11):973–7.
Tanoue T, Moriguchi T, Nishida E. Molecular cloning and characterization of a novel dual specificity phosphatase, MKP-5. J Biol Chem. 1999;274(28):19949–56.
Png CW, Weerasooriya M, Guo J, James SJ, Poh HM, Osato M, et al. DUSP10 regulates intestinal epithelial cell growth and colorectal tumorigenesis. Oncogene. 2016;35(2):206–17.
Cheng YC, Hsiao FC, Yeh EC, Lin WJ, Tang CY, Tseng HC, et al. VarioWatch: providing large-scale and comprehensive annotations on human genomic variants in the next generation sequencing era. Nucleic Acids Res. 2012;40(Web Server issue):W76–81.
Yi JM, Dhir M, Guzzetta AA, Iacobuzio-Donahue CA, Heo K, Yang KM, et al. DNA methylation biomarker candidates for early detection of colon cancer. Tumour Biol. 2012;33(2):363–72.
Kim TO, Park J, Kang MJ, Lee SH, Jee SR, Ryu DY, et al. DNA hypermethylation of a selective gene panel as a risk marker for colon cancer in patients with ulcerative colitis. Int J Mol Med. 2013;31(5):1255–61.
Pinero J, Bravo A, Queralt-Rosinach N, Gutierrez-Sacristan A, Deu-Pons J, Centeno E, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 2017;45(D1):D833–D9.
Song M, Emilsson L, Roelstraete B, Ludvigsson JF. Risk of colorectal cancer in first degree relatives of patients with colorectal polyps: nationwide case-control study in Sweden. BMJ. 2021;373:n877.
Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40(Database issue):D930–4.
Dang TT, Morales JC. Involvement of POLA2 in double strand break repair and genotoxic stress. Int J Mol Sci. 2020;21(12):4245.
Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet. 2017;49(12):1779–84.
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47(D1):D607–D13.
Akimoto N, Ugai T, Zhong R, Hamada T, Fujiyoshi K, Giannakis M, et al. Rising incidence of early-onset colorectal cancer - a call to action. Nat Rev Clin Oncol. 2021;18(4):230–43.
Pearlman R, Frankel WL, Swanson B, Zhao W, Yilmaz A, Miller K, et al. Prevalence and spectrum of germline cancer susceptibility gene mutations among patients with early-onset colorectal cancer. JAMA Oncol. 2017;3(4):464–71.
Hofseth LJ, Hebert JR, Chanda A, Chen H, Love BL, Pena MM, et al. Early-onset colorectal cancer: initial clues and current views. Nat Rev Gastroenterol Hepatol. 2020;17(6):352–64.
Palles C, Cazier JB, Howarth KM, Domingo E, Jones AM, Broderick P, et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet. 2013;45(2):136–44.
Lessel D, Vaz B, Halder S, Lockhart PJ, Marinovic-Terzic I, Lopez-Mosqueda J, et al. Mutations in SPRTN cause early onset hepatocellular carcinoma, genomic instability and progeroid features. Nat Genet. 2014;46(11):1239–44.
Wong S, Lidums I, Rosty C, Ruszkiewicz A, Parry S, Win AK, et al. Findings in young adults at colonoscopy from a hospital service database audit. BMC Gastroenterol. 2017;17(1):56.
Joh HK, Lee DH, Hur J, Nimptsch K, Chang Y, Joung H, et al. Simple sugar and sugar-sweetened beverage intake during adolescence and risk of colorectal cancer precursors: adolescent sugar intake and colorectal polyp. Gastroenterology. 2021;161(1):128–142.e20.
Winawer SJ, Zauber AG, Ho MN, O’Brien MJ, Gottlieb LS, Sternberg SS, et al. Prevention of colorectal cancer by colonoscopic polypectomy. The National Polyp Study Workgroup. N Engl J Med. 1993;329(27):1977–81.
Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19(9):581–90.
Dunlop MG, Tenesa A, Farrington SM, Ballereau S, Brewster DH, Koessler T, et al. Cumulative impact of common genetic variants and other risk factors on colorectal cancer risk in 42,103 individuals. Gut. 2013;62(6):871–81.
Murphy CC, Wallace K, Sandler RS, Baron JA. Racial disparities in incidence of young-onset colorectal cancer and patient survival. Gastroenterology. 2019;156(4):958–65.
Araghi M, Fidler MM, Arnold M, Jemal A, Bray F, Soerjomataram I. The future burden of colorectal cancer among US blacks and whites. J Natl Cancer Inst. 2018;110(7):791–3.
Matthews LC, Berry AA, Morgan DJ, Poolman TM, Bauer K, Kramer F, et al. Glucocorticoid receptor regulates accurate chromosome segregation and is associated with malignancy. Proc Natl Acad Sci U S A. 2015;112(17):5479–84.
Meers GK, Bohnenberger H, Reichardt HM, Luhder F, Reichardt SD. Impaired resolution of DSS-induced colitis in mice lacking the glucocorticoid receptor in myeloid cells. PLoS One. 2018;13(1):e0190846.
Fragkos M, Ganier O, Coulombe P, Mechali M. DNA replication origin activation in space and time. Nat Rev Mol Cell Biol. 2015;16(6):360–74.
Brown JS, O’Carrigan B, Jackson SP, Yap TA. Targeting DNA repair in cancer: beyond PARP inhibitors. Cancer Discov. 2017;7(1):20–37.
Hartman TR, Demidova EV, Lesh RW, Hoang L, Richardson M, Forman A, et al. Prevalence of pathogenic variants in DNA damage response and repair genes in patients undergoing cancer risk assessment and reporting a personal history of early-onset renal cancer. Sci Rep. 2020;10(1):13518.
Ogino S, Chan AT, Fuchs CS, Giovannucci E. Molecular pathological epidemiology of colorectal neoplasia: an emerging transdisciplinary and interdisciplinary field. Gut. 2011;60(3):397–411.
Baba Y, Huttenhower C, Nosho K, Tanaka N, Shima K, Hazra A, et al. Epigenomic diversity of colorectal cancer indicated by LINE-1 methylation in a database of 869 tumors. Mol Cancer. 2010;9:125.
Arriba M, Garcia JL, Inglada-Perez L, Rueda D, Osorio I, Rodriguez Y, et al. DNA copy number profiling reveals different patterns of chromosomal instability within colorectal cancer according to the age of onset. Mol Carcinog. 2016;55(5):705–16.
Shaukat A, Levin TR. Current and future colorectal cancer screening strategies. Nat Rev Gastroenterol Hepatol. 2022;19(8):521–31.
Zauber AG, Winawer SJ, O’Brien MJ, Lansdorp-Vogelaar I, van Ballegooijen M, Hankey BF, et al. Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. N Engl J Med. 2012;366(8):687–96.
Levin TR, Corley DA, Jensen CD, Schottinger JE, Quinn VP, Zauber AG, et al. Effects of organized colorectal cancer screening on cancer incidence and mortality in a large community-based population. Gastroenterology. 2018;155(5):1383–91.e5.
Rex DK, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, et al. Colorectal cancer screening: recommendations for physicians and patients from the U.S. Multi-Society Task Force on Colorectal Cancer. Gastroenterology. 2017;153(1):307–23.
US Preventive Services Task Force, Bibbins-Domingo K, Grossman DC, Curry SJ, Davidson KW, Epling JW Jr, et al. Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA. 2016;315(23):2564–75.
Wolf AMD, Fontham ETH, Church TR, Flowers CR, Guerra CE, LaMonte SJ, et al. Colorectal cancer screening for average-risk adults: 2018 guideline update from the American Cancer Society. CA Cancer J Clin. 2018;68(4):250–81.
Ladabaum U, Mannalithara A, Meester RGS, Gupta S, Schoen RE. Cost-effectiveness and national effects of initiating colorectal cancer screening for average-risk persons at age 45 years instead of 50 years. Gastroenterology. 2019;157(1):137–48.
Liang PS, Shaukat A. Assessing the impact of lowering the colorectal cancer screening age to 45 years. Lancet Gastroenterol Hepatol. 2020;5(6):523–4.
Imperiale TF, Kahi CJ, Rex DK. Lowering the starting age for colorectal cancer screening to 45 years: who will come...and should they? Clin Gastroenterol Hepatol. 2018;16(10):1541–4.
Ogino S, Nowak JA, Hamada T, Milner DA Jr, Nishihara R. Insights into pathogenic interactions among environment, host, and tumor at the crossroads of molecular pathology and epidemiology. Annu Rev Pathol. 2019;14:83–103.
Jasperson KW, Tuohy TM, Neklason DW, Burt RW. Hereditary and familial colon cancer. Gastroenterology. 2010;138(6):2044–58.
Chubb D, Broderick P, Dobbins SE, Frampton M, Kinnersley B, Penegar S, et al. Rare disruptive mutations and their contribution to the heritable risk of colorectal cancer. Nat Commun. 2016;7:11883.
We are grateful to all members of the Miao lab who participated in the study, as well as to all individuals who helped us complete the research.
This work was supported by the Youth Program of National Natural Science Foundation of China (NSFC-82273713, NSFC-82103929), Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), Fundamental Research Funds for the Central Universities (WHU:2042022kf1205), and Knowledge Innovation Program of Wuhan (whkxjsj011) for Jianbo Tian; National Science Fund for Distinguished Young Scholars of China (NSFC-81925032), Key Program of National Natural Science Foundation of China (NSFC-82130098), and Natural Science Foundation of Hubei Province (2019CFA009) for Xiaoping Miao; National Program for Support of Top-notch Young Professionals for Jiang Chang; Youth Program of National Natural Science Foundation of China (NSFC-81402098) for Heng Li; Youth Program of National Natural Science Foundation of China (NSFC-82003547) for Ying Zhu.
Ethics approval and consent to participate
This study was conducted using UK Biobank resources (Application #51539) and the Genetics and Epidemiology of CRC Consortium and Colon Cancer Family Registry resources (phs001078.v1.p1, phs001315.v1.p1, and phs001415.v1.p1). This project was approved by the Biomedical Ethics Committee of Wuhan University (WHUIRB-2022-04), and all participants provided written informed consent prior to data collection.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Additional file 1:
Additional figures from the results of early-onset colorectal cancer genetic susceptibility analysis. Fig. S1. 3D plots for genetic matching of three principal components. Fig. S2. Determination of the POLA2 transfection efficiency. Fig. S3. Quantile-quantile plot and genomic inflation factor lambda for associations with early-onset CRC risk. Fig. S4. Regional plots of association results and recombination rates within the four significant susceptibility loci. Fig. S5. Manhattan plots for associations between genetic variants and EOCRC risk under 40 years old. Fig. S6. Risk estimates for EOCRC (under 40 years old) associated with the PRS derived from distinct SNPs. Fig. S7. Risk estimates for EOCRC associated with the PRSs under adjustment for sociodemographic factors in the UKB cohort. Fig. S8. Higher tumor mutational burden was associated with higher expression of POLA2 in TCGA CRC tissues.
Additional file 2:
Additional tables from the results of early-onset colorectal cancer genetic susceptibility analysis. Table S1. Demographic characteristics of cases with early-onset CRC and controls in the GECCO cohort of this study. Table S2. Demographic characteristics of cases with early-onset CRC (under 40 years old) and matching controls in the GECCO cohort of this study. Table S3. Demographic characteristics of cases with early-onset CRC and controls in the UK Biobank cohort of this study. Table S4. Demographic characteristics of cases with early-onset CRC (under 40 years old) and controls in the UK Biobank cohort of this study. Table S5. Previously reported CRC GWAS loci included in the PRS model. Table S6. Probe and primer sequences used in the study. Table S7. 211 newly identified variants associated with early-onset CRC risk in the GECCO cohort. Table S8. 84 newly identified variants associated with EOCRC risk (under 40 years old) in the GECCO cohort. Table S9. Results for 16 identified variants associated with EOCRC risk (under 40 years old) in the GECCO cohort (LD r2 < 0.6). Table S10. Variants associated with early-onset CRC risk in both age cut-off groups in the GECCO cohort. Table S11. 88 mapped genes of the 211 EOCRC-specific risk variants. Table S12. Risk estimates for early-onset CRC associated with different PRS scores in the GECCO cohort. Table S13. Risk estimates for early-onset CRC associated with different PRS scores in the UKB cohort. Table S14. Risk estimates for early-onset CRC associated with different PRS scores with adjustment for sociodemographic factors in the UKB cohort. Table S15. The potential functional annotations of the 49 candidate variants.
Additional file 3:
The codes used for statistical analysis and generation of tables and figures.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article
Wang, H., Cai, Y., Jin, M. et al. Identification of specific susceptibility loci for the early-onset colorectal cancer. Genome Med 15, 13 (2023). https://doi.org/10.1186/s13073-023-01163-w
- Early-onset CRC
- Genetic variants