Integrative epigenomic and high-throughput functional enhancer profiling reveals determinants of enhancer heterogeneity in gastric cancer
Genome Medicine volume 13, Article number: 158 (2021)
Enhancers are distal cis-regulatory elements required for cell-specific gene expression and cell fate determination. In cancer, enhancer variation has been proposed as a major cause of inter-patient heterogeneity—however, most predicted enhancer regions remain to be functionally tested.
We analyzed 132 epigenomic histone modification profiles of 18 primary gastric cancer (GC) samples, 18 normal gastric tissues, and 28 GC cell lines using Nano-ChIP-seq technology. We applied Capture-based Self-Transcribing Active Regulatory Region sequencing (CapSTARR-seq) to assess functional enhancer activity. An Activity-by-contact (ABC) model was employed to explore the effects of histone acetylation and CapSTARR-seq levels on enhancer-promoter interactions.
We report a comprehensive catalog of 75,730 recurrent predicted enhancers, the majority of which are GC-associated in vivo (> 50,000) and associated with lower somatic mutation rates inferred by whole-genome sequencing. Applying CapSTARR-seq to the enhancer catalog, we observed significant correlations between CapSTARR-seq functional activity and H3K27ac/H3K4me1 levels. Super-enhancer regions exhibited increased CapSTARR-seq signals compared to regular enhancers, even when decoupled from native chromatin contexture. We show that combining histone modification and CapSTARR-seq functional enhancer data improves the prediction of enhancer-promoter interactions and pinpointing of germline single nucleotide polymorphisms (SNPs), somatic copy number alterations (SCNAs), and trans-acting TFs involved in GC expression. We identified cancer-relevant genes (ING1, ARL4C) whose expression between patients is influenced by enhancer differences in genomic copy number and germline SNPs, and HNF4α as a master trans-acting factor associated with GC enhancer heterogeneity.
Our results indicate that combining histone modification and functional assay data may provide a more accurate metric to assess enhancer activity than either platform individually, providing insights into the relative contribution of genetic (cis) and regulatory (trans) mechanisms to GC enhancer functional heterogeneity.
Enhancers are a specific class of cis-regulatory elements involved in regulating cell-specific gene expression [1, 2]. Enhancer dysregulation has been reported to contribute to multiple human diseases, including complex conditions such as cancer, Alzheimer’s disease, and diabetes [3, 4]. Specific to cancer, recent studies have highlighted enhancer dysregulation as a pervasive feature of malignancy [5,6,7], and the functional impact of enhancer heterogeneity, both between and within tumors, in the establishment and maintenance of tumor phenotypes, cancer prognosis, and treatment response [6, 8, 9]. For example, transcriptome-defined subtypes of medulloblastoma have distinct enhancer landscapes and cellular origins . Moreover, genomic variants including single nucleotide polymorphisms (SNPs), somatic mutations, chromosomal rearrangements, and somatic copy number alterations (SCNAs) can drive enhancer-linked phenotypic heterogeneity [10,11,12,13,14]. Examples include focal amplifications of super-enhancers near the MYC locus in 17% of lung adenocarcinomas , and elevated FOXA1 genomic occupancy at specific enhancers driving metastases in subgroups of pancreatic cancer . In ovarian cancer, the rs7874043 germline SNP within a PSIP1-linked enhancer affects Sp1 binding, and the C allele is associated with poor progression-free survival .
Super-enhancers comprise a cluster of putative enhancers occurring in close genomic proximity, typically characterized by high levels of transcription factor (TF), Mediator (MED) binding, and active chromatin marks such as histone H3 lysine 27 acetylation (H3K27ac) [15, 16]. Compared to typical enhancers, super-enhancers have been reported to be more significantly associated with tumor-specific gene expression, cancer hallmarks, and disease-associated genetic variation [15, 17, 18]. However, conflicting data persists in the field as to whether super-enhancers truly exhibit distinct functional characteristics and properties compared to regular enhancers, or whether they are simply assemblies of regular enhancers [19, 20]. One possible reason for this controversy is that the field’s ability to accurately identify functional enhancers and to assess their contribution to target gene expression (ideally in a quantitative manner) remains challenging. While H3K27ac enrichment has been widely utilized as a surrogate of enhancer activity, predicting enhancers based on histone ChIP data can result in both false-positive and false-negative findings , and assigning specific target gene(s) to distal enhancers remains imperfect . Traditionally, reporter assays such as luciferase assays have been used to directly quantify enhancer strength ; however, there are currently very few data sets in the public domain where such reporter measurements are available in a high-throughput scale. More recently, highly parallelized enhancer functional assays, such as STARR-seq and CapSTARR-seq, have been described as high-throughput and quantitative approaches to assess enhancer activity in mice, humans, and cancer cells, facilitating the identification of enhancers and enhancer-gene relationships [24,25,26]. In CapSTARR-seq , DNA fragments bearing candidate enhancer elements are captured using custom-designed probes and cloned by homologous recombination into mammalian vectors downstream of a basal promoter. The CapSTARR-seq library is then transfected en masse into cell lines, and RNA-seq is performed to detect mRNA reads corresponding to the enhancer sequence.
Gastric cancer (GC) is one of the most common cancers worldwide and a leading cause of global cancer-associated mortality, showing high incidence in East Asia, East Europe, Central and South America . Individual gastric tumors are highly heterogeneous with different subtypes and distinct molecular characteristics, clinical outcomes, and responses to therapy . Recent studies from our group and others have highlighted a role for epigenetic alterations and enhancer heterogeneity in GC . Here, by integrating enhancer information from both CapSTARR-seq and histone-ChIPseq, we explored the mechanistic basis of enhancer heterogeneity in GC. Combining CapSTARR-seq and histone profiling data, we identified germline SNPs, SCNAs, and trans-acting TFs driving enhancer heterogeneity between different GC patients. We also provide evidence that combining histone modification and CapSTARR-seq functional data may provide a more accurate metric to assess enhancer activity than either platform individually.
Cell lines and primary tumor samples
Primary patient samples were obtained from the SingHealth Tissue Repository with approvals from institutional research ethics review committees and signed patient informed consent. ‘Normal’ (that is non-malignant) samples used in this study refer to samples harvested from the stomach, from sites distant from the tumor, and exhibiting no visible evidence of tumor or intestinal metaplasia/dysplasia upon surgical assessment. Tumor samples were confirmed by cryosectioning to contain > 40% tumor cells. AGS, Hs738, Hs746T, HsInt, and SNU16 cells were obtained from the American Type Culture Collection. FU97, IM95, IST1, KATOIII, MKN7, NUGC4, OCUM1, RERF-GC-1B, and SCH cell lines were obtained from the Japan Health Science Research Resource Bank. CLS145 and HGC27 cells were obtained from the Cell Lines Service. NCC19, NCC24, NCC59, SNU1750, SNU1967, SNU484, and SNU719 cell lines were obtained from the Korean Cell Line Bank. LMSU cells were obtained from the Riken Gene Bank. HFE145 cells were a gift from Dr. Hassan Ashktorab (Howard University, Washington, DC). GES1 cells were a gift from Dr. Alfred Cheng, Chinese University of Hong Kong. YCC3, YCC7, YCC10, YCC11, YCC21, and YCC22 were gifts from Yonsei Cancer Centre (Seoul, South Korea). MycoAlert Mycoplasma Detection Kits (Lonza) and MycoSensor qPCR Assay Kits (Agilent Technologies) were used to detect mycoplasma contamination. All cell lines were negative for mycoplasma contamination.
Patient cohorts used in this study
All patient cohorts and respective datasets used in this study have been previously published. H3K27ac ChIP-seq data of 18 treatment-naïve primary GCs and matched normal gastric tissues from the SingHealth tissue repository (Singapore) were downloaded from GSE51776 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51776) , GSE76153 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76153)  and GSE75898 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75898) . Gene expression profiles of 200 primary GCs and 100 matched normal gastric tissue samples from National Cancer Centre, Singapore (Singapore cohort) were downloaded from GSE15459 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15459) , while genotype information of Singapore cohort GCs was downloaded from GSE31168 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31168) . RNA-seq profiles of 415 primary GCs and 35 normal gastric tissue samples from the TCGA cohort were obtained from the Broad Institute TCGA Genome Data Analysis Center (GDAC) Firehose (https://gdac.broadinstitute.org/). Gene expression profiles of 300 chemo-naïve primary GCs and 100 matched normal gastric tissue samples from Samsung Medical Centre, Seoul, Korea (ACRG cohort) were obtained from GSE62254 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254)  and GSE66222 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66222)  respectively.
Nano ChIP-seq and data analysis
Nano-ChIP-seq was performed as previously described . We assessed the qualities of ChIP-seq libraries (H3K27ac, H3K4me1 and H3K4me3, H3K27me3 and H3K36me3) in two steps. First, we performed quality control checks on the raw sequence reads generated from ChIP experiments through FastQC software (version 0.11.7). Second, we used two independent methods, CHANCE (CHip-seq ANalytics and Confidence Estimation) and ChIP enrichment assessment, to validate whether a library showed successful enrichment across the genome . If proven to be high-quality, sequence reads were mapped to the human genome (hg19) using Burrows-Wheeler Aligner  (BWA-MEM, version 0.7.0), after trimming the first ten and last ten bases. Reads with high mapping quality (MAPQ) ≥ 10 were retained for downstream analysis. To exclude PCR amplification biases, fragments that had the same starting and ending coordinates were removed by the “rmdup” algorithm of Samtools (version 0.1.19). Peaks with significant ChIP enrichment for H3K27ac, H3K4me3, and H3K4me1 relative to the input library (FDR< 0.05) were detected using CCAT (version 3.0). ChIP enrichments for H3K27me3 and H3K36me3 relative to input libraries were detected using macs2 (-q 0.05, --broad). Sequence coverage was computed using the R package “MEDIPS” (version 1.20.1) with a 50 bp bin size and read length extension to 200 bp. Peak density within a specific region was calculated by counting the total number of mapping reads normalized by the library size and the region length, equivalent to the RPKM metric. To account for background noise, background-corrected read densities were computed by subtracting corresponding input signals from the ChIP signal.
MeDIP-sequencing and data analysis
In brief, DNA was sonicated using COVARIS S2 and peak fragment distribution between 100 and 500 bp was verified on an Agilent Bioanalyzer (Agilent Technologies) using the DNA1000 chip. Fragmented DNA was end-repaired, dA-tailed and adapter ligated using NEBNext® DNA Library Prep Master Mix Set for Illumina (E6040). Samples were then spiked with control DNAs that were unmethylated, methylated, and hydroxymethylated (Diagenode C02040010) as a quality control measure. For each sample, input DNA that was not exposed to the primary antibody was included. Adapter-ligated DNA was subjected to immunoprecipitation with a primary monoclonal antibody against 5-methyl cytosine (Diagenode C15200081) as previously described . Real-time PCR using primers against spiked DNA controls were performed to verify successful and specific enrichment of methylated DNA. Immunoprecipitated samples were amplified using Phusion® High-Fidelity DNA Polymerase (M0530) and NEBNext® Multiplex Oligos for Illumina® (E7335) for 10 cycles. Amplified libraries were run on the Agilent Bioanalyzer using the High sensitivity DNA kit prior to Illumina sequencing using a single-end 100 base pair configuration. Reads were mapped to the human genome (hg19) using Burrows-Wheeler Aligner (BWA-MEM, version 0.7.0), after trimming the first ten and last ten bases. Reads with high mapping quality (MAPQ) ≥ 10 were retained for downstream analysis. To exclude PCR amplification biases, fragments that had the same starting and ending coordinates were removed by the “rmdup” algorithm of Samtools (version 0.1.19).
Identification of predicted enhancers/super-enhancers
We detected genomic regions enriched for H3K27ac ChIP signals, previously shown to mark active cis-regulatory elements. To exclude potential promotor predictions, we selected regions distant from known annotated TSSs (> 2.5 kb, GENCODE v19) as predicted enhancer elements. For each cell line, average profiles of histone modifications (H3K27ac, H3K4me1, and H3K4me3) and DNA methylation (5mC) at predicted enhancers and active annotated promoters (500 bp flanking annotated TSSs) were plotted using the R package “ngsplot.” Here, promoter regions overlapping H3K27ac peaks were defined as active promoters. Predicted enhancer regions with at least one base overlap across multiple cell lines were merged using Bedtools (version 2.25.0) to form a consistent coordinate reference. Predicted enhancers were further subdivided into predicted super-enhancers or typical enhancers using the ROSE algorithm with default parameters.
In vivo validation of GC enhancer catalog
“Recurrent” predicted enhancers were identified as those enhancers occurring in at least two GC cell lines. For t-Distributed Stochastic Neighbor Embedding (t-SNE), we used signals from cell-line-predicted recurrent enhancers showing H3K27ac enrichment in two or more patients. Read densities over cell-line-predicted recurrent enhancers across GC samples and matched normal tissues were corrected for potential batch effects using ComBat. t-SNE analysis was then performed using the R package “Rtsne” and plotted using R. WGS data from 212 GC samples was processed as described in the previously published paper . The somatic point mutation rate over a set of regions was calculated as the total number of somatic point mutation calls divided by the sum of region lengths. Considering the variation in background mutation rates of GCs, the relative somatic point mutation rate was calculated as the log2 fold change of the somatic point mutation rate over the background point mutation rate.
CapSTARR-seq and data analysis
CapSTARR-seq was performed as previously described . Paired-end sequencing reads were aligned to human genome hg19 by BWA-mem. Uniquely mapped fragments with MAPQ ≥ 10 were collected for further analysis. To exclude PCR amplification biases, fragments that had the same starting and ending coordinates were removed by the Samtools “rmdup” algorithm. To generate an overview of CapSTARR-seq signals, we created circular visualizations of CapSTARR-seq signals over regions within 2000 bases flanking probes across the human genome using the R package “RCircos” (version 1.2.1). To identify potential functional enhancers, we used macs2 (default setting; q value < 0.05; keep-dup all; BAMPE mode; nomodel mode) to call peaks from all reads combined from three biological replicates for each cell line, OCUM1 or SNU16. We filtered out enhancer peaks with summits not overlapping CarSTARR-seq probes or the corresponding predictive enhancer regions called by CCAT (using H3K27ac signals). The retained enhancers were defined as active functional enhancers and subsequently subdivided into four categories: inactive (fold change < 1.5), weak (1.5 ≤ FC < 2), moderate (2 ≤ FC < 3), and strong (FC ≥ 3). Average profiles of histone modifications (H3K27ac, H3K4me1, and H3K4me3) and DNA methylation (5mC) over strong, moderate, or weak enhancers were generated by extracting background-corrected ChIP-seq signal from wiggle files around the summits of these selected enhancers. We calculated the mean RPKM values for the epigenetic marks in each 50 bp non-overlapping window spanning 6 kb around the summits of enhancers.
We used Mann–Whitney U tests to compare CapSTARR-seq signals over predicted enhancers/super-enhancers defined using H3K27ac signals. The CapSTARR-seq density within an enhancer was computed using bigWigAverageOverBed (version 2). We calculated the number of TF binding sites over an enhancer using the ReMap database. To remove the potential confounding effect of DNA accessibility, DNA copy number, and region length of predicted enhancers on CapSTARR-seq differentiation, we regressed out the effects of those three factors from CapSTARR-seq signals using a generalized linear model (GLM). DNA accessibility and DNA copy number levels of an enhancer were estimated using ATAC-seq data and WGS data, respectively. We compared the corrected CapSTARR-seq signals and TF binding enrichments over predicted enhancers/super-enhancers using Mann–Whitney U test.
ATAC-seq and data analysis
ATAC-seq was performed as previously described . Sequence reads were mapped to the human genome assembly (hg38) with Bowtie2. We used LiftOver to convert genome coordinates for human build GRC38 (hg38) to GRCh37 (hg19). Mitochondrial and viral reads were filtered out. We called ATACseq peaks using MACS2 (callpeak function with parameters –nomodel and -B), and generated 300 bp bins flanking each summit. We filtered promoter peaks (< 2 kb from TSS) and non-cell-specific peaks open in over 30% of ENCODE cell lines [41,42,43].
Learning of ChromHMM states
A 15-state ChromHMM  model was learned by virtually concatenating consolidated data corresponding to a set of 5 chromatin marks assayed in OCUM1 and SNU16 cells (H3K4me3, H3K27ac, H3K4me1, H3K27me3, and H3K36me3). We decided to use a 15-state mode since it provided sufficient resolution to resolve biologically meaningful chromatin patterns. For each state, enrichments for different annotations were computed at 200-bp resolution. We used genomic annotations obtained through the UCSC Genome Browser for RefSeq transcription start site (TSS), transcript end site (TES), gene, exon, and regions within 2 kb of the TSS, CpG islands and nuclear lamina associated domains  were obtained through the UCSC Genome Browser.
ABC model to infer enhancer-promoter interactions
We selected all predicted enhancers captured by CapSTARR-seq probes as candidate regulatory elements in OCUM1 and SNU16 cells. Enhancer activity of candidate elements was assessed by using a combination of quantitative H3K27ac ChIP-seq and CapSTARR-seq signals. H3K27ac levels and enhancer strengths assessed by functional enhancer assay are commonly used to predict enhancers and are predictive of the expression of nearby genes. Contact for each enhancer-gene pair in the ABC model was estimated using a function of the genomic distance between the enhancer and the TSS of the gene (Contact = Distance−1). 430 highly differentially expressed genes between OCUM1 and SNU16 cells were identified using a threshold method (Gene expression difference > 200 TPM). For each gene, all captured H3K27ac-defined enhancers within 5 Mb of the gene’s promoter were included as candidate functional enhancers. The total effect of all functional enhancers on target gene expression was assessed as the sum of ABC scores of those enhancers. We validated the ABC model in five pairs of GC lines (AGS/LMSU, CLS145/YCC10, IM95/YCC3, MKN7/YCC7, and YCC21/KATOIII).
GRO-cap and HiDRA data analysis
Mapping of transcription start sites (TSSs) captured by GRO-cap in human lymphoblastoid B cell (GM12878) and chronic myelogenous leukemia (K562) ENCODE Tier 1 cell lines was collected . H3K27ac enriched regions and read density profiles in GM12878 and K562 cells are available through the ENCODE data portal (www.encodeproject.org) under accession nos. ENCSR000AKC (https://www.encodeproject.org/experiments/ENCSR000AKC/) , and ENCSR000AKP (https://www.encodeproject.org/experiments/ENCSR000AKP/) . To assess the significance of enhancer-promoter interactions in the genomic regions containing GRO-cap TSS, we randomly permuted positions of sequences within H3K27ac enriched regions distal to annotated promoters using shuffleBed with the -incl flag. To assess the significance of enrichment, we performed 10,000 shuffles of sequences. An empirical P value was then derived by counting the number of times where the number of randomly shuffled sequences overlapping the enhancer regions interacting with promoters exceeded the observed number of genomic regions containing GRO-cap TSS overlapping the enhancer regions interacting with promoters. Raw sequencing files of the HiDRA dataset were obtained from the Sequence Read Archive (accession no. SRP118092 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP118092) ) and processed as described by Wang et al . By analyzing this dataset, we identified ~ 47,000 promoter-distal genomic regions that are enriched for HiDRA signals which was referred to as “STARR active enhancers.” STARR-seq active regions and read density profiles in K562 cells were collected .
RNA-seq was performed as previously described . RNA-seq reads were mapped to the human reference genome (hg19) using STAR (version 2.6.1a). The per-base sequencing quality and the per sequence quality scores of the mapped reads were assessed using FastQC. Transcript abundances at the gene level were calculated in the TPM metric using RSEM (version 1.2.31).
Identification of differential enhancers
CapSTARR-seq RPKM values were filtered by predetermined log fold changes (> 1.5) and absolute difference (> 5 RPKM) between OCUM1 and SNU16 cells. The same filtering was also performed on the H3K27ac ChIP data. As we have demonstrated that combining H3K27ac-enrichment and CapSTARR-seq signal to estimate enhancer activity outperforms the use of either of these features, 1,888 differential enhancers were identified from the union of regions obtained for CapSTARR and H3K27ac analyses.
Identification of SNPs and CNAs
Sequencing reads from WGS libraries of GC cell lines were mapped to the human reference genome (hg19) using BWA. Duplicated reads marked by Picard (version 1.9.2) were removed. Indel regions were realigned by using GATK . Base quality score recalibration was conducted by GATK. SNPs were called using GATK Haplotype Caller. Copy number changes are generated by using CNVkit with the default parameters (bcbio-nextgen v0.9.3). As GC cell lines have no matched germline samples, CNVs were called against a non-matched normal sample. CNA breakpoints are defined as the ends of non-diploid segments. As the purity of cell lines is 100%, the DNA copy number of a segment is equal to 2^log2(tumor coverage/normal coverage). The copy number of an enhancer is defined as the copy number of the segment at which it is located.
CNA-associated differential enhancer analysis
To detect differential enhancers associated with CNA, 996 differential enhancers with DNA copy numbers called as abnormal in either SNU16 or OCUM1 were obtained (referred to as copy-number-abnormal enhancers thereafter). For each copy-number-abnormal enhancer, we employed the GLM model to test whether there is a significant correlation between H3K27ac signals and DNA copy numbers across 28 cell lines. At FDR threshold of 10%, 58 copy-number-abnormal enhancers exhibiting a significant correlation between H3K27ac signals and DNA copy numbers were defined as copy-number-associated. For each CNA-associated enhancer, we used the Pearson correlation test in the R package “stats” to estimate the significance of the correlation between expression levels of target genes and copy numbers across multiple cell lines. P values of the Pearson correlation tests were adjusted using the Benjamini-Hochberg method. Level 3 TCGA genome segment tables inferred from Affymetrix SNP 6.0 array data for 415 GC and 35 normal gastric samples were downloaded from the Broad Institute TCGA Genome Data Analysis Center (GDAC) Firehose (http://firebrowse.org/?cohort=STAD).
SNP-associated differential enhancer analysis
To exclude potential confounding effects of CNAs on identifying differential enhancers caused by SNPs, we focused on 892 differential enhancers located in diploid regions in both OCUM1 and SNU16 cells (referred to as copy-number-normal enhancers thereafter). As differential enhancers were defined by comparing OCUM1 and SNU16 cells, only SNP calls covered by non-reference bases in either of the two cells were retained. Finally, a set of 605 SNPs within copy-number-normal enhancers were obtained. We then employed the genotype-independent signal correlation and imbalance (G-SCI) pipeline to detect SNPs whose genotype correlates with histone acetylation levels (HaQTLs). To reduce experimental and other unknown variations, H3K27ac signals over differential enhancers across 28 cell lines were quantile-normalized before HaQTLs calling. For each SNP, the raw P value obtained from the G-SCI test was adjusted using Benjamini-Hochberg method. At FDR threshold of 10%, 207 candidate HaQTLs were identified. To further explore how those HaQTLs cause histone acetylation alterations across cell lines, 207 HaQTLs were assessed for functional impact using RegulomeDB. 43 HaQTLs predicted by RegulomeDB  to influence protein DNA binding (RegulomeDB score 1 or 2) were retained for subsequent genotype-expression correlation analysis.
Gene expression profiles of 200 GC and 100 matched normal gastric samples from Singapore were generated using Affymetrix Human Genome U133 Plus 2.0 Array (GSE15459 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15459) ) and processed as described previously . Genotype information of Singapore GC patients was extracted from the.CEL files produced from Affymetrix Genome-Wide Human SNP 6.0 Arrays (GSE31168 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31168) ) using the R package “crlmm.” Outliers classified as gene expression values that fall more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile. Outliers were excluded from each genotyping group to increase statistical power. P values were calculated using Student’s t-test.
ARL4C shRNA knockdown and ING1 overexpression
For ARL4C knockdown experiments, shRNA sequences targeting ARL4C were cloned into pLKO.1 lentiviral plasmid vector and transfected into HEK293T for virus generation.
shRNA sequences used were:
shARL4C #1: CCGGGGAGCTGCGAAGTCTGATTTACTCGAGTAAATCAGACTTCGCAGCTCCTTTTTG
shARL4C #2: CCGGCGAGGGCATGGACAAGCTCTACTCGAGTAGAGCTTGTCCATGCCCTCGTTTTTG
LMSU cells were infected with lentiviral particles and selected with 2 μg/ml puromycin for at least 3 days before western blot validation and cell proliferation assays.
For ING1 overexpression, the following primers were used to clone the ING1 cDNA:
ING1 cDNA was cloned into a pHR‟CMVGFPIRESWSIn18 based vector (gift from Dr. Shang Li, Duke-NUS) using Gibson Assembly and transfected into HEK293T for virus generation. LMSU cells were infected with lentiviral particles and selected with 400 μg/ml hygromycin for at least 3 days before western blot validation and cell proliferation assays.
Cells were harvested and lysed in RIPA buffer (Sigma) and incubated on ice for 10 min. Lysates were cleared by centrifuging at 9000 rpm for 10 min. Protein concentrations were measured using the Pierce BCA protein assay kit (Thermo Scientific). Samples were diluted in 4X Laemmli Buffer (Biorad), boiled at 95 °C for 10 min, and loaded for SDS-PAGE. The following antibodies were used: ARL4C (#10202-1-AP, Proteintech), ING1 (#14625, Cell Signalling Technology) and GAPDH (#60004-1-Ig, Proteintech).
Cell proliferation assays
Cell proliferation rates were measured using the Cell Counting Kit-8 (Dojindo). Briefly, 1000-3000 cells were seeded in 96-well plates and cell density was measured every 2 days following the manufacturer’s protocol. P values were derived using Student’s t test with P values < 0.05 considered as statistically significant.
Three hundred GC patients in the ACRG cohort were collected to study the association between expression levels of selected genes and GC survival. We employed Kaplan–Meier survival analysis with the overall survival as the outcome metric. Log-rank tests were used to estimate the significance of Kaplan–Meier curves. We established multivariate Cox proportional-hazards models using the R package “survminer.” Gene expression of the ACRG cohort was profiled using Affymetrix Human Genome U133plus 2.0 Array (GSE62254 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254)  and GSE66222 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66222) ) and processed as described previously .
TF-associated differential enhancer analysis
We interrogated enrichments of TFs in OCUM1-specific enhancers and SNU16-specific enhancers using HOMER with default parameters. The top 5 TFs identified from the HOMER outputs were used for differential expression analysis. Level 3 TCGA RNA-seq normalized matrix for 415 GC and 35 normal gastric samples was downloaded from the Broad Institute TCGA Genome Data Analysis Center (GDAC) Firehose (https://gdac.broadinstitute.org/). HNF4α-associated differential enhancers were defined as SNU16-specific enhancers exhibiting a significant correlation between HNF4α binding enrichment and H3K27ac signals among multiple lines. HNF4α ChIP data of GC cell lines and RNA-seq data of HNF4α perturbation were downloaded from GSE114018 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114018)  and were processed as previously described .
Predicted enhancer landscapes in GC cell lines reflect regulatory function in vivo
We analyzed 132 chromatin profiles covering multiple histone H3 modifications (H3K27ac, H3K4me1, H3K4me3, H3K27me3, and H3K36me3) across 28 GC cell lines, 4 gastric normal cell lines, 18 primary GCs and 18 matched normal gastric tissues. Of these profiles, approximately one-third (48 profiles) were newly generated for this study, and have not been previously reported. Tables S1 and S2 (Additional file 1) provide clinical information and sequencing statistics. Several of the profiles were generated using Nano-ChIP-seq, previously shown to exhibit good concordance with both conventional ChIP-seq and also orthogonal ChIP-qPCR results [29, 55]. We performed two independent quality control assessments of the Nano-ChIP-seq data: ChIP-seq enrichment analysis over known promoters, and assessment by the quality control software CHANCE (CHip-seq ANalytics and Confidence Estimation) . Comparison of input-corrected ChIP-seq and input signals at 1000 promoters associated with highly expressed protein-coding genes demonstrated successful enrichment in the vast majority (99%) of H3K27ac and (100%) H3K4me3 libraries (Additional file 1: Table S2). CHANCE analysis also revealed that the majority (75%) of ChIP libraries exhibited successful enrichment, including H3K4me1. We also calculated FRiP (Fraction of Reads in Peaks) scores for all ChIP-seq samples. We found that 80% of ChIP-seq libraries have FRiP scores ≥ 0.1, indicating that the high majority of our Nano-ChIPseq libraries are of acceptable quality (Additional file 1: Table S2). Besides Nano-ChIP-seq, the samples were also characterized by DNA methylation profiling (MeDIP-seq, see Methods), ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing), whole-genome sequencing (WGS), and Illumina RNA-sequencing. Figure 1 provides a graphical summary of the analytical workflow performed in this study.
To map candidate cis-regulatory elements on a genome-wide scale, we identified genomic regions of H3K27ac enrichment, previously shown to be associated with active promoters and enhancers  (Fig. 2a). We initially focused on the GC cell lines, to identify enhancers active in epithelial cancer cells and to avoid potential stromal contamination. Retaining H3K27ac ChIP libraries with > 13,000 CCAT-detected peaks, we selected peaks distal from annotated TSS sites (> 2.5 k), excluding four GC lines (NCC24, NUCG4, RERF-GC-1B, and YCC22). Using this approach, we identified 5,451–111,571 predicted distal enhancers per GC cell line. The wide range of predicted enhancers across GC lines is unlikely to be caused by technical variation, as they are highly reproducible across biological replicates (R = 0.86–0.89, Fig. 2b and Additional file 2: Fig. S1). Moreover, all 24 GC lines showed successful H3K27ac enrichment in the TSS flanking regions of 3,504 housekeeping genes identified from previously published work , demonstrating that our methodology can robustly detect H3K27ac enriched regions across the whole genome (Additional file 2: Fig. S2), and supporting the notion that the variation between lines is not due to the technical inability to detect H3K27ac signals. Our results are similar to previous studies of luminal breast cancer, where the number of predicted enhancers across biological samples ranges from 2,383 to 172,092 .
Concordant with previous studies [25, 58, 60], the predicted enhancer regions manifested enriched bimodal H3K27ac signals, H3K4me3 signal depletion, H3K4me1 signal enrichment, and were distinct from promoter-associated regions that are H3K4me3/H3K4me1 positive (Fig. 2c). Both predicted enhancers and active promoters were depleted for 5-methylcytosine (5mC), a marker of gene repression and inactivation (Fig. 2c). We detected 183,788 predicted distal enhancer regions in total, where some enhancers occurred in multiple lines (“recurrent”; present in 2 or more lines) while other regions existed in only one line (“private”). The frequency of recurrent predicted enhancers approached saturation at about 18 GC lines, larger than the number of GC lines analyzed in previous studies (n = 11; Fig. 2d) . Increasing the recurrence threshold from 2 to 3 did not appreciably alter the number of cell lines required for saturation (Additional file 2: Fig. S3). We thus defined a comprehensive GC enhancer catalog of ~ 75 K enhancers present in 2 or more lines (n = 75,730).
Using the ROSE algorithm , we also identified super-enhancers from the GC enhancer catalog, corresponding to enhancer subgroups spanning large genomic regions and associated with high levels of TF binding and H3K27ac histone marks. We identified 483 to 2,089 predicted super-enhancers per GC line and a consensus set of 8,293 super-enhancers across the lines. Supporting previous findings that super-enhancers play important roles in cell identity and cancer hallmarks, we observed super-enhancers associated with several known GC oncogenes, such as MYC, KLF5, and EGFR (Additional file 2: Fig. S4).
To determine which cell-line-predicted enhancers are also present in vivo, we compared H3K27ac enrichment levels for these regions across 18 primary GCs. Of 75,730 cell-line-predicted enhancers, around two-thirds (n = 52,457) were also present in two and more primary GC samples (Fig. 2e). Of 52,457 cell-line-predicted enhancers present in primary tumors, around two-thirds (n= 34,438; 66%) exhibited tumor-associated gain in two or more primary GCs (> 2-fold enrichment in tumor, minimum 0.5 RPKM difference, Additional file 2: Fig. S5). Supporting their association with cancer malignancy, t-Distributed Stochastic Neighbor Embedding (t-SNE ) using the predicted enhancers confirmed separations between GCs and matched normal tissues (Fig. 2f). In addition, integrating somatic mutation rates from previously published WGS data from 212 GCs confirmed that regions harboring predicted super-enhancers/enhancers were associated with significantly lower somatic mutation rates (P < 2 × 10−16, Student’s t test, Fig. 2g). As cis-regulatory elements are known to be more accessible to DNA repair complexes, this result further supports the in vivo validity of the GC enhancer catalog.
Functional enhancer profiling reveals intrinsic regulatory potential and elevated super-enhancer activity
While H3K27ac-enrichment is widely used as an epigenetic surrogate of enhancer activity, only a limited number of studies have experimentally investigated the extent to which H3K27ac-enriched regions exhibit bona fide transcriptional activity. To explore if enhancer regions predicted by H3K27ac can indeed function as true enhancers, we employed CapSTARR-seq technology to assess enhancer activity (Fig. 3a). Notably, as enhancer elements in CapSTARR-seq are cloned into exogenous plasmids, CapSTARR-seq transcriptional signals are likely to reflect intrinsic transcriptional activity of the enhancer, in the absence of other factors such as native chromatin patterns and topologically associating domains (TADs).
Informed by the GC enhancer catalog, we designed CapSTARR-seq probes targeting 78,974 regions (median size 5 kb). In brief, 57.6% of the CapSTARR-seq probes (n = 45,526) were located within super-enhancers; 16.3% within typical enhancers (n=12,863); and as negative controls we included 20,585 probes (n = 20,585; 26.1%) capturing genomic regions outside of predicted enhancer regions. The CapSTARR-seq library, covering ~ 100,000 genomic regions, was transfected into two GC lines (SNU16 and OCUM1), and RNA-seq was performed to provide a quantitative assessment of enhancer activity. We selected OCUM1 and SNU16 cells as cell line models for two reasons. First, OCUM1 and SNU16 cells were validated as two lines that closely resemble GC tumors by Celligner, a tool aligning tumor and cell line transcriptional profiles . Second, OCUM1 and SNU16 have been previously used as GC models in many other published studies, and are thus widely regarded as accepted GC models in the field [29, 64]. CapSTARR-seq enriched regions (CaPERs) were identified using MACS2 (q value < 0.05). We observed high correlations between CapSTARR-seq replicates performed in OCUM1 (Fig. 3b) and SNU16 cells, indicative of high reproducibility (Pearson R = 0.91–0.95 for OCUM1; Pearson R = 0.77–0.81 for SNU16; Additional file 2: Fig. S6). Figure 3c depicts CapSTARR-seq activities of enhancers captured across the whole human genome in OCUM1 or SNU16 cells.
We defined CapSTARR-seq functional enhancer activity as the CapSTARR-seq fold change (FC) over input signal and demarcated three groups based on FC values: weak (1.5 ≤ FC < 2), moderate (2 ≤ FC < 3), and strong (FC ≥ 3). We then proceeded to explore relationships between levels of epigenetic mark enrichment (e.g., H3K27ac) against CapSTARR-seq activity. As anticipated, enhancers exhibited significantly higher CapSTARR-seq signals compared to negative controls (P< 2 × 10−16, Mann–Whitney U test, Additional file 2: Fig. S7). When analyzed against H3K27ac, H3K4me1, H3K4me3, and DNA methylation, we found that strong enhancers displayed the highest H3K27ac and H3K4me1 signals while moderate enhancers were associated with intermediate levels (H3K27ac: P < 0.01; H3K4me1: P < 0.05; Additional file 2: Fig. S8). All three CapSTARR-seq categories were also depleted for 5mC (Additional file 2: Fig. S8).
To further investigate associations between chromatin marks and CapSTARR-seq activity, we applied a multivariate hidden Markov model (ChromHMM, see the “Methods” section) that utilized combinational patterns of five chromatin marks (H3K4me3, H3K27ac, H3K4me1, H3K27me3, and H3K36me3) to distinguish chromatin states. We defined 15 states showing distinct biological enrichments for transcriptional start sites (TSSs), transcriptional end sites (TESs), genes, exons, CpG islands, and inactive genomic regions associated with the nuclear lamina (Fig. 3d). We focused on six enhancer-associated chromatin states, which we refer to as strong enhancer 1-2, genic enhancer, weak enhancer 1-2, and bivalent enhancer (states 5-10) (Fig. 3d). We found that strong enhancers (state 8–9) are marked by H3K27ac and H3K4me1, whereas weak enhancers (state 6–7) are solely marked by H3K4me1. We next asked if different enhancer classes might exhibit distinct patterns of CapSTARR-seq activity. We observed that the CapSTARR-seq functional enhancer activity at strong enhancer elements is greater than that of weak enhancer elements, supporting consistent associations between CapSTARR-seq activity and chromatin state (Fig. 3e).
Although in aggregate enhancer regions exhibited significant consistency between CapSTARR-seq activity and H3K27ac signals, in absolute terms only 21–26% of CaPERs overlapped H3K27ac-enriched region in the lines tested (Fig. 3f). As expected, we observed significant correlations between H3K27ac signals and CapSTARR-seq signals over H3K27ac active CaPERs in both OCUM1 and SNU16 cells (OCUM1: Pearson R = 0.42, P < 0.001; SNU16: Pearson R = 0.36, P < 0.001; Additional file 2: Fig. S9). As a representative example, Fig. S10 (Additional file 2) depicts a H3K27ac inactive CaPER at the SNX30 genomic region in OCUM1 cells, exhibiting high CapSTARR-seq signals but no H3K27ac and ATAC-seq enrichment. We reasoned that these results might be due to, at least in part, by CapSTARR-seq measuring signals from cytoplasmic reporter constructs and thereby likely reflecting a region’s potential regulatory activity independent of native chromatin context. Three findings support this model. First, in SNU16 and OCUM1 cells, we found that regions with positive CapSTARR-seq signals but inactive by H3K27ac were associated with reduced H3K4me1 and ATAC-seq signals, implying that these are ‘dormant’ enhancers with lower accessibility to trans-acting factors (Fig. 3g). Second, among CaPERs, approximately four-fifths of enhancers “dormant” in either OCUM1 or SNU16 cells exhibited an “open” chromatin state (i.e., active ATAC-seq signals) in at least one of 17 GC lines (Fig. 3h). Third, we observed that the gene expression levels of genes near H3K27ac active CaPERs are higher than that near H3K27ac inactive CaPERs (OCUM1: P = 8.1 × 10−13; SNU16: P = 3.3 × 10−10; Mann–Whitney U test, Additional file 2: Fig. S11). These findings thus suggest that CapSTARR-seq and H3K27ac data are complementary, with the former reflecting regions with regulatory potential, while the latter reflects native chromatin accessibility.
We next asked if super-enhancers and typical enhancers might exhibit distinct patterns of CapSTARR-seq activity. We found that super-enhancer regions demonstrated statistically higher functional enhancer signals than typical enhancers (P < 2.22 × 10−16, Mann–Whitney U test, Fig. 3i). Using a generalized linear model (GLM), we confirmed that the increased activity levels of super-enhancers were independent of differences in DNA accessibility (assessed by ATAC-seq), DNA copy number levels, and genomic length (Additional file 2: Fig. S12). When analyzed at the level of DNA sequence and comparing TF binding sites over each predicted enhancer (using the Remap database ), we discovered that TF binding occupancy was associated with increasing functional enhancer strength (Fig. 3j). These results demonstrate that even when decoupled from their original chromatin context, regions associated with super-enhancers still exhibit higher functional enhancer activity than typical enhancers, and this increased activity is likely due to an abundance of dense TF cis-binding occupancy at super-enhancer regions.
Combining CapSTARR-seq enhancer activity with H3K27ac profiling may improve prediction of enhancer-promoter connections
Due to their localization at distal genomic regions, accurately assigning specific genes to enhancers (“enhancer-gene connections”) remains challenging [66, 67]. Previous studies have used H3K27ac-enrichment as a surrogate of enhancer activity to predict enhancer-promoter connections [68, 69]. To explore if CapSTARR-seq functional enhancer activity might enhance the prediction of enhancer-promoter interactions, either alone or combined with H3K27ac, we employed the recently-published Activity-by-contact (ABC) model , previously developed to predict functional enhancer–gene connections. Briefly, in the ABC model, the quantitative effect of a regulatory element on the expression of a target gene depends on its own strength as an enhancer (Activity) weighted by the frequency of its 3D contact with the gene’s promoter (Contact). The relative contribution (ABC score) of an enhancer to one gene’s expression is calculated by that enhancer’s effect divided by the total effect of all enhancers associated with the gene (Fig. 4a).
We first examined if differential enhancer landscapes based on differences in H3K27ac ChIP-seq signals can explain (at least in part) gene expression differences between OCUM1 and SNU16 cells using the ABC model. Operationally, we estimated Activity as the level of H3K27ac-enrichment at an enhancer region, and Contact as a function of the genomic distance between the enhancer and the TSS of the gene (Contact = Distance−1) (Fig. 4b). We selected 430 highly differentially expressed genes between OCUM1 and SNU16 cells. For each gene, we included all H3K27ac-predicted enhancers within 5 Mb of the gene’s promoter as candidate functional enhancers. The total effect of all functional enhancers on gene expression was estimated as the sum of ABC scores of those enhancers. We found that gene expression differences between OCUM1 and SNU16 were significantly correlated with ABC score differences (Pearson R = 0.33, P = 1.44 × 10−12, Fig. 4c). Extending beyond OCUM1 and SNU16, we further confirmed that gene expression differences between 5 other pairs of GC cell lines were significantly correlated with ABC score differences in the similar ballpark range as OCUM1 and SNU16 (Pearson R = 0.26–0.36, P < 0.001, Additional file 2: Fig. S13). Notably, these correlation values are similar to independent work reporting a similar range of correlation between ABC scores and observed effects on gene expression .
We hypothesized that CapSTARR-seq might capture information relevant to enhancer activity beyond that provided by H3K27ac levels. Specifically, recent studies have reported that transcription can occur at distal enhancers and that transcribed enhancers can be identified using GRO-cap-based annotations of TSSs. To explore if transcribed enhancers provide a better reflection of active enhancers and inference of enhancer-promoter interactions compared to histone modifications , we analyzed publicly available transcribed enhancers (inferred from GRO-cap) interacting with promoters in GM12878 lymphoblastoid cells. Of 34,436 transcribed enhancers, we found 11,439 cases (33%) involved in enhancer-promotor interactions. These interactions were independently validated by promoter capture Hi-C (pcHi-C) data  and were higher than enhancer-promoter connections inferred using H3K27ac alone (~ 8693 (25%) expected by chance inside H3K27ac active enhancers, P< 0.0001 vs. random shuffling of sequences within H3K27ac active enhancers, Fig. 4d), indicating that transcribed enhancers (inferred from the GRO-cap assay) may be a more specific predictor of enhancer function than H3K27ac signals.
Having shown that transcribed enhancers can function as a more specific predictor of enhancer function than H3K27ac, we then asked if transcribed enhancers can be inferred from STARR-seq data. We reanalyzed “High-resolution Dissection of Regulatory Activity” (HiDRA) datasets , which test putative regulatory regions by coupling accessible chromatin extraction with self-transcribing episomal reporters (ATAC-STARR-seq). From 7,000,000 DNA fragments in GM12878 lymphoblastoid cells, we identified ~ 47,000 enhancer regions enriched for HiDRA signals referred to as “STARR active enhancers.” In parallel, by reanalyzing H3K27ac ChIP-seq profiles of the same cell line, we identified ~ 61,000 enhancer regions showing significant H3K27ac enrichment, referred to as “H3K27ac active enhancers.” Within the H3K27ac active enhancers, 44% of STARR active elements exhibited native enhancer transcription (as determined by GRO-cap TSS profiling). By contrast, only approximately 9% of STARR inactive elements within the H3K27ac active enhancers were GRO-Cap TSS positive (Additional file 2: Fig. S14; P < 2.2 × 10−16, two-sided Fisher’s exact test.). We validated these GRO-seq, STARR-seq, and H3K27ac relationships in a separate cell line (K562; Additional file 2: Fig. S14). These results suggest that STARR active enhancers are more likely to exhibit native enhancer transcription and involved in enhancer-promoter interactions.
Combining these results, we thus tested if incorporating CapSTARR-seq functional enhancer activity also enhances the performance of the ABC model in our GC data, estimating Activity as the geometric mean of H3K27ac-enrichment and CapSTARR-seq signals at an enhancer region, rather than H3K27ac alone. We observed a significant and improved correlation between gene expression difference and ABC score difference (Pearson R = 0.41, P < 2.2 × 10−16, Fig. 4e). The ABC model incorporating H3K27ac-enrichment and functional enhancer activity outperformed that using either of the estimators individually (H3K27ac only: Pearson R = 0.33, P = 0.05; CapSTARR-seq only: Pearson R = 0.20, P < 0.001, Dunn and Clark’s Z test ). For example, an ABC model based on H3K27ac estimated enhancer activity only connected three enhancers to the HSPB1 promoter in both OCUM1 and SNU16 cells (Fig. 4f). These three enhancer-promoter interactions were validated by the pcHi-C database. Although the model predicted that the sum of ABC scores (SNU16: 0.81; OCUM1: 0.89) of these three enhancers was larger in OCUM1 cells than SNU16 cells, HSPB1 exhibited higher expression in SNU16 cells compared to OCUM1 cells (SNU16: 930 TPM; OCUM1: 230 TPM). However, by adding Cap-STARRseq enhancer activity into the ABC model, the HSPB1 expression difference between OCUM1 and SNU16 cells became consistent with the ABC score difference. We reasoned that the activation of the enhancer e1, which exhibited CapSTARR-seq activity, may increase HSPB1 expression in SNU16 cells. By contrast, enhancer e1 exhibited no CapSTARR-seq activity in OCUM1 cells.
To further validate the improved correlation with gene expression differences when combining Cap-STARRseq with H3K27ac profiling, we extended our comparative analysis of the ABC model to two other cell lines (GM12878 and K562). Similar to our original observations, we observed a significant correlation between differences in gene expression between GM12878 and K562 and differences in ABC score calculated based on H3K27ac signals (Pearson R = 0.25, P = 2.0 × 10−8, Additional file 2: Fig. S15). Notably, we similarly observed an improved correlation between gene expression difference and ABC score difference when combining H3K27ac-enrichment and CapSTARR-seq signals to assess enhancer activity (Pearson R = 0.39, P < 2.2 × 10−16; P < 0.001, Dunn and Clark’s Z test, Additional file 2: Fig. S15), confirming our observations in OCUM1 and SNU16 cells.
Taken collectively, these results indicate that incorporating Cap-STARRseq enhancer activity may elevate the accuracy of estimating enhancer functionality and the ability of predicting enhancer-gene connections when coupled with H3K27ac ChIP-seq data. Moreover, our results indicate that one possible reason why combining CapSTARR-seq enhancer activity with H3K27ac profiling improved prediction of enhancer-promoter connections is that enhancer transcription, evidenced as a more specific predictor of enhancer function than H3K27ac signals, may be inferred from cytoplasmic CapSTARR-seq data.
Effects of large-scale copy number alterations on differential enhancer activity
We then combined the CapSTARR-seq and H3K27ac acetylation patterns (see the “Methods” section) to explore mechanisms associated with enhancer heterogeneity in GC. In total, we identified 1,888 differential enhancers between OCUM1 and SNU16 cells. We hypothesized that differential enhancers might result from at least two sources—cis-based genomic variation (CNVs and SNPs) and trans-based TF binding (Additional file 2: Fig. S16 provides a cartoon comparing two samples). In the “cis-model,” sample-specific SCNAs or SNPs result in differential enhancer activity, perhaps by influencing the recruitment of TFs commonly expressed in both samples. In the “trans-model,” differential expression of TFs between samples may underlie sample-specific alterations in enhancer activity.
To test the influence of SCNAs on differential enhancer activity, we applied CNVkit  on WGS data of OCUM1 and SNU16 to infer SCNAs in both lines. When summarized across all differential enhancers, we observed that H3K27ac and CapSTARR-seq signals were positively correlated with SCNAs (P < 4.6 × 10−13, Additional file 2: Fig. S17). Of 1,888 differential enhancers, approximately one quarter (513 enhancers, 27.2%) exhibited concordance between H3K27ac signals and DNA copy numbers in OUCM1 and SNU16 cells (Fig. 5a). To extend these results, we then queried 26 additional cell lines and identified 58 enhancers showing significant associations between H3K27ac and SCNA levels, of which 35 also showed a statistically significant correlation with target gene expression levels. These included known oncogenes such as FGFR2 and MYC (Additional file 1: Table S3). Notably, we observed a significant correlation between copy number deletions of an ING1-associated enhancer, decreased H3K27ac signals, and decreased ING1 expression in multiple lines (Fig. 5b). For all identified differential enhancers exhibiting a statistically significant correlation with the expression level of the target gene, the copy number of the enhancer region was the same as the copy number of the target gene in GC lines (reciprocally, we identified cases where the SCNA affects the gene body but not the enhancer). To determine if the altered ING1 expression is a result of altered copy number of the ING1 gene itself, or altered copy number of the ING1-associated enhancer only, we analyzed the copy number profiles at the ING1 locus for GC samples from the TCGA cohort. We applied a multiple linear regression model to analyze the relationship between ING1 expression level, gene-body copy number, and enhancer copy number. We found that both the copy number of the ING1 gene-body itself and the enhancer have significant effects on ING1 expression (gene-body copy number: P = 1.5 × 10−6; enhancer copy number: P = 7.1 × 10−4, Additional file 2: Fig. S18). Variance Inflation Factor (VIF) analysis of ING1 gene body and enhancer copy number in the regression model did not reveal major multicollinearity (VIF 4.1, below the threshold of 5). It has been reported that ING1 encodes a protein that physically interacts with the TP53 tumor suppressor and negatively regulates cell growth , and may function as a tumor suppressor gene in gastric cancer . Using overexpression vectors, we confirmed that overexpression of ING1 in GC cells significantly reduced cell proliferation (Additional file 2: Fig. S19), consistent with a purported anti-oncogenic function.
Effects of germline SNP variation on differential enhancer activity
Next, we proceeded to explore the role of germline SNP variation in differential enhancer activity. To exclude the confounding effects of SCNAs, we restricted our analysis to differential enhancers located in diploid regions in both lines (“Copy number-normal enhancers”). Within 892 copy number-normal enhancers, we identified 605 unique germline SNPs using Genome Analysis Toolkit (GATK) between OCUM1 and SNU16 cells. To further discover SNPs whose genotype correlates with histone acetylation levels, we applied the genotype-independent signal correlation and imbalance algorithm (G-SCI ) across a cohort of 28 cell lines. Specifically, we identified SNPs correlating with cohort-variation in H3K27ac-enrichment over the same regions, referred to as histone acetylation QTLs (haQTLs). We identified 207 haQTLs (34%) using the G-SCI test. Interestingly, this discovery rate is significantly higher than (8%) reported in previous studies  despite both studies using the same controlled false-discovery rate (FDR) of 0.1. It is likely that the higher haQTL discovery rate in our study might be due to the combination of CapSTARR-seq and H3K27ac ChIP-seq data, which allows finer-scale resolution of enhancer boundaries. Supporting this, we also applied G-SCI to differential enhancers only based on H3K27ac. In brief, of 8016 SNPs within H3K27ac-defined differential enhancers, 2194 HaQTLs (27%) were detected using the G-SCI test (FDR < 0.1). Two-sided proportion test gave us a significant difference when comparing the two ratios (P = 3.2 × 10−4).
Of the 207 haQTLs, we focused on 43 germline SNPs predicted by RegulomeDB , a database of human regulatory variants reported to influence protein DNA binding (RegulomeDB score 1 or 2, Additional file 1: Table S4). For instance, we observed five haQTLs (rs1464264, rs6053580, rs73931802, rs1029330, and rs10759773), in which histone acetylation levels over the enhancer region was associated with the haQTL genotype among multiple cell lines (Additional file 2: Fig. S20). Notably, haQTL rs1464264 is located at an OCUM1-specific enhancer (Fig. 5c), and histone acetylation levels over this enhancer region were associated with the rs1464264 genotype (Fig. 5d). Specifically, increased H3K27ac at enhancers covered by the major A allele coincided with increased expression of ARL4C (ADP-ribosylation factor-like 4C, a class of GTP-binding protein), suggesting that the A allele potentially upregulates the gene (Fig. 5e). Moreover, the enhancer region harboring rs1464264 exhibits long-range interactions with the ARL4C gene promoter region as evidenced by pcHi-C database (P < 10−3, Fig. 5d). The correlation of the A allele and increased ARL4C expression was also confirmed in an independent validation set of primary GC patients from Singapore (n = 161 patients, P = 0.029 between GG and AA, Fig. 5f) and another validation set of colon samples collected by the Genotype-Tissue Expression (GTEx) project  (n = 368 individuals, P < 0.05 between GG and AA, Additional file 2: Fig. S21). ARL4C was significantly upregulated in GC compared to normal samples in the ACRG cohort (P = 0.008; Additional file 2: Fig. S22). Survival analysis revealed patients with GCs exhibiting high expression of ARL4C showed poor overall survival compared with GC samples where ARL4C is relatively lowly expressed in the ACRG cohort (P = 0.038, log-rank test, Fig. 5g). In multivariate analyses involving Cox proportional-hazards models, the high expression of ARL4C (hazard ratio for death, 1.97; 95% CI, 1.2 to 3.2; P = 0.006, Additional file 2: Fig. S23) remained significantly associated with poor survival after adjustment for age and gender. For the TCGA cohort, we observed a nearly significant correlation between the high expression of ARL4C and poor survival in the Kaplan–Meier analysis (P = 0.072) and Cox regression analysis (P = 0.052, Additional file 2: Fig. S23). Moreover, it has been reported that ARL4C is a peritoneal dissemination-associated gene in GC . We confirmed that knockdown of ARL4C in GC cells using two independent shRNAs significantly reduced cell proliferation (Additional file 2: Fig. S24), consistent with a purported oncogenic function.
Effects of trans-acting TF binding on differential enhancer activity
Finally, we explored the role of trans-acting factors in differential enhancer activity. Using HOMER, a de novo motif discovery algorithm, we found that OCUM1-specific enhancers (n=1,014) were enriched in FRA1, FRA2, and ATF3, while SNU16-specific enhancers (n = 847) exhibited enrichments in FOXA1, HNF4α, and KLF5/6 binding (Fig. 6a, Additional file 2: Fig. S25). Although OCUM1-specific enhancers defined only based on H3K27ac were also enriched in FRA1, FRA2, and ATF3 binding, the significance of enrichment and the percentage of the target sequence with motif were lower, indicating that combination of CapSTARR-seq and H3K27ac ChIP-seq data may enable stronger enrichment of TFs. In brief, the average rate of the target sequence with FRA1, FRA2, and ATF3 motif on OCUM1-specific enhancers defined based on two signals was twice as much as that on enhancers defined only based on H3K27ac (32% vs 14%, Additional file 2: Fig. S25). Among these TFs binding on SNU16-specific enhancers, we focused on HNF4α, as HNF4α was the only TF exhibiting significantly upregulated expression in GCs as compared to normal tissues in two independent GC cohorts (Singapore cohort: P = 4.9 × 10−3; TCGA cohort: P = 1.8 × 10−4; Mann–Whitney U test, Fig. 6b).
To identify differential enhancers associated with HNF4α binding, we focused on SNU16-specific enhancers exhibiting (1) occupancy of HNF4α in SNU16 cells and (2) a significant correlation between HNF4α binding enrichment and H3K27ac signals among multiple lines. Of 64 enhancers out of 847 fulfilling these criteria, 22 were further highlighted as they exhibited a statistical correlation between H3K27ac signals and the expression levels of their corresponding target genes (Additional file 1: Table S5). For example, Fig. 6c highlights one enhancer near the GRHL2 locus, where GRHL2 expression levels are significantly correlated with H3K27ac signals over the enhancer region among multiple GC lines (Pearson R = 0.75, P = 8.9 × 10−6, Fig. 6d). This region was selected for two more reasons: (1) GRHL2 is significantly upregulated in GCs (Additional file 2: Fig. S26); (2) GRHL2 has recently been reported to be involved in Epithelial-mesenchymal transition in GC . To confirm that GRHL2 is a target gene of HNF4α, we performed HNF4α overexpression and knockdown experiments in GC lines. We observed concordant dysregulation of GRHL2 RNA levels upon HNF4α perturbation (Fig. 6c). To independently validate this relationship in vivo, we further confirmed a positive correlation of GRHL2 and HNF4α expression in both the Singapore cohort and the TCGA cohort (Singapore cohort: P = 2.2 × 10−16; TCGA cohort: P = 1.2 × 10−15; Pearson’s correlation test, Fig. 6e, f). Taken together, our results suggest that differential HNF4α binding enrichment at this enhancer is associated with differential H3K27ac ChIP-seq and CapSTARR-seq signals, and also GRHL2 expression.
In this study, we surveyed the GC enhancer landscape across 28 GC cell lines, 4 normal gastric cell lines, 18 primary GCs and 18 matched normal gastric tissues. To our knowledge, this epigenomic data set currently represents the largest such data set for GC, expanding previous studies by over one-third and conceptually extending previous findings by applying CapSTARR-seq to directly test > 70,000 candidate H3K27ac-positive enhancer elements. Previous studies using STARR-seq (a predecessor to CapSTARR-seq) in Drosophila have demonstrated that enhancer strengths correlate well with average gene expression , and for STARR-seq-high enhancers occurring nearby lowly expressed genes, these are explainable by the former not being accessible in their native chromatin context. In our study, we similarly identified a cadre of enhancer elements showing strong CapSTARR-seq signals but not exhibiting high H3K27ac levels in the same cell line. However, these regions exhibited H3K27ac signals when extrapolated to other cell lines, suggest that these regions are likely “dormant” enhancers that can activate gene expression in the context of a hospitable and open chromatin environment. Supporting this idea, others have reported that strong STARR-seq enhancers located at closed chromatin regions can drive nearby gene expression when they are opened by TSA, a histone deacetylase (HDAC) inhibitor .
The question of whether super-enhancers comprise a novel paradigm that is distinct from regular enhancers remains to be determined. By analyzing the α-globin super-enhancer, Hay et al. reported that each constituent enhancer “acts independently and in an additive fashion” . In contrast, Shin et al. reported that super-enhancers associated with the Wap gene operate across a temporal and functional hierarchy of constituent enhancers . In our study, we found that even when decoupled from their chromatin context, genomic regions associated with super-enhancers exhibit higher functional enhancer activities compared to regular enhancers. Our results suggest that this increased enhancer activity is likely due to an intrinsic abundance of dense TF cis-binding occupancy sites. Other studies have also reported that individual enhancers within super-enhancers can exhibit stronger activating features, such as H3K27ac-enrichment and TF binding densities [15,16,17]. Therapeutically, regions exhibiting dense TF binding occupancies have been shown to represent important regulatory nodes susceptible to changes in TF concentration . Such vulnerabilities to TF perturbation at super-enhancers may offer productive targets for cancer therapy. It will be interesting to explore this area further using other methods of functional enhancer testing, such as recently developed CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) platforms [84, 85].
Identifying enhancer-promoter interactions remains challenging because promoters and regulatory elements can be separated by millions of base pairs , and oftentimes the closest gene is usually not the true enhancer target [22, 70, 87]. Correlation-based approaches identify enhancer-promoter connections by combining genomic proximity with genetic associations (eg expression quantitative trait loci (eQTLs)) or chromatin state, enabling detection of long-range interactions [88, 89]. Here, we found that combining CapSTARR-seq enhancer activity with H3K27ac profiling improves the prediction of gene expression differences. Previous studies have shown that incorporating STARR-seq activity can enable fine-scale resolution of active enhancer boundaries derived from lower-resolution chromatin immunoprecipitation (ChIP) profiles [71, 90]. Supporting these results, we found that combining CapSTARR-seq enhancer activity with H3K27ac profiling gave us a catalog of enhancers with smaller region sizes compared to H3K27ac only (average region size: 1 vs 4 kb), with enhancers defined using two signals being more enriched in HaQTLs and TFs compared to those defined based on H3K27ac only. Taken collectively, our results suggest that the combination of CapSTARR and H3K27ac analyses might provide a more precise metric to describe enhancer functionality in vivo.
Genomic variants (including SNPs and SCNAs), and various TF binding enrichment at enhancer regions are two sources of enhancer heterogeneity. SNPs in enhancer regions may affect enhancer activity through several mechanisms, including alteration of TF binding behavior, chromatin accessibility change, and alteration of H3K27ac levels . Indeed, in our study, we observed a SNP (rs1464264) associated with H3K27ac levels of an enhancer and the expression of ARL4C. Specifically, there were significantly higher ARL4C expression levels among subjects with the rs1464264 AG genotype or the AA genotype compared with the GG genotype. However, there were no statistically significant differences of ARL4C expression between the AG and AA genotypes. Notably, this phenomenon is not unprecedented, as previous studies have reported that the homozygous state of a germline SNP may not certainly increase expression levels beyond the heterozygous state [92,93,94]. For example, the G allele of the Parkinson’s disease risk SNP rs356168 was associated with increased SNCA expression in Parkinson’s disease patients (P = 0.03 between GG and AA). However, there is no difference in SNCA expression levels between the AA and AG state . ARL4C has been reported to be a GC risk-associated gene identified in previous GWAS studies , and rs1464264 has been reported as an eQTL for ARL4C in Lymphoblastoid cells [96, 97]. This example deepens our knowledge of the functional impact of cancer risk-associated SNPs identified in GWAS studies.
Our study has limitations. First, our CapSTARR-seq data is based on only two cell lines. While our key findings were subsequently validated in multiple samples, we acknowledge that more CapSTARR-seq libraries will be required for providing a comprehensive overview of functional enhancer activity in GC. Second, we still lack a clear explanation of predicted enhancers that exhibit H3K27ac enrichment but no CapSTARR-seq activity. It is possible that the H3K27ac mark is not limited to enhancers, but may also mark other classes of distal regulatory elements targeted by CTCF [6, 17, 98]. We hypothesize that the CapSTARR-low/H3K27ac-high elements may represent insulators or other unknown regulators, which requires further study.
Our study demonstrates that cell-line-predicted enhancers are pervasively linked to epigenomic and regulatory circuitry in vivo, and reveals mechanisms involving somatic CNVs, germline SNP variation, and trans-acting transcription factors in driving enhancer heterogeneity. We also provide evidence that combining histone modification and functional assay data provides a more accurate metric to assess enhancer activity than either platform individually, and identified novel genes associated with GC. Taken collectively, these studies will likely deepen our knowledge of enhancer functions driving GC development and progression.
Availability of data and materials
All raw and processed sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE162420 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162420) . Previously deposited histone ChIP-seq (GSE51776 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51776) , GSE76153 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76153) , GSE75898 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75898)  and GSE121140 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121140) ) and RNA-seq (GSE85465 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85465) , GSE121140 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121140) ) data that are used in this study are available in Gene Expression Omnibus. CapSTARR-Seq and ATAC-seq data sets are available from Gene Expression Omnibus under GSE121140 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121140) . Other public data sets used are described in the Methods section.
Acetylation of lysine 27 on histone H3 protein subunit
Asian Cancer Research Group
Assay for transposase-accessible chromatin using sequencing
CapSTARR-seq enriched regions
Capture-based self-transcribing active regulatory region sequencing
Encyclopedia of DNA elements
Expression quantitative trait loci
Fraction of reads in peaks
Generalized linear model
Genome-wide association study
Genotype-independent signal correlation and imbalance
Grainyhead like transcription factor 2
Hepatocyte nuclear factor 4 – alpha
High-resolution dissection of regulatory activity
Histone acetylation QTLs
Inhibitor of growth family member 1
Monomethylation of lysine 4 on the histone H3 protein subunit
Promoter capture Hi-C
Reads per kilobase of transcript per million reads mapped
Single nucleotide polymorphisms
Somatic copy number alterations
t-Distributed stochastic neighbor embedding
The Cancer Genome Atlas
Transcripts per million
Tri-methylation of lysine 4 on histone H3 protein subunit
Ong CT, Corces VG. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet. 2011;12(4):283–93.
Zabidi MA, Arnold CD, Schernhuber K, Pagani M, Rath M, Frank O, et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature. 2015;518(7540):556–9.
Li P, Marshall L, Oh G, Jakubowski JL, Groot D, He Y, et al. Epigenetic dysregulation of enhancers in neurons is associated with Alzheimer's disease pathology and cognitive symptoms. Nat Commun. 2019;10(1):2246.
Corradin O, Scacheri PC. Enhancer variants: evaluating functions in common disease. Genome Med. 2014;6(10):85.
Lin CY, Erkek S, Tong Y, Yin L, Federation AJ, Zapatka M, et al. Active medulloblastoma enhancers reveal subgroup-specific cellular origins. Nature. 2016;530(7588):57–62.
Patten DK, Corleone G, Gyorffy B, Perone Y, Slaven N, Barozzi I, et al. Enhancer mapping uncovers phenotypic heterogeneity and evolution in patients with luminal breast cancer. Nat Med. 2018;24(9):1469–80.
Mack SC, Pajtler KW, Chavez L, Okonechnikov K, Bertrand KC, Wang X, et al. Therapeutic targeting of ependymoma as informed by oncogenic enhancer profiling. Nature. 2018;553(7686):101–5.
Lomberk G, Blum Y, Nicolle R, Nair A, Gaonkar KS, Marisa L, et al. Distinct epigenetic landscapes underlie the pathobiology of pancreatic cancer subtypes. Nat Commun. 2018;9(1):1978.
van Groningen T, Koster J, Valentijn LJ, Zwijnenburg DA, Akogul N, Hasselt NE, et al. Neuroblastoma is composed of two super-enhancer-associated differentiation states. Nat Genet. 2017;49(8):1261–6.
Zhang Y, Chen F, Fonseca NA, He Y, Fujita M, Nakagawa H, et al. High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nat Commun. 2020;11(1):736.
Zhang X, Choi PS, Francis JM, Imielinski M, Watanabe H, Cherniack AD, et al. Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers. Nat Genet. 2016;48(2):176–82.
Roe JS, Hwang CI, Somerville TDD, Milazzo JP, Lee EJ, Da Silva B, et al. Enhancer Reprogramming Promotes Pancreatic Cancer Metastasis. Cell. 2017;170(5):875–88 e20.
French JD, Johnatty SE, Lu Y, Beesley J, Gao B, Kalimutho M, et al. Germline polymorphisms in an enhancer of PSIP1 are associated with progression-free survival in epithelial ovarian cancer. Oncotarget. 2016;7(6):6353–68.
Yao X, Tan J, Lim KJ, Koh J, Ooi WF, Li Z, et al. VHL Deficiency Drives Enhancer Activation of Oncogenes in Clear Cell Renal Cell Carcinoma. Cancer Discov. 2017;7(11):1284–305.
Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-Andre V, Sigova AA, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155(4):934–47.
Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153(2):307–19.
Sengupta S, George RE. Super-Enhancer-Driven Transcriptional Dependencies in Cancer. Trends Cancer. 2017;3(4):269–81.
Niederriter AR, Varshney A, Parker SC, Martin DM. Super Enhancers in Cancers, Complex Disease, and Developmental Disorders. Genes (Basel). 2015;6(4):1183–200.
Pott S, JDJNg L. What are super-enhancers? 2015;47(1):8–12. https://pubmed.ncbi.nlm.nih.gov/25547603/.
Dukler N, Gulko B, Huang Y-F, AJNg S. Is a super-enhancer greater than the sum of its parts? 2017;49(1):2–3. https://www.nature.com/articles/ng.3759.
Tak YG, Hung Y, Yao L, Grimmer MR, Do A, Bhakta MS, et al. Effects on the transcriptome upon deletion of a distal element cannot be predicted by the size of the H3K27Ac peak in human cells. 2016;44(9):4123–33. https://academic.oup.com/nar/article/44/9/4123/2462254.
Moore JE, Pratt HE, Purcaro MJ, Weng Z. A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods. Genome Biol. 2020;21(1):17.
Naylor LH. Reporter gene technology: the future looks bright. Biochem Pharmacol. 1999;58(5):749–57.
Vanhille L, Griffon A, Maqbool MA, Zacarias-Cabeza J, Dao LT, Fernandez N, et al. High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat Commun. 2015;6:6905.
Liu Y, Yu S, Dhiman VK, Brunetti T, Eckart H, White KP. Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol. 2017;18(1):219.
Muerdter F, Boryn LM, Woodfin AR, Neumayr C, Rath M, Zabidi MA, et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat Methods. 2018;15(2):141–9.
Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136(5):E359–86.
Cristescu R, Lee J, Nebozhyn M, Kim KM, Ting JC, Wong SS, et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat Med. 2015;21(5):449–56.
Ooi WF, Xing M, Xu C, Yao X, Ramlee MK, Lim MC, et al. Epigenomic profiling of primary gastric adenocarcinoma reveals super-enhancer heterogeneity. Nat Commun. 2016;7:12983.
Muratani M, Deng N, Ooi WF, Lin SJ, Xing M, Xu C, et al. Nanoscale Chromatin Profiling of Gastric Adenocarcinoma. Series GSE51776, NCBI Gene Expression Omnibus. 2014. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51776
Ooi WF, Xing M, Xu C, Yao X, Ramlee MK, Lim MC, et al. Epigenomic Profiling of Primary Gastric Adenocarcinoma Reveals Super-enhancer Heterogeneity [ChIP-seq]. Series GSE76153, NCBI Gene Expression Omnibus. 2016. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76153
Ooi WF, Xing M, Xu C, Yao X, Ramlee MK, Lim MC, et al. Somatic Promoter Landscape of Primary Gastric Adenocarcinoma Delineated by Epigenomic Profiling. Series GSE75898, NCBI Gene Expression Omnibus. 2016. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75898
Ooi CH, Tan P. Gastric Cancer Project '08 (Singapore Patient Cohort). Series GSE15459, NCBI Gene Expression Omnibus. 2009. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15459
Deng N, Goh LK, Wang H, Das K, Tao J, Tan IB, et al. Genetic Landscape of Copy Number Alterations in Gastric Cancer. Series GSE31168, NCBI Gene Expression Omnibus. 2012. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31168
Cristescu R, Lee J, Nebozhyn M, Kim KM, Ting JC, Wong SS, et al. Molecular analysis of gastric cancer identifies discrete subtypes associated with distinct clinical characteristics and survival outcomes: the ACRG (Asian Cancer Research Group) study [gastric tumors]. Series GSE62254, NCBI Gene Expression Omnibus. 2015. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62254
Cristescu R, Lee J, Nebozhyn M, Kim KM, Ting JC, Wong SS, et al. Molecular analysis of gastric cancer identifies discrete subtypes associated with distinct clinical characteristics and survival outcomes: the ACRG (Asian Cancer Research Group) study [normal tissue]. Series GSE66222, NCBI Gene Expression Omnibus. 2015. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66222
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Radford EJ, Ito M, Shi H, Corish JA, Yamazawa K, Isganaitis E, et al. In utero effects. In utero undernourishment perturbs the adult sperm methylome and intergenerational metabolism. Science. 2014;345(6198):1255903.
Guo YA, Chang MM, Huang W, Ooi WF, Xing M, Tan P, et al. Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancers. Nat Commun. 2018;9(1):1520.
Xing M, Ooi WF, Tan J, Qamra A, Lee PH, Li Z, et al. Genomic and epigenomic EBF1 alterations modulate TERT expression in gastric cancer. J Clin Invest. 2020;130(6):3005–20.
Beer MA, Shigaki D, Huangfu D. Enhancer Predictions and Genome-Wide Regulatory Circuits. Annu Rev Genomics Hum Genet. 2020;21:37–54.
Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, et al. A method to predict the impact of regulatory variants from DNA sequence. Nat Genet. 2015;47(8):955–61.
Shigaki D, Adato O, Adhikari AN, Dong S, Hawkins-Hooker A, Inoue F, et al. Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. Hum Mutat. 2019;40(9):1280–91.
Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010;28(8):817–25.
Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008;453(7197):948–51.
Core LJ, Martins AL, Danko CG, Waters CT, Siepel A, Lis JT. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet. 2014;46(12):1311–20.
Zhang J, Lee D, Dhiman V, Jiang P, Xu J, McGillivray P, et al. H3K27ac ChIP-seq on human GM12878. ENCODE data portal. 2020. https://www.encodeproject.org/experiments/ENCSR000AKC/
Zhang J, Lee D, Dhiman V, Jiang P, Xu J, McGillivray P, et al. H3K27ac ChIP-seq on human K562. ENCODE data portal. 2020. https://www.encodeproject.org/experiments/ENCSR000AKP/
Wang X, He L, Goggin SM, Saadat A, Wang L, Sinnott-Armstrong N, et al. High-resolution genome-wide dissection of transcriptional regulatory activity and function in human cells. Sequence Read Archive. 2018. https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP118092
Wang X, He L, Goggin SM, Saadat A, Wang L, Sinnott-Armstrong N, et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat Commun. 2018;9(1):5380.
Lee D, Shi M, Moran J, Wall M, Zhang J, Liu J, et al. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol. 2020;21(1):298.
Xu C, Ooi WF, Qamra A, Tan J, Chua BY, Ho SWT, et al. HNF4alpha pathway mapping identifies wild-type IDH1 as a targetable metabolic node in gastric cancer. Gut. 2020;69(2):231–42.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7.
Muratani M, Deng N, Ooi WF, Lin SJ, Xing M, Xu C, et al. Nanoscale chromatin profiling of gastric adenocarcinoma reveals cancer-associated cryptic promoters and somatically acquired regulatory elements. Nat Commun. 2014;5:4361.
Xu C, Ooi WF, Qamra A, Tan J, Chua BY, Ho SWT, et al. HNF4α Pathway Mapping Identifies Wild-type IDH1 as a Targetable Metabolic Node in Gastric Cancer. Series GSE114018, NCBI Gene Expression Omnibus. 2020. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE114018
Diaz A, Nellore A, Song JS. CHANCE: comprehensive software for quality control and validation of ChIP-seq data. Genome Biol. 2012;13(10):R98.
Heintzman ND, Stuart RK, Hon G, Fu Y, Ching CW, Hawkins RD, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007;39(3):311–8.
Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29(10):569–74.
Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011;470(7333):279–83.
Loven J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell. 2013;153(2):320–34.
Lvd M, GJJomlr H. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(Nov):2579–605.
Warren A, Chen Y, Jones A, Shibue T, Hahn WC, Boehm JS, et al. Global computational alignment of tumor and cell line transcriptional profiles. Nat Commun. 2021;12(1):22.
Grygielewicz P, Dymek B, Bujak A, Gunerka P, Stanczak A, Lamparska-Przybysz M, et al. Epithelial-mesenchymal transition confers resistance to selective FGFR inhibitors in SNU-16 gastric cancer cells. Gastric Cancer. 2016;19(1):53–62.
Cheneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic Acids Res. 2018;46(D1):D267–D75.
Bulger M, Groudine M. Functional and mechanistic diversity of distal transcription enhancers. Cell. 2011;144(3):327–39.
van Arensbergen J, van Steensel B, Bussemaker HJ. In search of the determinants of enhancer-promoter interaction specificity. Trends Cell Biol. 2014;24(11):695–702.
Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A. 2010;107(50):21931–6.
Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473(7345):43–9.
Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51(12):1664–9.
Tippens ND, Liang J, Leung AK, Wierbowski SD, Ozer A, Booth JG, et al. Transcription imparts architecture, function and logic to enhancer units. Nat Genet. 2020;52(10):1067–75.
Jung I, Schmitt A, Diao YR, Lee AJ, Liu T, Yang D, et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat Genet. 2019;51(10):1442.
Dunn OJ, Clark V. Correlation coefficients measured on same individuals. J Am Stat Assoc. 1969;64(325):366.
Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol. 2016;12(4):e1004873.
Garkavtsev I, Grigorian IA, Ossovskaya VS, Chernov MV, Chumakov PM, Gudkov AV. The candidate tumour suppressor p33ING1 cooperates with p53 in cell growth control. Nature. 1998;391(6664):295–8.
Oki E, Maehara Y, Tokunaga E, Kakeji Y, Sugimachi K. Reduced expression of p33(ING1) and the relationship with p53 expression in human gastric cancer. Cancer Lett. 1999;147(1-2):157–62.
del Rosario RC, Poschmann J, Rouam SL, Png E, Khor CC, Hibberd ML, et al. Sensitive detection of chromatin-altering polymorphisms reveals autoimmune disease mechanisms. Nat Methods. 2015;12(5):458–64.
Consortium GT. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5.
Hu Q, Masuda T, Sato K, Tobo T, Nambara S, Kidogami S, et al. Identification of ARL4C as a Peritoneal Dissemination-Associated Gene and Its Clinical Significance in Gastric Cancer. Ann Surg Oncol. 2018;25(3):745–53.
Xiang J, Fu X, Ran W, Wang Z. Grhl2 reduces invasion and migration through inhibition of TGFbeta-induced EMT in gastric cancer. Oncogenesis. 2017;6(1):e284.
Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339(6123):1074–7.
Hay D, Hughes JR, Babbs C, Davies JOJ, Graham BJ, Hanssen L, et al. Genetic dissection of the alpha-globin super-enhancer in vivo. Nat Genet. 2016;48(8):895–903.
Shin HY, Willi M, HyunYoo K, Zeng X, Wang C, Metser G, et al. Hierarchy within the mammary STAT5-driven Wap super-enhancer. Nat Genet. 2016;48(8):904–11.
Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell. 2014;159(3):647–61.
Kampmann M. CRISPRi and CRISPRa Screens in Mammalian Cells for Precision Biology and Medicine. ACS Chem Biol. 2018;13(2):406–16.
Lettice LA, Heaney SJ, Purdie LA, Li L, de Beer P, Oostra BA, et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet. 2003;12(14):1725–35.
Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489(7414):109–13.
Wang D, Rendon A, Wernisch L. Transcription factor and chromatin features predict genes associated with eQTLs. Nucleic Acids Res. 2013;41(3):1450–63.
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82.
Barakat TS, Halbritter F, Zhang M, Rendeiro AF, Perenthaler E, Bock C, et al. Functional Dissection of the Enhancer Repertoire in Human Embryonic Stem Cells. Cell Stem Cell. 2018;23(2):276–88 e8.
Yao L, Tak YG, Berman BP, Farnham PJ. Functional annotation of colon cancer risk SNPs. Nat Commun. 2014;5:5114.
Allen M, Zou F, Chai HS, Younkin CS, Crook J, Pankratz VS, et al. Novel late-onset Alzheimer disease loci variants associate with brain gene expression. Neurology. 2012;79(3):221–8.
Soldner F, Stelzer Y, Shivalila CS, Abraham BJ, Latourelle JC, Barrasa MI, et al. Parkinson-associated risk variant in distal enhancer of alpha-synuclein modulates target gene expression. Nature. 2016;533(7601):95–9.
Zhang X, Zhou L, Fu G, Sun F, Shi J, Wei J, et al. The identification of an ESCC susceptibility SNP rs920778 that regulates the expression of lncRNA HOTAIR via a novel intronic enhancer. Carcinogenesis. 2014;35(9):2062–7.
Wang Z, Dai J, Hu N, Miao X, Abnet CC, Yang M, et al. Identification of new susceptibility loci for gastric non-cardia adenocarcinoma: pooled results from two Chinese genome-wide association studies. Gut. 2017;66(4):581–7.
Garieri M, Delaneau O, Santoni F, Fish RJ, Mull D, Carninci P, et al. The effect of genetic variation on promoter usage and enhancer activity. Nat Commun. 2017;8(1):1358.
Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M, et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 2008;4(10):e1000214.
Melgar MF, Collins FS, Sethupathy P. Discovery of active enhancers through bidirectional expression of short transcripts. Genome Biol. 2011;12(11):R113.
Sheng T, Ho SWT, Ooi WF, Xu C, Xing M, Padmanabhan N, et al. Integrative Epigenomic and High-Throughput Functional Enhancer Profiling Reveals Determinants of Enhancer Heterogeneity in Gastric Cancer. Series GSE162420, NCBI Gene Expression Omnibus. 2021. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162420
Xing M, Ooi WF, Tan J, Qamra A, Lee PH, Li Z, et al. Genomic and Epigenomic EBF1 Alterations Modulate TERT Expression in Gastric Cancer. Series GSE121140, NCBI Gene Expression Omnibus. 2020. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121140
Ooi WF, Xing M, Xu C, Yao X, Ramlee MK, Lim MC, et al. Epigenomic Profiling of Primary Gastric Adenocarcinoma Reveals Super-enhancer Heterogeneity [RNA-seq]. Series GSE85465, NCBI Gene Expression Omnibus. 2016. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85465
We thank the Sequencing and Scientific Computing teams at the Genome Institute of Singapore for sequencing services and data management capabilities, and the Duke-NUS Genome Biology Facility for sequencing services. We also thank Liu Mo, Steve Rozen, Melissa Jane Fullwood and Heike Irmgard Grabsch for helpful discussions.
This work was supported by National Medical Research Council grants NMRC/STaR/0026/2015 and MOH-OFLCG18May-0003. Funding was also provided by the Cancer Science Institute of Singapore, NUS, under the National Research Foundation Singapore and the Singapore Ministry of Education under its Research Centres of Excellence initiative, and block funding was received from Duke-NUS Medical School and the Agency for Science, Technology, and Research (A*STAR).
Ethics approval and consent to participate
Not applicable. All patient samples analyzed in this study have been previously published.
Consent for publication
Dr. Kevin P. White is a Scientific Advisor and Fellow at Tempus. The remaining authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sheng, T., Ho, S.W.T., Ooi, W.F. et al. Integrative epigenomic and high-throughput functional enhancer profiling reveals determinants of enhancer heterogeneity in gastric cancer. Genome Med 13, 158 (2021). https://doi.org/10.1186/s13073-021-00970-3