H3K4me3 ChIP-chip in colon cancer cell lines
We performed ChIP studies using microarrays containing 2.1 million oligonucleotides tiled across all human promoters to define the repertoire of genes containing H3K4me3. These studies were carried out in five colon cancer cell lines (SW480, V432, V425, V429, V441) and normal colon mucosa. A representative view of the H3K4me3 ChIP-chip data for one of the six samples tested is shown in Additional file 1. Similar to previous studies in human embryonic stem cells, primary hepatocytes and B cells [1], we found that the majority (57 to 66%) of all annotated promoters in colon were enriched for H3K4me3 at medium to high confidence (Figure 1a,b). Next we plotted the levels of H3K4me3 for all genes as a histogram. The data show a bimodal distribution with peaks at genes that have robust levels (right side of distribution) and weak or absent levels (left side of distribution) of H3K4me3 (Figure 1c). Examples of genes with H3K4me3 enrichment values at either peak of the bimodal distribution are shown in Figure 1c. Lastly, we performed a cluster analysis of the H3K4me3 promoter signals across all six samples. The results indicate that while the majority of promoters show similar H3K4me3 levels among different individual samples, a small fraction of promoter-specific differences between individuals are clearly apparent (Figure 1d).
Repressed genes can be distinguished into two classes, based on the presence or absence of H3K4me3
We next set out to identify genes that are repressed in each of the colon cancer samples compared to normal colon mucosa. For these experiments, we prepared RNA from each of the five colon cancer cell lines, as well as microdissected histologically normal colon mucosa, and five individual preparations of epithelial crypts purified by fractionation from normal colon mucosa. Samples were hybridized to Affymetrix Human Exon 1.0 ST Arrays, which are known to be more reliable for transcript quantification than standard 3'-UTR microarrays. Similarly to the H3K4me3 data, the genome-wide distribution of gene expression is largely bimodal (Additional file 2A). This allowed us to divide genes into two main categories: (1) abundantly expressed or 'on'; and (2) near background levels or 'off'. Among all samples analyzed, we found that, on average, 49% of genes were expressed and 33% of genes were off; 18% of genes fell in the trough of the bimodal distribution and could not be neatly classified as either silent or expressed.
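The on/off/trough partition described above can be illustrated with a minimal sketch. The two cutoffs (`off_max`, `on_min`) and the toy signal values are hypothetical; the text does not state the thresholds actually used, only that genes in the trough between the two modes were left unclassified.

```python
from collections import Counter


def classify_expression(log2_signal, off_max, on_min):
    """Label one gene as 'on', 'off', or 'ambiguous' (the trough).

    off_max and on_min are hypothetical cutoffs placed on either side
    of the trough of the bimodal distribution.
    """
    if off_max >= on_min:
        raise ValueError("off_max must be below on_min")
    if log2_signal <= off_max:
        return "off"
    if log2_signal >= on_min:
        return "on"
    return "ambiguous"


def summarize(signals, off_max, on_min):
    """Return the fraction of genes falling into each category."""
    counts = Counter(
        classify_expression(value, off_max, on_min) for value in signals.values()
    )
    total = len(signals)
    return {label: counts[label] / total for label in ("on", "off", "ambiguous")}


# Toy data only; gene names and values are invented for illustration.
toy = {"geneA": 9.1, "geneB": 2.5, "geneC": 5.5, "geneD": 8.0}
fractions = summarize(toy, off_max=4.0, on_min=7.0)
```

In practice the cutoffs would be chosen per sample from the observed histogram rather than fixed in advance.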
Having identified the expressed and repressed genes, we next merged the expression data with the ChIP-chip data. Because expression and ChIP-chip analyses were performed using different platforms, only genes represented on both platforms could be used for the combined analysis. Nevertheless, we identified >15,000 unique genes for which we had obtained both expression and H3K4me3 ChIP-chip data. We observed a high correlation between the expression and H3K4me3 levels in each of the cell lines (Additional file 2B). Using the combined datasets, we looked for genes that showed high expression and high levels of H3K4me3 in the normal colon samples, and were repressed in any one of the five colon cancer cell lines; 1,085 genes fit these criteria. We then examined the levels of H3K4me3 among the repressed genes in each line. Surprisingly, a large portion of repressed genes (41 to 76%) contained significant levels of H3K4me3, while a much smaller fraction (15 to 34%) showed nearly absent levels of H3K4me3. We designated repressed genes that retain H3K4me3 as K4-independent and repressed genes that lose H3K4me3 as K4-dependent. The total number of K4-dependent and -independent genes varied substantially among the cell lines, although in all samples more K4-independent genes were detected than K4-dependent genes (Figure 2a). Additionally, while some genes were repressed in all five colon cancer cell lines, by and large the identity of the repressed genes differed between lines (Figure 2b). Despite this, genes designated as either K4-dependent or -independent in a given line generally showed the same designation when repressed in any of the remaining four lines. There were some exceptions: some K4-dependent genes were classified as K4-independent in a different line, and vice versa. This variability likely reflects molecular heterogeneity among the colon samples.
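The classification logic can be sketched as follows, assuming each gene has already been given a categorical expression call ('on', 'off', or 'ambiguous') and a categorical H3K4me3 call ('high', 'low', or 'intermediate') in normal colon and in one cancer line. These labels, the function name, and the toy genes are illustrative assumptions, not the pipeline used in the study.

```python
def classify_repressed_genes(expr_normal, k4_normal, expr_line, k4_line):
    """Assign K4 classes to genes repressed in one cancer cell line.

    Each argument maps gene name -> categorical call. Only genes present
    in all four mappings (i.e. represented on both platforms) are used.
    """
    shared = set(expr_normal) & set(k4_normal) & set(expr_line) & set(k4_line)
    calls = {}
    for gene in sorted(shared):
        # Require high expression and high H3K4me3 in normal colon.
        if expr_normal[gene] != "on" or k4_normal[gene] != "high":
            continue
        # Require repression in the cancer line.
        if expr_line[gene] != "off":
            continue
        if k4_line[gene] == "high":
            calls[gene] = "K4-independent"  # repressed, mark retained
        elif k4_line[gene] == "low":
            calls[gene] = "K4-dependent"    # repressed, mark lost
        else:
            calls[gene] = "unclassified"    # intermediate H3K4me3 level
    return calls


# Toy input; gene names are invented.
expr_normal = {"g1": "on", "g2": "on", "g3": "on", "g4": "off", "g5": "on"}
k4_normal = {"g1": "high", "g2": "high", "g3": "high", "g4": "low"}
expr_line = {"g1": "off", "g2": "off", "g3": "on", "g4": "off", "g5": "off"}
k4_line = {"g1": "high", "g2": "low", "g3": "high", "g4": "low", "g5": "low"}
calls = classify_repressed_genes(expr_normal, k4_normal, expr_line, k4_line)
```

Here `g5` is dropped because it lacks a ChIP-chip call in normal colon, mirroring the restriction to genes represented on both platforms.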
Verification of K4-dependent and K4-independent classes
The levels of H3K4me3 for genes in each class were validated by standard ChIP on biological replicate samples (Additional file 3A). The loss of the H3K4me3 signal at K4-dependent genes cannot be due to homozygous deletions, as these regions could be successfully amplified by genomic PCR. For further verification, we performed hybridizations of H3K4me3 ChIPs from samples V425, V432, V441, SW480 and two independent preparations of normal colon mucosa to NimbleGen 385K promoter arrays and repeated the data analysis. Consistent with the previous results, both K4-dependent and -independent genes were evident, and the relative proportions of each class were similar within and between each cell line to the proportions found using the 2.1M feature arrays (Additional file 4A). Next, we examined the exon-tiling array data to confirm that genes designated as repressed were in fact repressed across all exons, and that when expressed in the crypt were expressed across all exons, consistent with the canonical transcript from the locus. We found that, compared to genes in crypt that were designated as 'on', and that conformed to the canonical transcript and its associated promoter, the expression levels across all exons of genes designated as repressed were at or near background levels. The exon usage across a representative example of a repressed gene is shown in Additional file 4B. Lastly, we investigated whether K4-dependent and K4-independent genes repressed in cell culture were similarly under-expressed in primary tumors. Using global expression data, we first selected K4-dependent and K4-independent genes that were repressed by at least two-fold in all five colon cancer cell lines relative to five epithelial colon crypt samples. We then determined the percentage of these genes that were also repressed in 120 primary tumors relative to 16 normal mucosa samples. 
Of all genes, only 7.7% of genes repressed in the cell lines are also repressed in tumors relative to mucosa, whereas 76% of K4-dependent and K4-independent genes repressed in cancer cell lines were validated as repressed in primary tumors (P < 2.2 × 10⁻¹⁶ by exact binomial test). Collectively, these data strongly support the existence of the two classes of repressed genes in colon cancer, indicate that alternative promoter usage is unlikely to account for the difference in H3K4me3 status between the two classes, and suggest that most genes identified as repressed in the cell culture models are genuinely repressed in colon cancer.
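To illustrate how an exact binomial test of this kind works, the sketch below computes the upper-tail probability of observing at least k validated genes out of n when the background rate is 7.7%. The gene counts (n = 100, k = 76) are hypothetical, chosen only to match the reported 76%; the actual number of genes tested is not given here, and the use of a one-sided tail is also an assumption.

```python
import math


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


# Hypothetical counts: 76 of 100 genes validated, background rate 0.077.
p_value = binomial_upper_tail(76, 100, 0.077)
```

With an enrichment this large, the tail probability falls far below the 2.2 × 10⁻¹⁶ reporting floor for any plausible gene count.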
K4-dependent and -independent genes show differences in chromatin structure
We mapped open regions of chromatin in three of the five colon cancer lines (SW480, VACO432, and VACO429) using the technique of DNase-seq [10]. Each promoter was then assigned a score corresponding to the relative sensitivity to DNaseI digestion, and the data were merged to the H3K4me3 ChIP-chip data and expression data. We then tested whether promoters of K4-containing expressed genes, K4-dependent repressed genes, and K4-independent repressed genes differ in their sensitivity to DNaseI digestion. As expected, K4-containing expressed genes were significantly more sensitive to DNaseI digestion than both classes of repressed genes in all three cell lines (P < 5.7e-8; Figure 3). Of the two classes of repressed genes, K4-independent genes were significantly more sensitive to DNaseI digestion than K4-dependent genes (P < 0.0002). Specifically, 75 to 90% of genes designated as K4-independent were located within open chromatin, compared to 22 to 53% of K4-dependent promoters. Thus, compared to those of K4-independent genes, the promoters of repressed genes lacking H3K4me3 are generally located within relatively inaccessible conformations of chromatin.
K4-dependent genes show DNA hypermethylation
We next looked for the presence of CpG islands within 2 kb of the TSSs of K4-dependent and -independent genes; 95 to 99% of K4-independent genes were found to have a CpG island, compared to 33 to 56% of K4-dependent genes (Additional file 5). We noticed that several genes designated as K4-dependent in our study, including CDX1, BMP3, and MLH1, were previously reported to show CpG island promoter hypermethylation [13–15]. These findings prompted us to test whether K4-dependent genes lacking CpG islands also showed promoter hypermethylation. We performed pyrosequencing of bisulfite converted DNA to quantify DNA methylation at K4-dependent and K4-independent genes in cell lines in which these genes were repressed. As controls, these genes were also analyzed in three independent preparations of purified normal colon crypts. The pyrosequencing assays were designed to interrogate CpG sites located under the H3K4me3 peak of each gene, in close proximity (<700 bp) to the TSS. Nine of eleven K4-dependent repressed genes showed dramatic increases in DNA methylation over levels detected in normal colon crypt (Figure 4a-c). Moreover, several genes, including PIGR, CD177, and HMGCS2, were hypermethylated in more than one cell line in which these genes were designated as K4-dependent. In summary, 15 out of 19 assays performed on K4-dependent genes were positive for DNA hypermethylation. In comparison, two out of eight pyrosequencing assays performed on K4-independent genes were positive for DNA hypermethylation. These proportions are significantly different (P < 0.03 by Z-test for proportions).
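The comparison of proportions reported above (15 of 19 versus 2 of 8 assays) can be reproduced with a short sketch of a two-proportion Z-test. The pooled-variance form and the two-sided P value are assumptions, since the text does not specify which variant was used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-sided P value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Counts taken from the text: K4-dependent 15/19, K4-independent 2/8.
z, p_value = two_proportion_z_test(15, 19, 2, 8)
```

Under these assumptions the statistic is roughly z = 2.65, which is consistent with the reported P < 0.03.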
We next tested whether K4-dependent genes that lack CpG islands could be reactivated upon treatment with 5-azacytidine. Three out of four K4-dependent genes tested showed a significant increase in transcript levels upon treatment with 5-azacytidine (P ≤ 0.01; Figure 4d), consistent with hypermethylation of the scattered CpG sites under the H3K4me3 mark being functionally involved in these genes' silencing. The data indicate that DNA hypermethylated K4-dependent repressed genes do not necessarily contain CpG islands, and that repressed K4-dependent genes lacking both CpG islands and H3K4me3 are very likely to be DNA methylated in regions that lose the H3K4me3 mark. The results are also consistent with previously reported reactivation of hypermethylated genes lacking CpG islands upon treatment with 5-azacytidine [16].
Characterization of histone marks at K4-independent and K4-dependent genes
We performed ChIP-chip analysis to test whether repressed genes acquire histone modifications generally associated with transcriptional repression (H3K9me2, H3K27me3, and H4K20me3) and, if so, whether these modifications differ between K4-dependent and -independent genes. These studies were performed using custom-designed tiled arrays spanning the promoters and bodies of 80 genes repressed in colon cancer line SW480. ChIP-chip signal intensities for each mark and for each gene promoter were then hierarchically clustered and plotted in a heatmap. Strikingly, this analysis revealed a near perfect division of the K4-independent and K4-dependent gene classes (Figure 5). Moreover, consistent with the data above indicating that K4-dependent genes often lack CpG islands, a minority (27%) of genes classified as K4-dependent were found to have a CpG island at the TSS, compared to nearly all (>95%) of the genes classified as K4-independent. In addition, we found that genes associated with repressive histone marks, including H3K27me3, H4K20me3, and occasionally H3K9me2, were mostly K4-independent genes in which the H3K4me3 mark had been retained, or were originally marked with both H3K4me3 and H3K27me3 in colon mucosa, so-called 'bivalent' chromatin [17]. The presence of H3K27me3 was validated at several K4-independent genes in SW480, as well as V441 and V429, by standard ChIP (Additional file 3B). Interestingly, with the exception of a few genes in our K4-dependent group that showed low-level acquisition of H3K27me3 or H3K9me2, the loss of H3K4me3 was usually not accompanied by any additional chromatin modifications (other than our finding of increased DNA methylation). While we currently cannot rule out the possibility that these K4-dependent genes acquired a repressive mark that was not tested, the data indicate that the repertoires of histone marks at K4-dependent and -independent genes are often distinct.
K4-dependent and -independent genes are functionally distinct
We used Panther to determine whether specific pathways or biological processes are enriched among K4-dependent and -independent genes, and if so, whether they differ between the two classes [18]. Intriguingly, several pathways previously linked to colorectal carcinogenesis were enriched in the K4-independent set, including transforming growth factor-beta, insulin and Wnt signaling (Additional file 6). In contrast, K4-dependent genes were enriched in axon guidance, pyrimidine metabolism and cadherin signaling. Both classes were enriched for genes involved in apoptosis and platelet-derived growth factor signaling, as well as pathways associated with B- and T-cell activation.
We next tested whether genes in each class show tissue-specific expression in colon crypts, and if so, whether the degree of crypt-specific expression differed between the two classes. To do this, we compared global gene expression levels between normal colon crypt, HepG2, K562, and NB4 cells and computed a tissue-specificity score for each gene using the method of Shannon entropy [19]. We then plotted the distribution of colon-specificity scores for K4-dependent and -independent repressed genes, all genes, and 1,000 randomly selected genes. Both K4-dependent and -independent genes showed a high degree of crypt tissue-specific expression, with K4-dependent genes being significantly more crypt-specific than K4-independent genes (P < 0.0001) (Additional file 7A). These findings are consistent with other studies showing that genes lacking CpG islands, which comprise a large fraction of the K4-dependent class of repressed genes, are generally associated with tissue-specific expression [19].
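As an illustration of an entropy-based tissue-specificity score, the sketch below uses one widely used formulation: the Shannon entropy H of a gene's relative expression across tissues, and a categorical score Q = H - log2(p_t) for a tissue of interest, where lower Q means more specific expression in that tissue. Whether this is the exact variant applied in [19] is an assumption, and the expression values are invented.

```python
import math


def entropy_specificity(expression, tissue):
    """Return (H, Q) for one gene.

    expression maps tissue name -> non-negative expression level.
    H is the Shannon entropy (bits) of the relative expression profile;
    Q = H - log2(p_tissue) is low when the gene is specific to `tissue`.
    """
    if any(value < 0 for value in expression.values()):
        raise ValueError("expression levels must be non-negative")
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("gene has no expression in any tissue")
    relative = {name: value / total for name, value in expression.items()}
    entropy = -sum(p * math.log2(p) for p in relative.values() if p > 0)
    if relative[tissue] == 0:
        return entropy, math.inf  # not expressed in the tissue of interest
    return entropy, entropy - math.log2(relative[tissue])


# Toy profiles across the four cell types compared in the text.
crypt_only = {"crypt": 10.0, "HepG2": 0.0, "K562": 0.0, "NB4": 0.0}
uniform = {"crypt": 5.0, "HepG2": 5.0, "K562": 5.0, "NB4": 5.0}
h_specific, q_specific = entropy_specificity(crypt_only, "crypt")
h_uniform, q_uniform = entropy_specificity(uniform, "crypt")
```

A gene expressed only in crypt gives H = 0 and Q = 0, whereas a uniformly expressed gene across four samples gives H = 2 bits and Q = 4.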
Lastly, we tested whether K4-dependent genes show an increased propensity for silencing in colon cancer compared to K4-independent genes, or vice versa. To do this, the transcriptional status of genes designated as K4-dependent or K4-independent in the five colon cell lines was surveyed in an additional 35 colon cancer cell lines. We then plotted the distribution of the median expression values for each set as a histogram (Additional file 7B). Although the expression of K4-dependent genes is more variable than that of K4-independent genes, the overall distribution of K4-dependent genes is significantly shifted to the left of that of K4-independent genes (P = 0.015), indicating that K4-dependent genes are repressed more often in colon cancer than K4-independent genes. Although confirmatory studies are required, these findings raise the possibility that genes targeted for silencing in colon cancer are more often inactivated by mechanisms involving removal of H3K4me3 than by K4-independent mechanisms.