ChIP'ing the mammalian genome: technical advances and insights into functional elements
© BioMed Central Ltd 2009
Published: 23 September 2009
Characterization of the functional components in mammalian genomes depends on our ability to completely elucidate the genetic and epigenetic regulatory networks of chromatin states and nuclear architecture. Such endeavors demand the availability of robust and effective approaches to characterizing protein-DNA associations in their native chromatin environments. Considerable progress has been made through the application of chromatin immunoprecipitation (ChIP) to study chromatin biology in cells. Coupled with genome-wide analyses, ChIP-based assays enable us to take a global, unbiased and comprehensive view of transcriptional control, epigenetic regulation and chromatin structures, with high precision and versatility. The integrated knowledge derived from these studies is used to decipher gene regulatory networks and define genome organization. In this review, we discuss this powerful approach and its current advances. We also explore the possible future developments of ChIP-based approaches to interrogating long-range chromatin interactions and their impact on the mechanisms regulating gene expression.
Now that the complete human genome sequence is available, the current challenges are to identify all the functional genetic elements it encodes and to elucidate the complex regulatory networks that coordinate the function of all genetic and epigenetic elements that are crucial for cellular homeostasis, development and disease progression [2, 3]. Hence, research focus has turned to the annotation of the genome for functional properties and the characterization of regulatory elements involved in controlling gene expression, gene function and genome stability.
Significant efforts have been dedicated to deciphering global chromatin structures, modifications and chromatin-protein interactions. Due to the dynamic and transient nature of such interactions, early attempts using biochemical fractionation were problematic. Thanks to a powerful approach called chromatin immunoprecipitation (ChIP), our understanding of protein-DNA interactions within their native chromatin context, in relation to different nuclear activities, has been greatly advanced. ChIP captures snapshots of these interactions in living cells by employing efficient cross-linking agents. The chromatin is disrupted by sonication and the DNA fragments cross-linked to the proteins of interest are then selectively enriched by immunoprecipitation with specific antibodies. After reversal of the cross-links, the enriched DNA can be subjected to further characterization. The ChIP method has been applied successfully in different areas, with focus on the analysis of chromatin structures and transcriptional dynamics. These areas include transcription factor (TF) binding, structural components of chromatin complexes [7, 8], histone modifications [9–11] and enzyme function in histone modifications [12, 13] across a wide range of organisms. Here, we summarize the developments of ChIP-based assays, their technical specifications and how they are applied to reveal insights into the molecular mechanisms during transcriptional and epigenetic regulation.
Other important factors to be considered when performing ChIP include the antibody specificity and the fine balance between the cross-linking stringency and sonication conditions. The robustness of ChIP to differentially select target regions versus random genomic DNA is highly dependent on the availability of high-quality and high-affinity antibodies against the protein of interest. Community and industrial efforts have been initiated to characterize and catalog ChIP-grade antibodies against nuclear proteins of interest. Furthermore, ChIP with an antibody of a different isotype is commonly used to validate the binding events found. To further improve the efficiency of the ChIP process, a sequential ChIP can be attempted. In this method, two rounds of ChIP are performed sequentially using different antibodies that recognize different epitopes of the same protein. Although highly accurate, sequential ChIP is technically challenging and suffers from low yield, which limits its applications.
It is important to note that the ChIP process differentially enriches the targeted protein-DNA interactions from the entire nuclear cross-linked chromatin-protein complexes through antibody selection; however, it is not a purification step. Therefore, once the ChIP material is available, additional steps are required to characterize the material pulled down and determine its relative enrichment (Figure 2b-f). In a conventional ChIP assay, the enriched regions are initially analyzed using small-scale assays such as traditional cloning followed by a sequencing-based approach, Southern blot hybridization analysis or quantitative real-time polymerase chain reaction (PCR) (ChIP-qPCR). The availability of the complete genome sequences of many complex organisms offers the opportunity to carry out genome-wide detection of protein-chromatin interactions. Two major approaches have been commonly adopted as readouts to determine the identity of these ChIP-enriched DNA fragments at the whole-genome scale: hybridization-based or sequencing-based methods.
To characterize the protein-DNA interaction profiles across different regions on the genome landscape, high-density microarrays are created and hybridization is used for the analysis of ChIP DNA (referred to as ChIP-on-chip). In brief, after reversal of the cross-links, ChIP-enriched DNA and control DNA will be amplified by PCR and fluorescently labeled with the cyanine dyes Cy5 and Cy3 for hybridization to the DNA microarrays containing probes that correspond to the genomic sequences of interest (Figure 2b). The ratio of the Cy5 to Cy3 fluorescence intensities measured for each DNA element provides a measure of the extent of the binding across the entire genomic regions covered in the array. Genomic loci with higher fluorescent intensity in the ChIP DNA than the control DNA will be considered enriched as the potential binding sites. Using this technique, the non-repeat sequences in the genome can be interrogated and many novel binding sites uncovered. For example, genes regulated by many TFs such as STE12 and GAL4p were characterized in detail in yeast systems and revealed new functional pathways regulated through multiple TF bindings [6, 22].
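The ratio readout described above can be sketched in a few lines of Python. This is only an illustration: the probe names, the intensity values, the pseudocount and the two-fold (log2 ≥ 1) cutoff are assumptions made for the example, and a real ChIP-on-chip analysis would also normalize between the two channels and model probe-level noise.

```python
import math

def log2_ratios(chip_cy5, control_cy3, pseudocount=1.0):
    """Return log2(Cy5/Cy3) for every probe.

    The pseudocount is an assumption of this sketch; it only guards
    against division by zero for probes with no signal.
    """
    return {
        probe: math.log2(
            (chip_cy5[probe] + pseudocount) / (control_cy3[probe] + pseudocount)
        )
        for probe in chip_cy5
    }

def enriched_probes(ratios, min_log2=1.0):
    """List probes whose ChIP channel exceeds control by the chosen cutoff."""
    return sorted(p for p, r in ratios.items() if r >= min_log2)

# Hypothetical intensities for three probes (ChIP DNA in Cy5, control in Cy3).
chip = {"probe_A": 4095.0, "probe_B": 511.0, "probe_C": 255.0}
control = {"probe_A": 511.0, "probe_B": 511.0, "probe_C": 1023.0}

ratios = log2_ratios(chip, control)
print(enriched_probes(ratios))
```

Working on the log2 scale makes enrichment and depletion symmetric around zero, which is why array ratios are usually reported this way.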
Initially, array studies were limited to promoter regions amplified through PCR. Over the years, significant improvements have been made to the ChIP-on-chip procedures as well as the array designs. High-density oligonucleotide tiling arrays that represent the entire genome are now available and enable comprehensive mapping of protein-DNA interactions [6, 22, 24].
Despite considerable success, array-based readout of ChIP signals does suffer from several limitations. Firstly, the hybridization-based platform is unable to detect signals in repeat regions. Due to the large size and complexity of mammalian genomes, the DNA microarrays available often only contain partial genomic content or promoter regions of well-characterized genes. Therefore, many of the ChIP-chip analyses provide incomplete information, as any biologically significant binding occurring within the non-interrogated regions cannot be captured. Nevertheless, the repetitive regions are important areas to examine, based on what we know about TF binding. Secondly, PCR is generally used to amplify the ChIP material for hybridization, which can result in potential hybridization noise signals from biased amplification. To overcome non-specific amplification from direct PCR and cross-hybridization noise, an improved method called ChIP-DSL (DNA selection and ligation) was developed. In ChIP-DSL, paired oligonucleotides corresponding to regions of interest are designed as signatures and selected by ChIP DNA. The annealed paired oligos are then ligated and amplified by PCR for array-based detection (Figure 2d). ChIP-DSL avoids direct amplification of ChIP fragments and the amplicons are uniform in size to minimize PCR bias. Thirdly, as many different array designs and genome assemblies exist, the results from different groups could be difficult to compare. Lastly, the global ChIP-chip approach is dependent on the construction of whole-genome arrays. For certain complex genomes, these are not commercially available or economically practical. Due to these limitations, the whole-genome tiling array approach has not yet been adopted by the entire research community and has only been used in several large projects studying the genomes of human and mouse.
Sequencing-based methods emerged as an alternative to genome-wide readouts of ChIP analysis, particularly for complex genomes. To determine the identities of ChIP DNA by sequencing methods, large numbers of sequence reads are required. As ChIP assay is only a process of enrichment, a significant amount of non-enriched background DNA will still be present in the ChIP DNA material. With a limited survey of the ChIP DNA pool, it is difficult to distinguish between genuine signal and noise. However, if the sampling of the DNA pool can be increased, the genuine ChIP-enriched sites can be defined by multiple overlapping ChIP fragments, whereas the non-specific regions will only be covered by random ChIP singletons. The bona fide sites can then be inferred by multiple mapped sequenced fragments.
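The reasoning above, that genuine sites are covered by several overlapping fragments while background appears as singletons, can be sketched as follows. The coordinates, the half-open interval convention and the cutoff of three fragments are assumptions chosen for illustration; the source does not prescribe a particular threshold.

```python
def cluster_fragments(fragments):
    """Merge overlapping fragments into clusters.

    fragments: iterable of (chrom, start, end), half-open coordinates.
    Returns a list of (chrom, start, end, n_fragments).
    """
    clusters = []
    for chrom, start, end in sorted(fragments):
        if clusters and clusters[-1][0] == chrom and start < clusters[-1][2]:
            # Fragment overlaps the current cluster: extend it and count it.
            last = clusters[-1]
            clusters[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            clusters.append((chrom, start, end, 1))
    return clusters

def candidate_sites(fragments, min_fragments=3):
    """Keep clusters supported by at least min_fragments fragments."""
    return [c for c in cluster_fragments(fragments) if c[3] >= min_fragments]

# Hypothetical mapped ChIP fragments: three overlap on chr1, two are singletons.
mapped = [
    ("chr1", 100, 400),
    ("chr1", 250, 600),
    ("chr1", 300, 550),
    ("chr1", 5000, 5300),
    ("chr2", 100, 400),
]
print(candidate_sites(mapped))
```

Note that this counts the fragments chained into a cluster rather than the maximum depth at any single base; the two can differ for long chains of partially overlapping fragments.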
To achieve the required depth of sequencing coverage, short-tag-based sequencing strategies like serial analysis of gene expression (SAGE) have been adopted. SAGE was originally developed for counting transcript levels and later applied to genome scanning for transcription factor binding sites and histone modifications [28, 29]. In ChIP-SAGE, the ChIP-enriched DNA fragments are end-ligated with a universal biotinylated linker, and 21-bp tags are generated by type II restriction enzyme digestion for sequencing (Figure 2e). Compared with the ChIP-on-chip hybridization approach, ChIP-SAGE increases the coverage and resolution to the entire genome. However, this monotag approach suffers from mapping ambiguity and is unable to differentiate amplification bias, and thus has a lower accuracy.
In order to enhance the mapping accuracy of short-tags and increase the information content while still exploiting the short-tag sequencing efficiency, a paired-end-ditag (PET) method has been developed (ChIP-PET). Like SAGE, the PET approach was initially used for transcriptome analysis. In ChIP-PET, the ChIP DNA is converted into PETs for ultra-high-throughput sequencing. Each PET sequence is mapped onto the genome and the locations of binding sites can be inferred by overlapping PET-defined clusters (Figure 2c). Over 90% of the sites identified can be validated by ChIP-qPCR, and de novo consensus binding motifs can be predicted from the overlapping regions. The ChIP-PET approach has been demonstrated to map whole-genome TF binding sites and epigenetic modifications in both cancer and embryonic stem cells (ESCs) with high specificity and resolution [9, 31, 32]. Compared to ChIP-on-chip, the ChIP-PET approach is an unbiased and open system for identifying all DNA segments enriched by ChIP. This method is not restricted by the array coverage or probe performance and thus allows a real genome-wide analysis. Its only limitation is the upfront requirement for large sequencing capacity.
Recently, the development of robust and advanced sequencing technologies, particularly the ability to rapidly decode millions of DNA fragments simultaneously with high efficiency and relatively low cost, has facilitated our ability to characterize ChIP DNA by direct sequencing (ChIP-Seq) [11, 33]. ChIP-Seq has proved to be a simple and robust method for global, unbiased interrogation of the TF binding sites and epigenetic modifications. In ChIP-Seq, the ChIP DNA is end polished and ligated with the sequencing adaptors, followed by limited PCR amplification. Size-selected DNA fragments are then subjected to cluster amplification and sequencing (Figure 2f). Between 25 and 36 nucleotides from either end of ChIP DNA fragments can be determined with high accuracy, and millions of high-quality reads can be generated within days. Based on their mapping locations, regions with a high number of clusters of ChIP tag sequences are defined as ChIP enrichment sites. To further distinguish the true binding sites from the non-specific sites, control DNA (input) is sequenced to determine the noise, which can then be removed. ChIP-Seq enables the performance of deep sequencing at high resolution and low cost.
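As a rough illustration of how mapped ChIP-Seq tags can be compared with an input control, the sketch below counts tags in fixed windows and keeps windows that exceed both a minimum tag count and a fold change over the scaled input. The window size, both thresholds, the library-size scaling and the example coordinates are assumptions of this sketch; it is not the algorithm of any particular peak caller.

```python
from collections import Counter

def window_counts(tag_positions, window=200):
    """Count tags per (chrom, window index); tags are (chrom, position)."""
    return Counter((chrom, pos // window) for chrom, pos in tag_positions)

def enriched_windows(chip_tags, input_tags, window=200, min_tags=5, min_fold=4.0):
    """Return (chrom, start, end, chip_count) for windows enriched over input."""
    chip = window_counts(chip_tags, window)
    ctrl = window_counts(input_tags, window)
    # Scale the input to the ChIP library size so the counts are comparable.
    scale = len(chip_tags) / max(len(input_tags), 1)
    hits = []
    for (chrom, idx), n in chip.items():
        # Floor the expectation at one tag so empty input windows do not
        # produce an infinite fold change.
        expected = max(ctrl.get((chrom, idx), 0) * scale, 1.0)
        if n >= min_tags and n / expected >= min_fold:
            hits.append((chrom, idx * window, (idx + 1) * window, n))
    return sorted(hits)

# Hypothetical tags: one window enriched only in ChIP, one equally dense in input.
chip_tags = [("chr1", 1000 + 10 * i) for i in range(8)] + \
            [("chr1", 5000 + 10 * i) for i in range(6)]
input_tags = [("chr1", 5000 + 10 * i) for i in range(6)] + \
             [("chr1", 20000 + 1000 * i) for i in range(8)]

print(enriched_windows(chip_tags, input_tags))
```

The window at chr1:5000 carries as many tags in the input as in the ChIP sample, so it is treated as noise, which is the role the control plays in the description above.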
With the availability of whole-genome and unbiased approaches to characterizing chromatin-DNA interactions, our knowledge of the genomic features, landscape, target genes and gene expression activity has drastically advanced in recent years. Here, we summarize what we have learnt collectively on the critical links between chromatin modifications and transcriptional outputs.
Applying ChIP-based assays for components in the transcription machinery or TFs, their genomic targets and regulatory circuitries can be reconstructed [33–35]. One of the unique and intriguing findings from these genome-wide studies indicates that there are large numbers of identified target binding sites located outside of the previously annotated promoters and suggests that the functional regulatory elements of the genome are larger than previously envisioned. For example, over 30% of the estrogen receptor binding sites were found in the inter-genic regions at least 50 kb away from the neighbor genes. Such an observation raises interesting questions about the functional nature of these binding sites and about how to accurately correlate the genes and their corresponding regulatory regions. The genome-wide ChIP assay can also be used to uncover the sequences bound by specific TFs and characterize their binding site selection. Through the putative in vivo binding sites identified, the ab initio binding consensus sequences associated with the protein of interest can be efficiently derived. We have also gained insights into how TFs have evolved different mechanisms to elicit target gene responses. Some individual TFs can elicit multiple transcriptional responses, while different TFs can be recruited to the same target regions to trigger transcriptional activation leading to cell differentiation. In ESCs, key reprogramming factors and TFs involved in signaling pathways as well as self-renewal have been analyzed. Specifically, two clusters of genomic loci were found that were extensively targeted by multiple transcription factors in the ESC genome. The first cluster includes NANOG, OCT4, SOX2, SMAD1 and STAT3. The second cluster consists of c-Myc (MYC), n-Myc (MYCN), ZFX and E2F1. STAT3 and SMAD1 are major signaling components modulating the leukemia inhibitory factor (LIF) and bone morphogenetic protein (BMP) pathways.
LIF and BMPs are protein factors required for the maintenance of the pluripotency state of ESCs. These results have shown that LIF and BMP signaling pathways are integrated into the ESC pluripotency maintenance TF cluster (OCT4, SOX2 and NANOG) through SMAD1 and STAT3; and multiple transcription factor clustering is the mechanism to recruit cell-specific enhancer targeting for lineage-specific transcription regulation.
In addition to TF binding, the ChIP assay can also be used to profile the distribution of the chromatin modification components, histone variants and modifications. One of the pioneering efforts was to understand the mechanisms by which histone modifications regulate transcription and chromatin organization. Starting in the yeast system, the application of ChIP assays demonstrated that histone acetylation was a critical link between chromatin structure and transcriptional activation. In mammalian genomes, Barski et al. have characterized the histone codes through profiling 20 lysine and arginine methylation modification patterns in histones, and identified the signatures for histone methylation patterns surrounding promoters, enhancers, insulators and transcribed regions. Among them, monomethylations of H3K27, H3K9, H4K20, H3K79 and H2BK5 were found to be associated with gene activation, while trimethylation of H3K27, H3K9 and H3K79 was linked to gene repression. In a study to investigate the types of histone modifications that underlie the chromatin properties to maintain the pluripotent nature of the ESC genome, Lander and colleagues uncovered 109 domains showing overlapping opposing histone modification marks, termed 'bivalent domains', where large regions of H3K27me3 harbor smaller regions of H3K4me3. Following further characterization using a genome-wide ChIP-PET approach in human ESCs, H3K4me3 was found to be prevalent and occurred in nearly 70% of promoters in annotated genes, while H3K27me3 appears less occupied in promoter regions and forms a 'bivalent domain' by co-marking 10% of genes with H3K4me3. A large portion of genes that are important for mesoderm development, neuroectoderm and other developmental processes are among the genes co-modified by H3K4me3 and H3K27me3.
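Once enriched regions for two marks have been called, co-marking of promoters such as the H3K4me3/H3K27me3 bivalent state described above reduces to an interval-overlap question. The sketch below is a minimal illustration under stated assumptions: the gene names, promoter windows and peak coordinates are hypothetical, intervals are half-open, and any overlap at all counts as "marked".

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def classify_promoters(promoters, k4me3_peaks, k27me3_peaks):
    """Label each promoter by which of the two marks overlap it."""
    labels = {}
    for gene, region in promoters.items():
        has_k4 = any(overlaps(region, peak) for peak in k4me3_peaks)
        has_k27 = any(overlaps(region, peak) for peak in k27me3_peaks)
        if has_k4 and has_k27:
            labels[gene] = "bivalent"
        elif has_k4:
            labels[gene] = "H3K4me3 only"
        elif has_k27:
            labels[gene] = "H3K27me3 only"
        else:
            labels[gene] = "unmarked"
    return labels

# Hypothetical promoter windows and peak calls.
promoters = {
    "gene_A": ("chr1", 1000, 3000),
    "gene_B": ("chr1", 50000, 52000),
    "gene_C": ("chr2", 1000, 3000),
}
k4me3_peaks = [("chr1", 1500, 2500), ("chr1", 50500, 51000)]
k27me3_peaks = [("chr1", 0, 10000)]  # broad domain harboring the gene_A peak

print(classify_promoters(promoters, k4me3_peaks, k27me3_peaks))
```

The linear scan over peaks is adequate for a handful of regions; genome-scale peak sets would normally be sorted or indexed first.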
Through the applications of genome-wide ChIP analyses across different organisms, we learnt that TF binding sites are not necessarily conserved among species [34, 38] and that not all TF-chromatin interactions are functional. Using the binding regions of seven mammalian TFs (ESR1, TP53, MYC, RELA, POU5F1, SOX2 and CTCF) identified on a genome-wide scale, we found only a minority of sites appeared to be conserved at the sequence level, suggesting that evolution has adapted factor binding sites to aid the dynamic regulation of mammalian genomes.
The recent expansion of ChIP technologies has enabled a better understanding of the interactions between TFs and the regulatory networks contributing to gene regulation. Surprisingly, these analyses have demonstrated that many TFs rarely bind to promoter regions compared with intergenic regions, suggesting critical roles for long-distance, promoter-enhancer interactions in regulating gene expression in mammalian cells. In some cases, it was found that the transcriptional activation involved distal control elements located hundreds of kilobases away, which are brought together through connecting DNA loops that allow physical interactions between the regulatory elements for gene expression. However, methods like ChIP-Seq can only reveal the functional genome in a linear fashion. Information on long-range interactions harnessed within the chromatin-protein complexes and how they impact transcriptional regulation is still lacking.
Initial efforts to characterize the distant interactions have been technically challenging and mostly limited to microscopy techniques, which are laborious and of poor resolution. Through formaldehyde cross-linking followed by proximity-based ligation, long-range chromosomal interactions can be captured and detected by PCR (chromatin conformation capture, 3C), microarray analysis or high-throughput sequencing (4C or 5C), with limited scale and selective bias [46–48]. Applying 3C in the human β-globin loci, various specific interactions between the genes and the regulatory elements were demonstrated. Although 3C and its variants are excellent tools to study complex interactions, these methods require prior knowledge of interacting candidates, hence cannot be used for genome-wide profiling for all chromatin interactions. As such, there is a need for approaches that reveal global chromatin interactions at the whole-genome scale in an unbiased and de novo manner. With the pair end ditag concept, we further explored the ability of PET to connect two ends of DNA and delineate their relationships to characterize interacting chromatins (chromatin interaction analysis by pair-end-ditagging; ChIA-PET). In this approach, ChIP was performed with antibodies specific to the TF of interest. Specially designed short oligonucleotide linkers were ligated to the ends of each interacting DNA fragment, followed by second intra-molecular ligations to connect two interacting DNA fragments together. PETs from ligated DNA are extracted and analyzed by pair end ditag sequencing. The linear binding sites along genomic DNA can be revealed from self-ligation PETs and the interactions between the binding sites can be determined from inter/intra-chromatin ligating PETs (Figure 3a). Therefore, a single ChIA-PET experiment can generate two interrelated datasets, depending on the step at which the ligation occurs (before or after the de-cross-link).
Such a feature, when supported by ultra-high-throughput sequencing, can reveal interactomes mediated by TFs or chromatin-modifying complexes. We expect that the mapping of the whole-genome interactome mediated by pertinent TFs or chromatin modifications will translate into knowledge that is critical for understanding the fundamental transcriptional regulation programs.
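The split of ChIA-PET data into self-ligation PETs (marking binding sites) and ligation PETs between separate fragments (marking interactions) can be sketched by looking at where the two tags of each PET map. The rule used here, that two tags on the same chromosome within a short span come from one self-ligated fragment, and the 3 kb span cutoff are assumptions of this sketch, not parameters given in the text; the example coordinates are hypothetical.

```python
from collections import Counter

def classify_pet(head, tail, self_ligation_span=3000):
    """Classify one PET from the mapped positions of its two tags.

    head, tail: (chrom, position). The span cutoff is an assumed,
    illustrative value for the size of a single ChIP fragment.
    """
    if head[0] != tail[0]:
        return "inter-chromosomal"
    if abs(tail[1] - head[1]) <= self_ligation_span:
        return "self-ligation"
    return "intra-chromosomal"

def summarize_pets(pets, self_ligation_span=3000):
    """Tally PET classes for a list of (head, tail) pairs."""
    return Counter(classify_pet(h, t, self_ligation_span) for h, t in pets)

# Hypothetical mapped PETs.
pets = [
    (("chr1", 10000), ("chr1", 10450)),    # two ends of one fragment
    (("chr1", 10200), ("chr1", 10900)),    # two ends of one fragment
    (("chr1", 10300), ("chr1", 250000)),   # distal sites on the same chromosome
    (("chr1", 10350), ("chr7", 88000)),    # sites on different chromosomes
]
print(summarize_pets(pets))
```

In this scheme the self-ligation PETs would feed the same overlap clustering used for ChIP-PET binding sites, while the remaining two classes are the candidates for long-range interactions.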
As described in this paper, combination of the ChIP assay with robust readout methods is extremely powerful for a variety of whole-genome analyses in order to define the functional components within mammalian genomes. The wide range of interactions and diverse organisms it has been applied to have already demonstrated the power of this approach. Considerable progress has been made in our understanding of transcriptional and epigenetic regulation, as well as in the elucidation of transcriptional regulatory networks and chromatin organization. Ultimately, with further improvement of the ChIP-based assays, particularly in the robustness of the enrichment and expansion of their applications, we foresee that ChIP will continue to be the critical approach to study chromatin biology and genome regulation. If successfully implemented, particularly for individual and personal human genome interrogations, such applications will further our understanding of how genetic and epigenetic regulation coordinates eukaryotic development. This knowledge has the potential to translate into a better understanding of the fundamental transcriptional regulation programs, and lead to biomarker discovery or therapeutic target stratifications, which ultimately guide the development of strategies for personalized medicine.
BMP: bone morphogenetic protein
3C: chromatin conformation capture
ChIA-PET: chromatin interaction analysis followed by pair-end-ditagging
ChIP-on-chip: chromatin immunoprecipitation followed by DNA microarray hybridization
ChIP-DSL: chromatin immunoprecipitation followed by DNA selection and ligation
ChIP-PET: chromatin immunoprecipitation followed by pair-end-ditagging sequencing
ChIP-qPCR: chromatin immunoprecipitation followed by quantitative PCR
ChIP-SAGE: chromatin immunoprecipitation coupled with serial analysis of gene expression
ChIP-Seq: chromatin immunoprecipitation followed by high-throughput sequencing
ESC: embryonic stem cell
LIF: leukemia inhibitory factor
RNA immunoprecipitation followed by PCR
The authors would like to acknowledge the Genome Technology and Biology Group at the Genome Institute of Singapore for technical details on sequencing, particularly the GIS sequencing team. The authors are supported by the Agency for Science, Technology and Research (A*STAR) of Singapore and the NIH ENCODE grant 1R01HG003521-01.