Copy number variations and cancer

DNA copy number variations (CNVs) are an important component of genetic variation, affecting a greater fraction of the genome than single nucleotide polymorphisms (SNPs). The advent of high-resolution SNP arrays has made it possible to identify CNVs. Characterization of widespread constitutional (germline) CNVs has provided insight into their role in susceptibility to a wide spectrum of diseases, and somatic CNVs can be used to identify regions of the genome involved in disease phenotypes. The role of CNVs as risk factors for cancer is currently underappreciated. However, the genomic instability and structural dynamism that characterize cancer cells would seem to make this form of genetic variation particularly intriguing to study in cancer. Here, we provide a detailed overview of the current understanding of the CNVs that arise in the human genome and explore the emerging literature that reveals associations of both constitutional and somatic CNVs with a wide variety of human cancers.

The electronic version of this article is the complete one and can be found online at http://genomemedicine.com/content/1/6/62 © 2009 BioMed Central Ltd
Copy number variations: dynamic genomes
Our genomes are not the stable places we once thought they were. Recent genome-wide studies have shed light on copy number variations (CNVs), an unexpectedly frequent, dynamic and complex form of genetic diversity, and have quickly overturned the idea of a single diploid human 'reference genome'. Although the characterization of the extent and location of these regions in healthy genomes is far from complete, many groups, including ours, are actively trying to determine the clinical impact of CNVs in patient populations.
CNVs are structurally variant regions in which copy number differences have been observed between two or more genomes [1]. Defined as being larger than 1 kilobase (kb) in size, CNVs can involve gains or losses of genomic DNA that are either microscopic or submicroscopic and are, therefore, not necessarily visible by standard G-banding karyotyping. Until recently, only a few copy-number-variable loci had been identified, such as duplications at the α7-nicotinic receptor gene (CHRNA7) at 15q13-15 [2] and variation at the major histocompatibility complex locus [3]. In 2004, significant advances in DNA array technology enabled the discovery of many CNVs, revealing a novel and pervasive form of inter-individual genomic variation [4,5]. These pioneering genome-scale efforts used two different platforms to find 76 CNVs in 20 individuals [5] and 255 CNVs in 55 individuals [4], some of which were common to both studies, suggesting possible hotspot regions of CNVs in the human genome. Even this was soon found to be an under-representation of the number of CNVs; follow-up studies have since ascertained many thousands of CNV regions in hundreds of healthy individuals. In fact, the recent increase in scientific interest in CNVs, combined with improvements in microarray fabrication (higher density at lower cost) and the development of new informatics techniques, has led to the ascertainment of approximately 21,000 CNVs, or around 6,500 unique CNV loci, in the five short years since this form of genetic variation was first revealed (these figures come from the March 2009 update of the Database of Genomic Variants (DGV) [4]). CNVs are now thought to cover at least 10% of the human genome. Furthermore, next-generation sequencing technologies will soon be used to sequence thousands of genomes along with their CNVs.
CNVs and disease: mutable genomes
The CNV map for the human genome is being continuously refined and has already pinpointed the location, copy number, gene content, frequency and approximate breakpoints of numerous CNVs in the healthy population. These structural variants can alter transcription of genes by altering dosage or by disrupting proximal or distant regulatory regions, as has been shown globally in the healthy human [6], mouse [7] and rat genomes [8]. It is, however, the specific disease-associated CNV loci that have been particularly scrutinized and that therefore provide the most detailed examples of how CNVs can alter cellular function. We will highlight three insights in particular from the literature: that pathogenic CNVs often contain multiple genes, that the effect of a pathogenic CNV is not limited to the gene(s) it contains, and that pathogenic CNVs can have reciprocal deletions/duplications.
The number of genes in pathogenic CNVs
Genomic rearrangements give rise to a variety of diseases classified as 'genomic disorders' [9]. Because they involve large regions, it is common for genomic disorders to include many deleted or duplicated genes, unlike traditional mutations that affect a single coding-region change of one gene. These genes can be either fully encompassed or partially overlapped by the pathogenic CNV. Deletions of 22q11.2 are associated with DiGeorge/velocardiofacial syndrome and include the catechol-O-methyltransferase gene, the T box transcription factor 1 gene and others [10]. Similarly, Prader-Willi syndrome (15q11-q13 deletion) involves many genes [11], and Williams-Beuren syndrome (7q11.23 deletion) involves 28 genes [12]. As microarray resolution increases, genomic disorders will likely be found that are caused by small CNVs involving only a single gene, or even a portion of one gene.
The source of the effect of a pathogenic CNV
Usually, the genes contained in the pathogenic CNV are candidates for association with the clinical phenotype under study. However, research on genomic disorders has shown that some genes within a CNV may not be necessary, or may not be sufficient, to cause the observed disease. For example, a recurrent 3.7 Mb microdeletion is responsible for 70% of cases of Smith-Magenis syndrome (SMS) [13], a neurobehavioral disorder involving sleep disturbance, craniofacial and skeletal anomalies, intellectual disability and distinctive behavioral traits. Although the size of the deletions observed varies, the identification of a common 'critical region' (1.5 Mb) in SMS patients led to the conclusion that the retinoic acid induced 1 (RAI1) gene alone is responsible for most SMS features. Indeed, RAI1 point mutations have been seen in patients without deletions but with similar phenotypes, confirming that disruption of this one gene (of the 13 in the critical region) is sufficient to cause SMS. Patients with additional genes deleted have a variable and more severe phenotype. In contrast, in Williams-Beuren syndrome, not only the aneuploid genes but also genes far outside the deleted region have reduced expression and are thought to contribute to the phenotype [14]. Such long-range influence of CNVs on distant gene expression is proposed to be caused by positional effects [15].
Recombination between highly homologous sequences (nonallelic homologous recombination) can generate deletions, duplications, inversions and translocations. The sequence architecture that allows one copy number change can also allow its reciprocal at the same locus. The reciprocal events usually cause different phenotypes and occur at different frequencies in the population and at different rates during meiosis [16].
CNVs and cancer predisposition: first hits to the tumor genome
A central goal of cancer genetics is to discover all variant alleles that predispose to neoplasms. To this end, single nucleotide polymorphisms (SNPs) have been the most widely studied form of genetic variation and, through large genome-wide association (GWA) studies, many common SNPs have been shown to be associated with cancer and other complex traits. However, the results of these efforts have not explained much of the heritability of disease [17]. This is perhaps because GWA studies have mostly ignored the inter-individual genetic variation provided by CNVs, which affect more than 10% of the human genome. CNVs, especially smaller variants, have been essentially hidden from view until recently; thus, only a handful of studies have found an association of CNVs with cancer. As more of them are identified, CNVs can be expected to explain a larger portion of the genetic basis of cancer. Common and rare CNVs should be considered separately, as they may have very different roles in cancer.
Common cancer CNVs
As with SNPs, CNVs that are found frequently in the healthy population (common CNVs) are likely to have a role in cancer etiology. In the only study published so far that begins to test the hypothesis that common CNVs are associated with malignancy, we [18] created a map of every known CNV whose locus coincides with that of bona fide cancer-related genes (as catalogued by [19]); we called these cancer CNVs. In an initial analysis [18], we examined 770 healthy genomes using the Affymetrix 500K array set, which has an average inter-probe distance of 5.8 kb. As CNVs are generally thought to be depleted in gene regions [20], it was surprising to find 49 cancer genes that were directly encompassed or overlapped by a CNV in more than one person in a large reference population (Figure 1). In the top ten genes, cancer CNVs could be found in four or more people. In this analysis only CNVs directly overlapping a cancer gene were selected: either both breakpoints were inside the genomic interval containing the gene, both lay outside it on opposite sides (so that the CNV spanned the whole gene), or one breakpoint was inside while the other was outside. However, this is probably an underestimate of the actual number of common cancer CNVs, for two reasons. First, many smaller variants are missed at the resolution of this array: the mean size of CNVs found using the Affymetrix 500K array is 206 kb [20], whereas the CNVs found using the newer Affymetrix 6.0 platform, with a median inter-marker distance of less than 700 bp, are 5-15 times smaller [21]. Second, as discussed above, there are probably additional, more distal CNVs that have a long-range effect on cancer gene transcription levels.
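The three overlap cases used to select cancer CNVs can be sketched in a few lines of Python. This is only an illustration and not the pipeline used in [18]: the function name `classify_overlap`, the labels it returns, the coordinate convention (1-based, inclusive, same chromosome) and the example coordinates are all assumptions made for this sketch.

```python
from typing import Optional


def classify_overlap(cnv_start: int, cnv_end: int,
                     gene_start: int, gene_end: int) -> Optional[str]:
    """Classify how a CNV relates to a gene on the same chromosome.

    Coordinates are assumed to be 1-based and inclusive (an assumption of
    this sketch). Returns None when the CNV shares no base with the gene.
    """
    if cnv_start > cnv_end or gene_start > gene_end:
        raise ValueError("start must not exceed end")

    # The CNV lies entirely to one side of the gene: not a cancer CNV here.
    if cnv_end < gene_start or cnv_start > gene_end:
        return None

    start_inside = gene_start <= cnv_start <= gene_end
    end_inside = gene_start <= cnv_end <= gene_end

    if start_inside and end_inside:
        return "within_gene"        # both breakpoints inside the gene
    if not start_inside and not end_inside:
        return "encompasses_gene"   # breakpoints flank the gene on both sides
    return "partial_overlap"        # one breakpoint inside, one outside


# Hypothetical gene interval and CNVs, for illustration only.
gene = (100_000, 200_000)
examples = {
    "intragenic": classify_overlap(150_000, 180_000, *gene),
    "spanning": classify_overlap(50_000, 250_000, *gene),
    "partial": classify_overlap(50_000, 150_000, *gene),
    "distal": classify_overlap(10_000, 20_000, *gene),
}
```

A CNV for which the function returns `None` would be excluded by the direct-overlap criterion, which is why distal variants with possible long-range effects are not counted.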
Consistent with the initial observation [18], many of these genes are also found in the DGV, a curated list of CNVs compiled from numerous publications [4]. Analysis of the DGV [22] shows that nearly 40% of cancer-related genes are interrupted by a CNV. This trend continues: even among the ten most recent CNV publications in the DGV (those published after February 2008), many important tumor suppressor genes and oncogenes can be found with diverse functions, including apoptosis, control of cell cycle checkpoints and DNA repair, and numerous translocation and fusion gene partners. One example is RAD51L1, a member of the RAD51 family, which is essential for DNA repair by homologous recombination; a GWA study has shown that RAD51L1 contains a SNP that is strongly associated with breast cancer [23].
The challenge will be to determine which of these genes are dosage-sensitive and which tissues containing these common cancer CNVs will be susceptible to malignant transformation and growth. One approach is to characterize specific cancer CNVs in great detail, in terms of both population frequency and breakpoint sequence [24]. For example, in a pilot candidate-gene association study, we found a cancer CNV at the gene MLLT4 (a Ras target encoding a protein that regulates cell-cell adhesion) that seems to be associated with Li-Fraumeni syndrome (LFS), a cancer predisposition disorder; individuals affected with LFS harbor a germline heterozygous mutation of the TP53 tumor suppressor gene [18]. The frequency of this CNV is significantly increased in LFS (P = 0.006, Fisher's exact test): 3 of the 19 LFS probands (15.8%; observed/expected = 3/0.4 = 7.5) harbored the CNV duplication, whereas only 12 of 710 healthy individuals from the reference population (1.69%; observed/expected = 12/14.6 = 0.82) harbored the CNV.
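The arithmetic behind these figures can be checked with a short Python sketch. It assumes that the expected counts were obtained by distributing the 15 carriers in proportion to group size (19 probands and 710 controls, 729 people in total) and that the test was two-sided; neither detail is stated in the text, and the function names are hypothetical. Only the standard library is used.

```python
from math import comb


def expected_carriers(group_size: int, total_carriers: int, total_people: int) -> float:
    """Carriers expected in a group if the CNV were unrelated to group membership."""
    return group_size * total_carriers / total_people


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Counts taken from the text: 3 of 19 LFS probands and 12 of 710 controls carry the CNV.
lfs_carriers, lfs_total = 3, 19
ctl_carriers, ctl_total = 12, 710
carriers = lfs_carriers + ctl_carriers
people = lfs_total + ctl_total

exp_lfs = expected_carriers(lfs_total, carriers, people)
exp_ctl = expected_carriers(ctl_total, carriers, people)
p_value = fisher_exact_two_sided(lfs_carriers, lfs_total - lfs_carriers,
                                 ctl_carriers, ctl_total - ctl_carriers)

print(f"expected LFS carriers: {exp_lfs:.1f}, expected control carriers: {exp_ctl:.1f}")
print(f"Fisher's exact P: {p_value:.4f}")
```

Under these assumptions the expected counts round to 0.4 and 14.6, and the P value comes out close to the reported 0.006. The observed/expected ratio of 7.5 quoted above uses the expected count rounded to one decimal place; the unrounded ratio is slightly higher.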
A good illustration of a focal CNV with phenotypic effect is given by the mitochondrial tumor suppressor gene (MTUS1); Frank et al. [25] found that a small deletion in MTUS1 is associated with a decreased risk of familial and high-risk breast cancer. Using long-range PCR, we independently fine-mapped this common cancer CNV and genotyped it in a panel of healthy controls. Although it is only 1.1 kb in size, the deletion removes an entire exon of MTUS1. Direct sequencing reveals a 41 bp stretch of homology flanking the exon, which leads to this deletion by non-allelic homologous recombination (Figure 2).
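The outcome of recombination between two flanking direct repeats can be illustrated with a toy Python model. The 41 bp repeat is the sequence reported in the Figure 2 caption; everything else (the function name `nahr_deletion`, the flanking sequences and the 120 bp stand-in for the exon) is hypothetical and chosen only to keep the example small. The real deletion is about 1.1 kb.

```python
# 41 bp repeat reported at the MTUS1 deletion breakpoints (Figure 2 caption).
REPEAT = "AAATAAGAACCAAGTCCAAATACATCTTTGGAATGAAAGAG"


def nahr_deletion(sequence: str, repeat: str) -> str:
    """Return the sequence left after recombination between two direct repeats.

    The DNA between the first two copies of `repeat` is lost together with
    one repeat copy, so a single copy remains at the junction.
    """
    first = sequence.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second, non-overlapping repeat copy is required")
    # Keep everything up to the end of the first copy, then resume after the second copy.
    return sequence[:first + len(repeat)] + sequence[second + len(repeat):]


# Hypothetical placeholder sequences, not real MTUS1 sequence.
upstream = "GATTACA" * 3
exon = "ATGGCC" * 20
downstream = "TTGACC" * 3

reference = upstream + REPEAT + exon + REPEAT + downstream
deleted = nahr_deletion(reference, REPEAT)

print(len(reference) - len(deleted), "bp removed;",
      deleted.count(REPEAT), "repeat copy left at the junction")
```

The model shows why a junction fragment contains one intact copy of the repeat: the breakpoint cannot be placed more precisely than the length of the shared homology.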
These examples demonstrate hypothesis-driven approaches, which are restricted to genes for which there is an a priori association with cancer. Ultimately, it will be important to be able to discover and test every CNV in a genome for cancer susceptibility. This hypothesis-free approach is becoming technically tractable and more economical, but such studies have unique analytical challenges. As elaborated upon elsewhere [26,27], these challenges include: the unknown allele frequency and integer copy number of most CNVs, both within and among populations; the absence of sequence-level breakpoint information for most CNVs; and the architectural complexity of some CNV regions, including smaller CNVs within larger ones [24].

Rare cancer CNVs
Common cancer SNPs (and, by analogy, common cancer CNVs) each confer only a minor increase in disease risk, but collectively they may cause a substantially elevated risk. In contrast, the mutations associated with hereditary cancer syndromes are frequently highly penetrant on their own and are usually inherited in an autosomal dominant manner. Unlike low-penetrance alleles, rare high-penetrance mutations will almost always co-segregate with the disease in families.
There are over 200 cancer syndromes and, although most arise infrequently, they account for 5-10% of all cancer cases [28]. These are caused by base-pair-sized germline mutations in many central tumor suppressor genes (such as TP53, APC, BRCA1, BRCA2, PTEN and RB1) and in fewer oncogenes, including HRAS and RET.
The role of large structural mutations in cancer syndromes has been less appreciated, probably because genomic deletions or duplications are not readily detected by PCR-based sequencing. New multiplexing methods, especially multiplex ligation-dependent probe amplification (MLPA) [29], allow targeted copy number assessment of single gene or exon changes. This has led to a recent upsurge in discoveries of patients and families with rare pathogenic CNVs that strongly predispose to cancer. Of the 70 germline cancer genes in the Cancer Gene Census [30], 28 have been reported to be mutated by genomic deletion or duplication (the genes and citations are shown in Table 1). We hypothesize that many of the remaining gene mutations will be found to have a genomic equivalent and, perhaps more importantly, that predisposing CNVs will be found in other regions not usually associated with hereditary cancer. A recent report by Jackson et al. [31] describing five patients with rhabdoid predisposition syndrome and deletions at SMARCB1 (22q11.2) highlights the benefits of a global approach to CNV detection: when SNP arrays were used to gain a broad perspective on the SMARCB1 deletion and the surrounding chromosomal landscape, two patients' deletions were found to extend beyond SMARCB1 and impinge on neighboring genes, which explained their clinical phenotype.
The presence of rare cancer CNVs leads to many questions: do they differ from base-pair changes at the same locus? What is their penetrance? What are the mutational processes that give rise to them? Do they have reciprocal deletions/duplications? Do they have long-range effects on gene expression? These questions provide fertile ground for future research. These studies may involve identifying novel CNVs in unexplained familial clusterings of cancer, or the use of in vitro models in which cancer CNVs are created to measure their effect on cellular proliferation, genomic instability and the other hallmarks of cancer [32].
Figure 2. Cancer CNV breakpoint mapping. We mapped a 1.1 kb deletion in the mitochondrial tumor suppressor gene, MTUS1, to base-pair resolution. The affected portion of the gene is shown, including an exon (blue) that is deleted in the presence of the CNV. Two 41 bp repeats (with sequence AAATAAGAACCAAGTCCAAATACATCTTTGGAATGAAAGAG) were found at the breakpoints (red), while the sequence of the junction fragment is shown in the chromatogram.
One potential model to explain the contribution of common and rare CNVs to cancer predisposition is shown in Figure 3.
We propose that the number of copy-number-variable regions in healthy persons is maintained by efficient DNA repair, while CNVs are more abundant in cancer-prone individuals because of germline defects in these processes. Although tumors are known to have increased somatic CNV and instability, our model suggests these alterations arise much earlier in cancer-predisposed individuals.
CNVs and tumor genomes
So far we have focused on CNVs and cancer predisposition, but similar high-resolution approaches have also driven recent studies on acquired (somatic) copy number alterations (CNAs) in tumor DNA.
Table 1. Rare cancer CNVs at known cancer-predisposing genes
Copy number alterations
Genome-scale analyses have found many formerly invisible CNAs. In an analysis of 371 lung adenocarcinoma samples using a 250,000 probe array, Weir et al. [33] identified seven recurrent homozygous deletions and 24 recurrent amplifications. The most significant amplification, at 14q13.3 and containing the novel oncogene NKX2-1, had not been found in previous studies; because of insufficient resolution and sample size, the target gene it contained had not been identified. Using an even denser array, Mullighan et al. [34] profiled the DNA copy number changes of 242 pediatric acute lymphoblastic leukemia (ALL) patients, including 192 with B-progenitor leukemia (B-ALL) and 50 with T-lineage leukemia (T-ALL). Global differences between the subtypes' genomes and recurrent abnormalities at specific loci were identified. An average of six CNAs were found per leukemia genome, but significant differences in the number of CNAs were found within the B-ALL group and between the B-ALL and T-ALL subtypes. Intriguingly, in 30% of B-ALL patients, the authors [34] detected deletions of PAX5, a transcription factor that is expressed during early stages of B-cell development. Using CNA analysis to pinpoint critical genes can also help to plan subsequent sequencing efforts. For example, having identified deletions at PAX5, the authors [34] found that an additional 14 patients had point mutations in the same gene.
Using CNAs to define the key pathways of a tumor
In glioblastoma, CNA information, mRNA expression levels and methylation changes have been measured and nucleotide mutational analyses have been carried out [35]. Integrative analysis has shown that over 70% of tumors carry alterations in the retinoblastoma, p53 and receptor tyrosine kinase pathways. Although cancer is driven primarily by alterations of the genome, this study [35] and others have shown that CNA profiles can be combined with other high-throughput data to create insights that are 'greater than the sum of their parts'.
Conclusions and perspectives
The study of cancer and CNVs is in its infancy but is maturing quickly. In considering the effect of this form of genetic variation on cancer predisposition, cancer gene expression and tumor genome profiling, there is much to learn from past studies on genomic disorders. Denser microarrays, next-generation sequencing and integrative informatics analyses are around the corner and promise to uncover new CNVs and CNAs.
There are, therefore, many exciting questions to be addressed: what role do CNVs have in cancer predisposition and how can we use this newly discovered form of genetic variation to identify those most at risk? Which cancer-related genes are affected by CNVs and, of these changes, which are both necessary and sufficient to cause neoplastic growth? Can incipient cancer cells use these constitutional deletions and duplications to induce or accelerate tumorigenesis and tumor proliferation? As these questions are resolved, the potential value of cancer CNVs as novel biomarkers of cancer susceptibility and initiation, and of cancer progression and metastases, will become apparent. Whether cancer CNVs offer insight into genes that might be targets for novel drug development remains to be determined.
Abbreviations
CNA, copy number alteration; CNV, copy number variation; DGV, Database of Genomic Variants; GWA, genome-wide association; LFS, Li-Fraumeni syndrome; SMS, Smith-Magenis syndrome; SNP, single nucleotide polymorphism.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AS and DM contributed equally to the development of and writing of this manuscript and to approving the final draft.
Acknowledgements