Environmental exposures and mutational patterns of cancer genomes

The etiology of most human cancers is unknown. Genetic inheritance and environmental factors are thought to have major roles, and for some types of cancer, exposure to carcinogens is a proven mechanism leading to tumorigenesis. Sequencing of entire cancer genomes has not only begun to provide clues regarding functionally relevant mutations, but has also paved the way towards understanding the initial exposures leading to DNA damage, repair and eventually to mutation of specific sequences within a cancer genome. Two recent studies of melanoma and small cell lung cancer exemplify what type of information can be gained from cancer genome sequencing.

The origins of human cancers have environmental and hereditary components. Germline mutations of tumor suppressor genes found in cancer predisposition syndromes are prominent examples of inheritance. The affected genes include the retinoblastoma gene (RB), TP53, the breast cancer genes BRCA1 and BRCA2, the adenomatous polyposis coli gene APC, the mismatch repair genes MLH1 and MSH2 and a few others. Although mutations in these genes are very rare in the general population, they confer a high risk of developing the disease. However, mutations in this group of genes account for only a small fraction of the excess cancer incidence in familial cancer. For some common cancers with significant aspects of heritability, such as prostate cancer, highly penetrant susceptibility genes are still unknown. For these reasons, attention has now shifted towards ascribing much of the observed familial cancer risk to polygenic models of predisposition in which variant alleles, each conferring a small added risk, cooperate to produce a significant risk factor if several of the adverse alleles are inherited. Many of the high- and moderate-risk genetic mutations conferring enhanced cancer susceptibility in families occur in DNA repair genes, or DNA damage response genes in general, suggesting that some form of DNA damage or replication abnormality is often at the root of cancer initiation.
Recently, genome-wide association studies (GWASs) have provided large datasets for the identification of low-penetrance genes responsible for enhanced cancer susceptibility in the general population. Most of the major cancers have now been investigated by GWASs and close to 100 new cancer-susceptibility loci have been identified [1]. For some cancers with strong environmental components, such as lung cancer, only a few significant loci were found as a result of the overwhelming effect of cigarette smoking on cancer risk, but other cancer types (such as prostate cancer) have yielded many (over 20) such loci. GWASs are now capturing the excitement of the cancer genetics community, and numerous high-profile studies with large sample sizes and ever-increasing genome coverage are being published. It should not be forgotten, however, that the majority of the cancer risk is thought to be non-genetic, that is, due to the environment. This is true for the major human cancers, including prostate cancer, breast cancer and colorectal cancer, for which heritability accounts for 42%, 27% and 35% of the phenotypic variance, respectively [2]. Thus, in most common cancers, environmental factors outweigh genetic inheritance.
Unfortunately, environmental components have convincingly been linked to human cancer in only a few select cases. Most notable are skin cancers associated with sunlight exposure and lung cancer associated with cigarette smoking. Over many decades, epidemiological and molecular studies have established and confirmed these links. Non-melanoma skin cancer is found on sun-exposed skin, and melanoma has been linked with intermittent or recreational sun exposure, in particular in early childhood [3]. Although the ultraviolet B (UVB, 290 to 320 nm) component of sunlight has generally been implicated in these cancers, a role for UVA (320 to 400 nm) cannot currently be excluded.

TP53 mutations
A breakthrough in cancer etiology research has been the demonstration of exposure-specific mutational fingerprints in the TP53 tumor suppressor gene and in a few other genes that are found mutated in human tumors at a substantial frequency [4,5]. These studies of TP53 mutations have found UVB-specific mutations, namely C-to-T transitions at dipyrimidine sequences and CC-to-TT tandem mutations, as hallmarks of sunlight exposure leading to non-melanoma skin cancers [6]. The CC-to-TT tandem mutations in TP53 are almost never found in human tumors not related to sunlight. Similarly, G-to-T transversions, which are particularly enriched at methylated CpG (mCpG) dinucleotide sequences in TP53, are characteristic of smoking-associated lung cancers and are much less frequent in lung cancers of non-smokers, or in other cancers not related to smoking [7]. The mCpG-associated G-to-T transversions have been linked to one prominent class of cigarette smoke carcinogens, the polycyclic aromatic hydrocarbons, which have strong selectivity for forming DNA lesions at exactly these DNA sequences [8] and for inducing the same type of mutational events in in vitro systems. This mechanistically strengthens the link between smoking and lung cancer [7].
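The idea of a mutational fingerprint can be made concrete with a short tally. The following is a minimal sketch, not the method used in the cited studies: it assumes substitutions are given as (reference base, altered base) pairs read from one strand, and it collapses each change onto its pyrimidine-strand equivalent, so that a G-to-T transversion is counted together with its complement and reported as C>A. The function names and the toy calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a base substitution onto one of six strand-independent classes.

    A change read on one strand is the same event as the complementary change
    on the other strand, so G>T is reported as C>A and G>A as C>T.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: switch to the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(substitutions):
    """Return the fraction of substitutions falling into each class."""
    counts = Counter(substitution_class(r, a) for r, a in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}


# Hypothetical toy calls, for illustration only (not data from any study).
toy_calls = [("C", "T"), ("G", "A"), ("C", "T"), ("G", "T"), ("C", "A"), ("A", "G")]
toy_spectrum = spectrum(toy_calls)
```

In such a tally, a sunlight-type fingerprint would appear as a dominant C>T class and a tobacco-type fingerprint as an elevated C>A class (G-to-T in the notation used here).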

Insights from whole-genome sequencing of human cancers
Moving beyond mutational studies of important cancer-relevant genes, such as TP53, it is now possible to conduct high-throughput sequencing of cancer genomes. Initial studies that sequenced a large number of coding exons have been performed on several types of human cancer, including lung cancer [9]. This year, two articles in Nature have expanded our knowledge of environmental carcinogenesis by determining the sequence of the entire genomes of a small cell lung cancer (SCLC) and a melanoma cell line [10,11].
In the first study, the authors [10] sequenced the genome of a melanoma cell line using Illumina short sequence read technology. They identified over 30,000 base substitutions relative to a lymphoblastoid cell line from the same patient and various other events, including insertions, deletions, copy number changes and rearrangements. This study is the first comprehensive analysis of a solid tumor genome. Although definitive novel driver mutations in potential cancer-relevant genes were not identified from this single sample, the results gave important clues to the etiology and mechanistic history of how the mutations have arisen as a consequence of UV-induced DNA damage. By far the most common mutation was the C-to-T transition event, accounting for more than two-thirds of all mutations (Figure 1). A total of 92% of the C-to-T mutations occurred at the 3' base of a pyrimidine dinucleotide, much higher than expected by chance. These mutations are characteristic of UVB-induced DNA damage [12]. The frequency of C-to-T and CC-to-TT mutations due to sunlight exposure is also known to be higher at CpG dinucleotides [12]. C-to-T substitutions (7.7%) and CC-to-TT double substitutions (10.0%) both showed elevated frequencies at CpG dinucleotides compared with that expected by chance (4.4%). Therefore, the mutation spectrum and sequence context indicate that most C-to-T somatic substitutions in the melanoma cell line can be attributed to ultraviolet-light-induced DNA damage.
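The sequence-context criteria described above (a mutated cytosine at the 3' base of a pyrimidine dinucleotide, or within a CpG) can be expressed as a small check. This is a sketch under stated assumptions rather than the pipeline used in [10]: it assumes a reference sequence given as a plain string, 0-based positions, and calls given as (position, altered base); a G-to-A change is treated as a C-to-T change on the opposite strand. All names and the toy sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cytosine_context(seq, pos, alt):
    """Classify the flanking context of a C>T change (or G>A on the other strand).

    Returns None for other substitutions or when a neighbour is missing.
    """
    seq = seq.upper()
    if pos <= 0 or pos >= len(seq) - 1:
        return None  # both neighbours are needed
    ref = seq[pos]
    if (ref, alt) == ("C", "T"):
        five_prime, three_prime = seq[pos - 1], seq[pos + 1]
    elif (ref, alt) == ("G", "A"):
        # The mutated cytosine lies on the opposite strand, so its neighbours
        # are read 5'->3' on that strand (reverse complement of the reference).
        five_prime = COMPLEMENT.get(seq[pos + 1], "N")
        three_prime = COMPLEMENT.get(seq[pos - 1], "N")
    else:
        return None
    return {
        "dipyrimidine_3prime": five_prime in ("C", "T"),  # 5' neighbour is C or T
        "cpg": three_prime == "G",                        # cytosine is followed by G
    }


def fraction_at_dipyrimidine(seq, calls):
    """Fraction of C>T-type calls whose cytosine is the 3' base of a dipyrimidine."""
    contexts = [cytosine_context(seq, pos, alt) for pos, alt in calls]
    contexts = [c for c in contexts if c is not None]
    if not contexts:
        return None
    return sum(c["dipyrimidine_3prime"] for c in contexts) / len(contexts)


# Hypothetical toy sequence and calls, for illustration only.
toy_seq = "ATCGAACAAG"
toy_calls = [(2, "T"), (6, "T"), (3, "A")]
toy_fraction = fraction_at_dipyrimidine(toy_seq, toy_calls)
```

Comparing such a fraction with the value expected from the base composition of the genome is what allows a statement like "much higher than expected by chance"; the expected value is not computed in this sketch.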
The mutational landscape of this melanoma cell line is also shaped by DNA repair processes [10]. Nucleotide excision repair is the repair pathway responsible for removing UV-induced pyrimidine dimers. A specialized mechanism of transcription-coupled nucleotide excision repair removes pyrimidine dimers preferentially from active genes and specifically from the transcribed strand of active genes. This repair activity was reflected in the distribution of C-to-T and CC-to-TT mutations in the melanoma genome, in which these types of mutations were more prominent on the non-transcribed DNA strand of active genes. Genes expressed at a high level showed a lower frequency of somatic mutations than genes expressed at a low level, on both the transcribed and non-transcribed strands. The authors [10] also reported lower mutation prevalence in exons than in introns, but this could be due to negative selection of coding sequence mutations.
The second study [11] focused on an SCLC genome. The authors [11] used the ABI SOLiD sequencing platform to generate mate-pair shotgun sequences at more than 30x coverage of the tumor genome and a normal B lymphocyte reference genome from the same individual. This was the first whole-genome sequence of a human lung cancer specimen. Almost 23,000 somatic mutations were identified. The enormous statistical power of this dataset, not affected by selection, gave an elaborate picture of a mutational landscape sculpted by tobacco carcinogen exposure, its sequence preference and several types of DNA repair pathways. As with other similar studies, the fraction of non-synonymous substitutions within protein-coding sequences of the cancer genome was not very different from that expected from random events. This means that many tumor genomes will need to be sequenced to identify true tumor-driving mutations.
In the SCLC genome, obtained from a type of cancer almost always associated with tobacco smoking, G-to-T transversions were the most frequent changes observed (34% of all mutations; Figure 1). This frequency is remarkably similar to the pattern of substitutions observed in the TP53 tumor suppressor gene in SCLC cases collected from the International Agency for Research on Cancer TP53 mutation database [7] and suggests the involvement of tobacco carcinogens in mutation induction [13]. CpG dinucleotides were significantly enriched in the G-to-T mutation set compared with controls. This, again, is consistent with the TP53 mutational spectra of smokers' lung cancer. G-to-C transversions were more enriched in unmethylated compartments of the genome and were often adjacent to A, that is, they occurred in the GpA sequence context. The origin of such specific G-to-C mutations is currently unknown but they have also been observed in other tumor types [9]. In keeping with what is known about G-to-T transversions in TP53 and about transcription-coupled and strand-specific repair of bulky carcinogen DNA adducts, the authors [11] found that G-to-T transversions were strongly targeted to the non-transcribed DNA strand of active genes. Significantly lower mutation prevalence, on both transcribed and non-transcribed DNA strands, was observed in more highly expressed genes for G-to-T and also for other types of mutations, suggesting that, in addition to the strand-specific repair pathway, a repair pathway exists that preferentially removes lesions from both strands of active genes [11].
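The strand assignment behind such a comparison can be sketched as follows. This is an illustrative sketch and not the analysis performed in [11]: it assumes each call is given as (reference base, altered base, gene strand), with bases read from the reference (+) strand and the gene strand given as "+" or "-". A G-to-T call places the affected guanine on the + strand, whereas a C-to-A call places it on the - strand; the guanine is on the non-transcribed strand when its strand matches the orientation of the gene. Function names and toy calls are hypothetical.

```python
from collections import Counter


def guanine_strand(ref, alt, gene_strand):
    """Say whether the guanine of a G>T event lies on the transcribed or
    non-transcribed strand of the gene; return None for other substitutions."""
    if gene_strand not in ("+", "-"):
        raise ValueError(f"gene strand must be '+' or '-': {gene_strand!r}")
    if (ref, alt) == ("G", "T"):
        g_strand = "+"   # guanine is on the reference strand
    elif (ref, alt) == ("C", "A"):
        g_strand = "-"   # guanine is on the opposite strand
    else:
        return None
    # The non-transcribed (coding) strand has the same orientation as the gene.
    return "non-transcribed" if g_strand == gene_strand else "transcribed"


def strand_counts(calls):
    """Count G>T events by the strand carrying the guanine."""
    labels = (guanine_strand(ref, alt, strand) for ref, alt, strand in calls)
    return Counter(label for label in labels if label is not None)


# Hypothetical toy calls, for illustration only.
toy_calls = [
    ("G", "T", "+"),
    ("C", "A", "-"),
    ("G", "T", "+"),
    ("C", "A", "+"),
    ("C", "T", "+"),  # not a G>T event, ignored
]
toy_counts = strand_counts(toy_calls)
```

An excess of "non-transcribed" over "transcribed" counts within active genes is the pattern expected if lesions on the transcribed strand are repaired preferentially.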
Recently, Lee et al. [14] analyzed the genome of a lung adenocarcinoma using high-throughput sequencing by unchained combinatorial probe anchor ligation chemistry on self-assembling DNA nanoarrays. They found over 50,000 high-confidence single nucleotide variations in the tumor relative to normal lung. In this study, as in the SCLC study [11], transversions at guanine (G to T) were the most common events (46% of all mutations), attesting to the role of tobacco carcinogens in shaping the mutational patterns in this tumor.

Conclusions
The data presented in these reports [10,11] show the power of whole-genome sequencing to characterize, at unprecedented levels of resolution and sequence coverage, the many complex mutational signatures found in human cancers induced by environmental exposures. It is expected that additional whole-cancer-genome sequencing datasets will be forthcoming that will cover the same tumor type (to address inter-individual variation or different histological subtypes of cancer). Other cancers for which an environmental origin is known or suspected (for example, aflatoxin-associated liver cancer) will be extremely important to analyze. Furthermore, it is hoped that whole-genome mutational spectra for cancers of unknown etiology (for example, breast or pancreatic cancer) will bring forward new hypotheses regarding potential agents that have compatible mutational specificity and should further be investigated as causative agents of human cancer.