Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden
- Zachary R. Chalmers†1,
- Caitlin F. Connelly†1,
- David Fabrizio1,
- Laurie Gay1,
- Siraj M. Ali1,
- Riley Ennis1,
- Alexa Schrock1,
- Brittany Campbell4,
- Adam Shlien4,
- Juliann Chmielecki1,
- Franklin Huang2,
- Yuting He1,
- James Sun1,
- Uri Tabori4,
- Mark Kennedy1,
- Daniel S. Lieber1,
- Steven Roels1,
- Jared White1,
- Geoffrey A. Otto1,
- Jeffrey S. Ross1,
- Levi Garraway2, 3,
- Vincent A. Miller1,
- Phillip J. Stephens1 and
- Garrett M. Frampton1
© The Author(s). 2017
Received: 8 September 2016
Accepted: 18 March 2017
Published: 19 April 2017
High tumor mutational burden (TMB) is an emerging biomarker of sensitivity to immune checkpoint inhibitors and has been shown to be more significantly associated with response to PD-1 and PD-L1 blockade immunotherapy than PD-1 or PD-L1 expression, as measured by immunohistochemistry (IHC). The distribution of TMB and the subset of patients with high TMB have not been well characterized in the majority of cancer types.
In this study, we compare TMB measured by a targeted comprehensive genomic profiling (CGP) assay to TMB measured by exome sequencing and simulate the expected variance in TMB when sequencing less than the whole exome. We then describe the distribution of TMB across a diverse cohort of 100,000 cancer cases and test for association between somatic alterations and TMB in over 100 tumor types.
We demonstrate that measurements of TMB from comprehensive genomic profiling are strongly reflective of measurements from whole exome sequencing and model that below 0.5 Mb the variance in measurement increases significantly. We find that a subset of patients exhibits high TMB across almost all types of cancer, including many rare tumor types, and characterize the relationship between high TMB and microsatellite instability status. We find that TMB increases significantly with age, showing a 2.4-fold difference between age 10 and age 90 years. Finally, we investigate the molecular basis of TMB and identify genes and mutations associated with TMB level. We identify a cluster of somatic mutations in the promoter of the gene PMS2, which occur in 10% of skin cancers and are highly associated with increased TMB.
These results show that a CGP assay targeting ~1.1 Mb of coding genome can accurately assess TMB compared with sequencing the whole exome. Using this method, we find that many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy. Finally, we identify novel, recurrent promoter mutations in PMS2, which may be another example of regulatory mutations contributing to tumorigenesis.
Keywords: Tumor mutational burden, Cancer genomics, Mismatch repair, PMS2
In recent years, immunotherapies have shown great promise as treatments for skin, bladder, lung, and kidney cancers, and also for tumors which are mismatch repair deficient, with extremely durable responses for some patients [1–6]. These agents modulate the pathways that control when and where immune responses are mounted, increasing antitumor activity through immune checkpoint blockade. Inhibitors of cytotoxic T lymphocyte-associated antigen 4 (CTLA-4) [8, 9] and of the programmed cell death protein 1 (PD-1) receptor were the first drugs of this type, which promote T-cell activation. Other agents targeting immune checkpoint pathways are now approved or in active preclinical and clinical development [11–17].
While treating cancer with immunotherapy can be highly effective, only some patients respond to these treatments. Given the promise these agents have shown in treatment of refractory disease and the durable responses that occur in some cases, there is great interest in identifying patients who are most likely to derive benefit from these therapies. Assays that measure PD-1/PD-L1 protein expression by immunohistochemistry (IHC) are approved as complementary or companion diagnostics for some of these drugs; however, measurement of PD-1/PD-L1 expression is technically challenging, can be difficult to interpret, and is not always an accurate predictor of response to immunotherapy. An emerging biomarker for response to immunotherapy is the total number of mutations present in a tumor specimen. This is termed the mutation load or tumor mutational burden (TMB). It is hypothesized that highly mutated tumors are more likely to harbor neoantigens which make them targets of activated immune cells. This metric has been shown, in several tumor types, to correlate with patient response to both CTLA-4 and PD-1 inhibition [4, 20, 21]. In fact, in one clinical trial, TMB was more significantly associated with response rate than expression of PD-L1 by immunohistochemistry. Neoantigen load has also been correlated with response to immunotherapy. However, no recurrent neoantigens have been found that predict response to date.
Increased mutation rate is a well-characterized feature of human cancer. Abnormal activity in several cellular pathways, including DNA damage repair and DNA replication, can increase the overall rate of somatic mutations in tumors, as can exposure to mutagens such as ultraviolet light and tobacco smoke [24–28]. Defects in DNA damage repair lead to the accumulation of mutations caused by replicative errors and environmental damage [29, 30]. The core DNA mismatch repair protein complex is composed of two cooperative dimers: the PMS2 protein dimerizes with MLH1 to form the complex MutL-alpha, which cooperates with the MSH2-MSH6 dimer, MutS-alpha, to repair single base pair mismatches and small insertion–deletion loops [31–33]. Perturbations in mismatch repair gene expression, both loss and overexpression, can be deleterious to genomic stability [34–36], and loss of function mutations in mismatch repair pathway genes are known to correlate with high TMB in tumors [37–39]. As such, tumors with defective DNA repair mechanisms are more likely to benefit from immunotherapy.
Mutations in DNA damage repair proteins occur as both germline polymorphisms and de novo somatic mutations. Several hereditary cancer syndromes are the result of germline loss of function mutations in mismatch repair pathway genes [40, 41]. In Lynch syndrome, mutations in MSH2 and MLH1 are most often observed, with MSH6 and PMS2 mutations present in a minority of patients. In all cases, these germline variants lead to the loss of DNA damage repair activity and subsequent hypermutation. Typically, tumorigenesis in these cells occurs after loss of the single functional wild-type copy of the mutated gene. Somatic mutations in DNA mismatch repair genes produce a similar cellular phenotype to tumors with germline defects.
DNA replication is another key pathway in which defects can lead to increased somatic mutation rate. Recognition and removal of errors during replication are critical functions of DNA polymerases. POLD1 and POLE are involved in removal of errors during lagging- and leading-strand replication, respectively, and mutations in these genes can result in high TMB. The exonuclease domain in both genes is responsible for proofreading activity, and mutations in this domain are associated with hypermutation and tumorigenesis [45, 46]. Somatic loss of function mutations in POLE and POLD1 lead to hypermutation [47, 48]. Loss of TP53 DNA damage checkpoint activity, by somatic mutation, copy number loss, or epigenetic silencing, increases DNA damage tolerance and can also be associated with increased mutation frequency. Loss of function mutations in TP53 are very common in cancer and are a somatic marker of elevated mutation rate. Mutations in a number of other genes have also been linked to increased TMB [28, 51], but their function is less well understood. A fuller understanding of the factors associated with increased TMB is important both for characterizing this key driver of cancer progression and for elucidating the molecular mechanisms that lead to high TMB.
Whole exome sequencing (WES) has been previously used to measure TMB, and TMB levels measured by WES and, in some cases, smaller gene panels have been shown to be associated with response to immunotherapy [52, 53]. The Cancer Genome Atlas (TCGA) project and several other studies have used WES to measure TMB across cancer types and found a wide distribution of TMB across ~20–30 cancer types [28, 51, 54]. Studies focusing on single disease types have shown that high TMB measured from whole exome data is associated with better response rates to immunotherapies in melanoma and non-small cell lung cancer cohorts. Recent studies have also shown that TMB can be accurately measured in smaller gene assays encompassing several hundred genes and that looking at such a panel of genes, the same stratification of patient response based on TMB level exists for some indications [52, 53]. This suggests that a diagnostic assay targeting several hundred genes can accurately measure TMB and that these findings will be clinically actionable.
We sought to better understand the landscape of TMB across the spectrum of human cancer based on data from comprehensive genomic profiling (CGP) of more than 100,000 patient tumors of diverse type. Our analysis expands significantly upon existing data that quantify mutation burden in cancer [28, 51], providing data for many previously undescribed cancer types. We provide new data supporting rational expansion of the patient population that could benefit from immunotherapy and which will allow informed design of clinical trials of immunotherapy agents in untested cancer types. We identify somatically altered genes associated with significantly increased TMB and identify a novel mutation hotspot in the promoter of the PMS2 gene, which is mutated in ~10% of skin cancers and is associated with greatly increased TMB.
Comprehensive genomic profiling
CGP was performed using the FoundationOne assay (Cambridge, MA, USA), as previously described in detail [55, 56]. Briefly, the pathologic diagnosis of each case was confirmed by review of hematoxylin and eosin stained slides and all samples that advanced to DNA extraction contained a minimum of 20% tumor cells. Hybridization capture of exonic regions from 185, 236, 315, or 405 cancer-related genes and select introns from 19, 28, or 31 genes commonly rearranged in cancer was applied to ≥50 ng of DNA extracted from formalin-fixed, paraffin-embedded clinical cancer specimens. These libraries were sequenced to high, uniform median coverage (>500×) and assessed for base substitutions, short insertions and deletions, copy number alterations, and gene fusions/rearrangements. Data from all versions of the FoundationOne assay were used in the analysis. Hybridization capture baits for PMS2 are identical across all assay versions.
WES analysis of TCGA data
WES was performed, as previously described, on 29 samples for which CGP had also been performed. Briefly, tumors were sequenced using Agilent’s exome enrichment kit (Sure Select V4; with >50% of baits above 25× coverage). The matched blood-derived DNA was also sequenced. Base calls and intensities from the Illumina HiSeq 2500 were processed into FASTQ files using CASAVA. The paired-end FASTQ files were aligned to the genome (to UCSC’s hg19 GRCh37) with BWA (v0.5.9). Duplicate paired-end sequences were removed using Picard MarkDuplicates (v1.35) to reduce potential PCR bias. Aligned reads were realigned for known insertion/deletion events using SRMA (v0.1.155). Base quality scores were recalibrated using the Genome Analysis Toolkit (v1.1-28). Somatic substitutions were identified using MuTect (v1.1.4). Mutations were then filtered against common single-nucleotide polymorphisms (SNPs) found in dbSNP (v132), the 1000 Genomes Project (Feb 2012), a 69-sample Complete Genomics data set, and the Exome Sequencing Project (v6500).
TCGA data were obtained from public repositories. For this analysis, we used the somatic variants called by TCGA as the raw mutation count. We used 38 Mb as the estimate of the exome size. For the downsampling analysis, we simulated the observed number of mutations/Mb 1000 times using the binomial distribution at whole exome TMB = 100 mutations/Mb, 20 mutations/Mb, and 10 mutations/Mb and did this for megabases of exome sequenced ranging from 0 to 10 Mb. Melanoma TCGA data were obtained from dbGap accession number phs000452.v1.p1.
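The downsampling analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each sequenced base is mutated independently with probability TMB/10^6, draws exact binomial counts by jumping between mutated bases (geometric gaps), and summarizes each setting by the median absolute percentage deviation, which is a summary chosen here for illustration. The function names and the seed are hypothetical.

```python
import math
import random
import statistics

BASES_PER_MB = 1_000_000


def sample_binomial(n_bases, p, rng):
    """Draw one Binomial(n_bases, p) count.

    Gaps between consecutive mutated bases are geometric, so the cost
    scales with the expected number of mutations, not with n_bases.
    """
    if p <= 0.0:
        return 0
    log_q = math.log1p(-p)
    count, position = 0, 0
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        position += int(math.log(u) / log_q) + 1
        if position > n_bases:
            return count
        count += 1


def simulate_deviation(true_tmb, mb_sequenced, n_sim=1000, seed=0):
    """Median absolute percentage deviation of observed TMB from true_tmb."""
    rng = random.Random(seed)
    n_bases = int(round(mb_sequenced * BASES_PER_MB))
    p = true_tmb / BASES_PER_MB  # per-base mutation probability
    deviations = []
    for _ in range(n_sim):
        observed_tmb = sample_binomial(n_bases, p, rng) / mb_sequenced
        deviations.append(abs(observed_tmb - true_tmb) / true_tmb * 100.0)
    return statistics.median(deviations)


if __name__ == "__main__":
    for tmb in (100, 20, 10):
        for mb in (0.5, 1.1, 10.0):
            dev = simulate_deviation(tmb, mb)
            print(f"TMB={tmb}/Mb, {mb} Mb sequenced: median deviation {dev:.1f}%")
```

Under these assumptions the deviation shrinks as either the true TMB or the sequenced territory grows, because both raise the expected mutation count.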
From an initial clinical cohort of 102,292 samples, duplicate assay results from the same patient and samples with less than 300× median exon coverage were excluded, leaving an analysis set of 92,439 samples. Analyses by cancer type were restricted to types containing a minimum of 50 unique specimens following sample-level filtering.
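The cohort filtering can be illustrated with a short sketch. The record keys (`patient_id`, `median_coverage`, `cancer_type`) are hypothetical, and the rule for choosing among duplicate results (keep the highest-coverage sample) is an assumption, since the text does not say which duplicate was retained.

```python
from collections import Counter

MIN_MEDIAN_COVERAGE = 300     # from the text: 300x median exon coverage
MIN_SPECIMENS_PER_TYPE = 50   # from the text: minimum specimens per cancer type


def build_analysis_set(samples):
    """Keep one adequately covered sample per patient.

    `samples` is an iterable of dicts with hypothetical keys
    'patient_id', 'median_coverage', and 'cancer_type'.
    """
    best = {}
    for s in samples:
        if s["median_coverage"] < MIN_MEDIAN_COVERAGE:
            continue
        # Assumption: among duplicates, keep the highest-coverage sample.
        kept = best.get(s["patient_id"])
        if kept is None or s["median_coverage"] > kept["median_coverage"]:
            best[s["patient_id"]] = s
    return list(best.values())


def types_with_enough_specimens(analysis_set, minimum=MIN_SPECIMENS_PER_TYPE):
    """Return the cancer types that meet the per-type specimen minimum."""
    counts = Counter(s["cancer_type"] for s in analysis_set)
    return {t for t, n in counts.items() if n >= minimum}
```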
Tumor mutational burden
TMB was defined as the number of somatic, coding base substitution and indel mutations per megabase of genome examined. All base substitutions and indels in the coding region of targeted genes, including synonymous alterations, are initially counted before filtering as described below. Synonymous mutations are counted in order to reduce sampling noise. While synonymous mutations are not likely to be directly involved in creating immunogenicity, their presence is a signal of mutational processes that will also have resulted in nonsynonymous mutations and neoantigens elsewhere in the genome. Non-coding alterations were not counted. Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted, since our assay genes are biased toward genes with functional mutations in cancer. Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted. Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted. Known germline alterations in dbSNP were not counted. Germline alterations occurring with two or more counts in the ExAC database were not counted. To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory. The nonparametric Mann–Whitney U-test was subsequently used to test for significance in difference of means between two populations.
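The counting rules above can be summarized in a brief sketch. The `Variant` fields are hypothetical stand-ins for annotations a pipeline would supply (they are not the assay's actual data model), and the 1.1 Mb territory in the usage note is taken from the coding territory quoted elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are hypothetical, for illustration only.
    is_coding: bool
    kind: str                         # "substitution" or "indel"
    in_cosmic: bool = False           # known somatic alteration in COSMIC
    tsg_truncation: bool = False      # truncation in a tumor suppressor gene
    predicted_germline: bool = False  # per-sample or recurrent germline call
    in_dbsnp: bool = False
    exac_count: int = 0


def counts_toward_tmb(v: Variant) -> bool:
    """Apply the exclusion rules; synonymous changes are not excluded."""
    if not v.is_coding or v.kind not in ("substitution", "indel"):
        return False
    if v.in_cosmic or v.tsg_truncation:
        return False
    if v.predicted_germline or v.in_dbsnp or v.exac_count >= 2:
        return False
    return True


def tumor_mutational_burden(variants, coding_mb: float) -> float:
    """Mutations per megabase of coding territory examined."""
    if coding_mb <= 0:
        raise ValueError("coding_mb must be positive")
    return sum(counts_toward_tmb(v) for v in variants) / coding_mb
```

For example, 11 variants that pass every filter over a 1.1 Mb coding territory give a TMB of about 10 mutations/Mb.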
Microsatellite instability calling was performed on 62,150 samples, and analyses comparing MSI to TMB were limited to samples where both MSI status and TMB were determined.
To determine MSI status, 114 intronic homopolymer repeat loci with adequate coverage on the CGP panel were analyzed for length variability and compiled into an overall MSI score via principal components analysis.
The 114 loci were selected from a total set of 1897 that have adequate coverage on the FMI FoundationOne bait set. Amongst the 1897 microsatellites, the 114 that maximized variability between samples were chosen. Each chosen locus was intronic and had hg19 reference repeat length of 10–20 bp. This range of repeat lengths was chosen such that the microsatellites are long enough to produce a high rate of DNA polymerase slippage, while short enough such that they are well within the 49-bp read length of next-generation sequencing to facilitate alignment to the human reference genome. Translation of the MSI score to MSI-H or MSS (MSI-Stable) was established using a training data set.
Using the 114 loci, for each training sample the repeat length in each read that spans the locus was calculated. The means and variances of repeat lengths across the reads were recorded, forming 228 data points per sample. We then used principal components analysis to project the 228-dimension data onto a single dimension (the first principal component) that maximized the data separation, producing a next-generation sequencing-based “MSI score”. There was no need to extend beyond the first principal component, as it explained ~50% of the total data variance, while none of the other principal components explained more than 4% each. Ranges of the MSI score were assigned MSI-High (MSI-H), MSI-ambiguous, or microsatellite stable (MSS) by manual unsupervised clustering of specimens for which MSI status was previously assessed either via IHC if available or approximated by the number of homopolymer indel mutations detected by our standard pipeline.
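The scoring procedure can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions: the text does not say how the principal components were computed, so power iteration on the centered feature matrix is substituted here, and the function names are hypothetical. With 114 loci, `locus_features` yields the 228 values per sample described above. The sign of the resulting score is arbitrary, so cutoffs for MSI-H and MSS would still have to come from a labeled training set.

```python
import math
import random
import statistics


def locus_features(reads_by_locus):
    """Flatten per-locus repeat lengths into [mean_1, var_1, mean_2, var_2, ...].

    `reads_by_locus` holds, for each locus, the repeat lengths observed
    in the reads that span it.
    """
    features = []
    for lengths in reads_by_locus:
        features.append(statistics.fmean(lengths))
        features.append(statistics.pvariance(lengths))
    return features


def first_principal_component(rows, n_iter=200, seed=0):
    """Return (column_means, unit_vector) for the first PC via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        # w = X^T (X v): the scatter matrix applied to v
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data; keep the current vector
        v = [x / norm for x in w]
    return means, v


def msi_score(features, means, pc1):
    """Project one sample's features onto the first principal component."""
    return sum((f - m) * p for f, m, p in zip(features, means, pc1))
```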
Statistical association testing
To test for statistical association between genes and tumor mutation burden, we counted known and likely functional short variants in each gene, excluding mutations that occurred in homopolymers of length 6 or greater. We tested for association for all genes with six or more specimens with mutations that passed our filtering. We added a pseudo-count to each TMB value. We then fit a linear model of the type log10(TMB) ~ functional mutation status + disease type. We used the factor loading coefficient to determine the genes with the greatest effect size. This coefficient gives the change in log10(TMB) between samples with presence or absence of a functional mutation in that gene, while holding the disease type constant. We chose an effect size (factor loading) cutoff of 0.5, which when converted back from log space is equivalent to an approximately 3.2-fold increase in TMB compared to wild-type TMB (3.6 mutations/Mb).
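As a hedged illustration of this model, the coefficient on mutation status can be obtained without a regression library: centering both log10(TMB) and mutation status within each disease type and regressing one on the other gives the same estimate as ordinary least squares with disease-type indicator variables (the Frisch-Waugh-Lovell result). The pseudo-count value of 1.0 and the function names are assumptions; the text does not state the value used.

```python
import math
from collections import defaultdict

PSEUDO_COUNT = 1.0  # assumption: the text does not state the value used


def mutation_effect(samples, pseudo_count=PSEUDO_COUNT):
    """Coefficient on mutation status in log10(TMB) ~ status + disease type.

    `samples` is an iterable of (tmb, mutated, disease_type) tuples.
    """
    groups = defaultdict(list)
    for tmb, mutated, disease in samples:
        y = math.log10(tmb + pseudo_count)
        groups[disease].append((y, 1.0 if mutated else 0.0))
    sxy = sxx = 0.0
    for rows in groups.values():
        y_bar = sum(y for y, _ in rows) / len(rows)
        x_bar = sum(x for _, x in rows) / len(rows)
        for y, x in rows:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0.0:
        raise ValueError("mutation status does not vary within any disease type")
    return sxy / sxx


def fold_change(coefficient):
    """Convert a log10-scale coefficient back to a fold change."""
    return 10.0 ** coefficient
```

A coefficient of 0.5 corresponds to `fold_change(0.5)`, that is, 10^0.5 or roughly 3.16-fold.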
To test for association between alterations and tumor mutation burden, we tested all short variants occurring at a frequency of greater than 1 per 2000 specimens, excluding mutations that occurred in homopolymers of length 6 or greater and filtering out mutations present in dbSNP. We then fit a linear model, as above, of the type log10(TMB) ~ alteration status + disease type. For both tests, we corrected for multiple testing using the false discovery rate (FDR) method.
We tested for co-occurrence of functional gene mutations with PMS2 promoter mutations using logistic regression. We fit a model of the type: status of PMS2 promoter mutations in melanoma ~ gene functional mutation status + TMB. We then corrected for multiple testing using the FDR method.
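For readers unfamiliar with FDR correction, a minimal sketch follows. It assumes the Benjamini-Hochberg step-up procedure, the most common FDR method; the text names only "the FDR method", so this choice and the function name are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at FDR = 0.05 when its adjusted value is at most 0.05.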
TMB can be accurately measured by a targeted comprehensive genomic profiling assay
We also assessed the reproducibility of our method for calculating TMB using targeted CGP. For 60 samples for which CGP was performed more than once, we compared the TMB between replicates. We found that these values were highly correlated (R² = 0.98), indicating that this method for measuring TMB has high precision (Fig. 1b).
We finally sought to determine how sequencing different amounts of the genome might affect our ability to accurately determine TMB. We sampled the number of mutations that we would expect to see at different TMB levels (100 mutations/Mb, 20 mutations/Mb, 10 mutations/Mb) and at different numbers of megabases sequenced, from 0.2 to 10 Mb, 1000 times for each TMB level and sequencing amount. For each sample, we then measured the percentage deviation from the whole exome TMB (Fig. 1c). We found that, as expected, the percentage deviation is lower for high underlying TMB, meaning that specimens with high TMB can be effectively identified by targeted sequencing of several hundred genes. In contrast, for intermediate levels of TMB, the percentage deviation starts to increase, especially with less than 0.5 Mb sequenced (Fig. 1c).
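A short worked calculation may make this trend concrete. It assumes, as in the binomial downsampling model, that each of n sequenced bases is mutated independently with probability p; the symbols n, p, K, and CV (coefficient of variation) are introduced here for illustration and do not appear in the original analysis.

```latex
\[
K \sim \mathrm{Binomial}(n, p), \qquad
\widehat{\mathrm{TMB}} = \frac{K}{n / 10^{6}}, \qquad
\mathrm{CV}\bigl(\widehat{\mathrm{TMB}}\bigr)
  = \frac{\sqrt{n p (1 - p)}}{n p}
  = \sqrt{\frac{1 - p}{n p}} \approx \frac{1}{\sqrt{n p}}
\]
\[
\begin{aligned}
\text{TMB} = 10/\text{Mb},\; 0.5\ \text{Mb}: &\quad np = 5,   &\mathrm{CV} &\approx 0.45 \\
\text{TMB} = 10/\text{Mb},\; 1.1\ \text{Mb}: &\quad np = 11,  &\mathrm{CV} &\approx 0.30 \\
\text{TMB} = 100/\text{Mb},\; 1.1\ \text{Mb}: &\quad np = 110, &\mathrm{CV} &\approx 0.10
\end{aligned}
\]
```

Under this model the relative spread depends only on the expected mutation count np, which is why precision degrades both at lower TMB and at smaller sequenced territories.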
We also analyzed whole-exome sequencing data from 35 studies, published as part of TCGA, examining a total of 8917 cancer specimens. We determined the number of mutations in total and compared that to the number of mutations in the 315 genes targeted by our assay. As expected, these results were also highly correlated (R² = 0.98). These results demonstrate that CGP targeting the entire coding region of several hundred genes can accurately assess whole exome mutational burden.
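A comparison of this kind can be sketched as follows. The sketch assumes the reported R² is the squared Pearson correlation, which the text does not state explicitly; the function names are hypothetical, and any per-specimen counts passed in would be the reader's own data.

```python
def per_mb(mutation_count, territory_mb):
    """Convert a raw mutation count to mutations per megabase."""
    return mutation_count / territory_mb


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)
```

In use, whole-exome counts would be scaled with `per_mb(count, 38.0)` (the 38 Mb exome estimate given in Methods) and panel counts with the panel's coding territory before calling `r_squared` on the paired values.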
The landscape of mutation burden across cancer types
We next examined the landscape of TMB across the cohort of patients profiled in our laboratory. CGP was performed in the course of routine clinical care for 102,292 samples (see “Methods”). The unique patient cohort contained 41,964 male and 50,376 female patients. Median patient age at the time of specimen collection was 60 years (range <1 year to >89 years), and 2.5% of cases were from pediatric patients under 18 years old. This body of data provided 541 distinct cancer types for analysis. Notably, the majority of specimens were from patients with significantly pre-treated, advanced, and metastatic disease. Across the entire dataset, the median TMB was 3.6 mutations/Mb, with a range of 0–1241 mutations/Mb. This agrees well with previous estimates of mutation burden from whole exome studies [28, 51]. We found a significant increase in TMB associated with increased age (p < 1 × 10⁻¹⁶), though the effect size was small (Additional file 1: Figure S1). Median TMB at age 10 was 1.67 mutations/Mb, and median TMB at age 88 was 4.50 mutations/Mb. A linear model fit to the data predicted a 2.4-fold difference in TMB between age 10 and age 90, consistent with the median TMB differences at these ages. There was no statistically significant difference in median TMB between female and male patients (Additional file 2: Figure S2).
Disease indications with greater than 5% of specimens showing high TMB (>20 mutations/Mb)
Percentage cases with >20 mutations/Mb (95% CI)
Skin basal cell carcinoma
Skin squamous cell carcinoma (SCC)
Skin merkel cell carcinoma
Unknown primary melanoma
Head and neck melanoma
Lung large cell carcinoma
Unknown primary squamous cell carcinoma (SCC)
Lung large cell neuroendocrine carcinoma
Lung sarcomatoid carcinoma
Stomach adenocarcinoma intestinal type
Uterus endometrial adenocarcinoma endometrioid
Lymph node lymphoma diffuse large B cell
Lung non-small cell lung carcinoma (NOS)
Unknown primary sarcomatoid carcinoma
Unknown primary malignant neoplasm (NOS)
Uterus endometrial adenocarcinoma (NOS)
Bladder carcinoma (NOS)
Unknown primary urothelial carcinoma
Soft tissue angiosarcoma
Lung adenosquamous carcinoma
Skin adnexal carcinoma
Bladder urothelial (transitional cell) carcinoma
Lymph node lymphoma B-cell (NOS)
Lung squamous cell carcinoma (SCC)
Unknown primary carcinoma (NOS)
Head and neck squamous cell carcinoma (HNSCC)
Lung small cell undifferentiated carcinoma
Nasopharynx and paranasal sinuses squamous cell carcinoma (SCC)
Ovary endometrioid adenocarcinoma
Unknown primary undifferentiated small cell carcinoma
Small intestine adenocarcinoma
Soft tissue malignant peripheral nerve sheath tumor (MPNST)
Soft tissue sarcoma undifferentiated
Uterus endometrial adenocarcinoma clear cell
Prostate undifferentiated carcinoma
Salivary gland mucoepidermoid carcinoma
Unknown primary adenocarcinoma
Ureter urothelial carcinoma
Cervix squamous cell carcinoma (SCC)
Penis squamous cell carcinoma (SCC)
Salivary gland carcinoma (NOS)
Kidney urothelial carcinoma
Unknown primary undifferentiated neuroendocrine carcinoma
TMB and microsatellite instability
Identifying known genes and alterations associated with increased TMB
Genes associated with large increases in TMB include known DNA mismatch repair pathway genes (MSH2, MSH6, MLH1, PMS2) and DNA polymerases (POLE) (Fig. 4a–c; Additional file 5: Table S2). Across the cohort, functional mutations in these mismatch repair genes and DNA polymerase occur in 13.5% of the cases with high TMB (858 cases with known functional mutations in mismatch repair or POLE out of the 6348 cases with high tumor mutation burden). Many of the mutations found were inactivating frameshift alterations, and MSH6 was the most frequently mutated (Additional file 6: Figure S4). We found mismatch repair mutations to be particularly common in skin squamous cell carcinoma (6.7%), uterus endometrial adenocarcinoma, subtype not otherwise specified (6.0%), and uterus endometrial adenocarcinoma endometrioid (5.8%). Our results are consistent with the known role of alterations in mismatch repair genes in leading to hypermutation.
In order to identify potential novel mutations associated with increased mutation rate, we also tested for association between TMB and all genomic alterations in our dataset (see “Methods”). We identified 117 somatic mutations significantly associated with increased tumor mutation burden at FDR = 0.05 and with factor loading >0.15 (Additional file 7: Table S3). As expected, many statistically significant mutations occurred in mismatch repair genes, and POLE P286R, a genomic alteration that is known to cause hyper-mutant cancers, was the second most significant (p = 1.1 × 10⁻⁷²).
Novel promoter mutations in PMS2 are associated with high mutation burden and occur frequently in melanoma
These PMS2 promoter mutations occurred frequently in melanoma, in 10.0% of cases (173/1731). They were also found frequently in skin basal cell carcinoma (23%, 17/72 specimens) and skin squamous cell carcinoma (19%, 39/203 specimens) and less frequently in several other tumor types (Additional file 9: Table S5). We tested for co-occurrence of PMS2 promoter mutations with mutations in other genes in melanoma. After controlling for TMB (see “Methods”), we found that no other mutations significantly co-occurred (Additional file 10: Table S6).
To confirm that PMS2 promoter mutations were somatic in origin, we carried out several analyses. We first looked in TCGA whole exome data from 50 melanoma patients and confirmed the somatic status of three of the mutations found in our cohort (chr7:6048723, chr7:6048760, and chr7:6048824). In this dataset, the frequency of the three PMS2 promoter mutations listed above is similar to the frequency of all PMS2 promoter mutations found in our data and significantly associated with TMB (4/50, 8.0%, 95% confidence interval (CI) 3.1–18.8%, and 10.0%, 95% CI 8.6–11.5%, respectively). We also queried public germline databases dbSNP142 and ExAC, and none of the PMS2 promoter mutations associated with high mutation burden were found in either database. Finally, we used an algorithm that uses the mutation allele frequency and genome-wide copy number model of genomic alterations to determine their germline or somatic origin (see “Methods”). We found that, of the variants that could be called as somatic or germline, 274 out of 294 (93.2%) were called as somatic (Additional file 11: Table S7). Furthermore, the median allele frequency of PMS2 promoter mutations in melanoma is 0.26 (range 0.05–0.85), which is lower than that for BRAF V600 mutations occurring in the same tumor type (median 0.37, max 0.97; Additional file 12: Table S8). These data demonstrate that these PMS2 promoter mutations are most frequently somatic in origin. Finally, we used several computational methods to assess the functional impact of these mutations [68–70], using methods which integrate conservation information as well as multiple functional genomics data from ENCODE such as DNase I patterns and transcription factor binding (Additional file 13: Table S9). Interestingly, these methods agree in terms of which of the mutations we identified are most likely to be functional; chr7:6048760 and chr7:6048824 consistently had the most significant functional scores.
We have shown that tumor mutation burden calculated using a 1.1-Mb CGP assay agrees well with whole exome measures of mutation burden. This indicates that CGP, targeting the entire coding region of several hundred genes, covers sufficient genomic space to accurately assess whole exome mutational burden. We found that filtering out germline alterations and rare variants was important to obtaining accurate measurements of TMB, and this will especially be important in patients from ethnic backgrounds not well represented in sequencing datasets. These findings indicate that CGP is an accurate, cost-effective, and clinically available tool for measuring TMB. The results of our downsampling analysis show that the variation in measurement due to sampling when sequencing 1.1 Mb is acceptably low, resulting in highly accurate calling of TMB at a range of TMB levels. This sampling variation increases as the number of megabases sequenced decreases, especially at lower levels of TMB. While targeted CGP can be used to accurately assess TMB, it is not currently suited for identification of neoantigens, which might occur in any gene.
We characterize and provide extensive data describing tumor mutational burden across more than 100,000 clinical cancer specimens from advanced disease, including many previously undescribed types of cancer. These data should help to guide design of immunotherapy clinical trials across a broader range of indications. Currently, immunotherapies targeting CTLA-4, PD-1, and PD-L1 are approved in a small number of indications: melanoma, bladder cancer, NSCLC, and renal cell carcinoma. Not surprisingly, we observe that melanoma and NSCLC represent some of the highest mutation burden indications. We identified several novel disease types with high TMB which may be good targets for immuno-oncology treatment development. In addition, we observed a wide range of TMB across many cancer types, similar to findings from previous studies [28, 51]. We have found that there may be many disease types with a substantial portion of patients who might benefit from these therapies. Overall, we identified 20 tumor types affecting eight tissues where greater than 10% of patients had high TMB.
Understanding the factors associated with genomic instability is also important to better understand carcinogenesis and progression. We characterized the distribution and prevalence of coding mutations in known genes involved in mismatch repair and DNA replication. However, overall, mutations in these genes accounted for only a small minority (13.5%) of cases with high TMB. We also identified several other genes associated with high TMB. Alterations in TOP2A were associated with a large increase in TMB, although we only identified eight cases of single nucleotide substitutions in this gene. TP53BP1, another of the genes showing large effect size, is involved in double-stranded break repair and also implicated in resistance mechanisms [71, 72].
Non-coding mutations have increasingly been found to have a functional role in cancer [73–75]. Our analysis of mutations that are significantly associated with increased tumor mutation burden resulted in the discovery of novel recurrent mutations in the promoter region of mismatch repair pathway gene PMS2. We have not definitively shown that these mutations are causal, and additional experiments will be needed to elucidate the function of these promoter mutations. PMS2 promoter mutations are present in ~10% of melanoma samples and ~8% of squamous cell carcinomas, meaning that, if functional, these mutations may comprise a meaningful subset of alterations in both of these diseases.
These results show that CGP targeting ~1.1 Mb of coding genome can accurately assess TMB compared with sequencing the whole exome. Using this method, we find that many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy. Finally, we identify novel, recurrent promoter mutations in PMS2 which may be another example of regulatory mutations contributing to tumorigenesis.
CGP: Comprehensive genomic profiling
FDR: False discovery rate
NSCLC: Non-small cell lung cancer
TCGA: The Cancer Genome Atlas
TMB: Tumor mutational burden
WES: Whole exome sequencing
Funding was provided by Foundation Medicine, Inc.
Availability of data and materials
The data are not publicly available due to them containing information that could compromise research participant privacy.
ZRC and CFC analyzed the data and wrote the manuscript. BC, AS, UT, and DF produced and analyzed the whole exome data. YH and JS contributed to MSI stability analysis. MK, DSL, SR, JW, and GAO carried out initial data analysis. DF, LG, SMA, RE, AS, JC, FH, JSR, LG, VAM, and PJS helped write and contributed to the manuscript. GMF conceived of the study and edited the manuscript. All authors read and approved the manuscript.
Employees of Foundation Medicine: ZRC, CFC, DF, LG, SMA, RE, AS, JC, JS, YH, MK, DSL, SR, JW, GAO, JSR, VAM, PJS, GMF. The remaining authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Approval for this study, including a waiver of informed consent and a HIPAA waiver of authorization, was obtained from the Western Institutional Review Board (protocol number 20152817).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Mellman I, Coukos G, Dranoff G. Cancer immunotherapy comes of age. Nature. 2011;480:480–9.
- Topalian SL, Hodi FS, Brahmer JR, Gettinger SN, Smith DC, McDermott DF, et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N Engl J Med. 2012;366:2443–54.
- Bracarda S, Altavilla A, Hamzaj A, Sisani M, Marrocolo F, Del Buono S, et al. Immunologic checkpoints blockade in renal cell, prostate, and urothelial malignancies. Semin Oncol. 2015;42:495–505.
- Le DT, Uram JN, Wang H, Bartlett BR, Kemberling H, Eyring AD, et al. PD-1 Blockade in tumors with mismatch-repair deficiency. N Engl J Med. 2015;372:2509–20.
- Motzer RJ, Escudier B, McDermott DF, George S, Hammers HJ, Srinivas S, et al. Nivolumab versus everolimus in advanced renal-cell carcinoma. N Engl J Med. 2015;373:1803–13.
- Rosenberg JE, Hoffman-Censits J, Powles T, van der Heijden MS, Balar AV, Necchi A, et al. Atezolizumab in patients with locally advanced and metastatic urothelial carcinoma who have progressed following treatment with platinum-based chemotherapy: a single-arm, multicentre, phase 2 trial. Lancet. 2016;387:1909–20.
- Pardoll DM. The blockade of immune checkpoints in cancer immunotherapy. Nat Rev Cancer. 2012;12:252–64.
- Chen L, Ashe S, Brady WA, Hellstrom I, Hellstrom KE, Ledbetter JA, et al. Costimulation of antitumor immunity by the B7 counterreceptor for the T lymphocyte molecules CD28 and CTLA-4. Cell. 1992;71:1093–102.
- Leach DR, Krummel MF, Allison JP. Enhancement of antitumor immunity by CTLA-4 blockade. Science. 1996;271:1734–6.
- Hirano F, Kaneko K, Tamura H, Dong H, Wang S, Ichikawa M, et al. Blockade of B7-H1 and PD-1 by monoclonal antibodies potentiates cancer therapeutic immunity. Cancer Res. 2005;65:1089–96.
- Brignone C, Gutierrez M, Mefti F, Brain E, Jarcau R, Cvitkovic F, et al. First-line chemoimmunotherapy in metastatic breast carcinoma: combination of paclitaxel and IMP321 (LAG-3Ig) enhances immune responses and antitumor activity. J Transl Med. 2010;8:71.
- Soliman HH, Jackson E, Neuger T, Dees EC, Harvey RD, Han H, et al. A first in man phase I trial of the oral immunomodulator, indoximod, combined with docetaxel in patients with metastatic solid tumors. Oncotarget. 2014;5:8136–46.
- Calabro L, Ceresoli GL, di Pietro A, Cutaia O, Morra A, Ibrahim R, et al. CTLA4 blockade in mesothelioma: finally a competing strategy over cytotoxic/target therapy? Cancer Immunol Immunother. 2015;64:105–12.
- Castro MP, Goldstein N. Mismatch repair deficiency associated with complete remission to combination programmed cell death ligand immune therapy in a patient with sporadic urothelial carcinoma: immunotheranostic considerations. J Immunother Cancer. 2015;3:58.
- Sunshine J, Taube JM. PD-1/PD-L1 inhibitors. Curr Opin Pharmacol. 2015;23:32–8.
- Ibrahim R, Stewart R, Shalabi A. PD-L1 blockade for cancer treatment: MEDI4736. Semin Oncol. 2015;42:474–83.
- Zhai L, Spranger S, Binder DC, Gritsina G, Lauing KL, Giles FJ, et al. Molecular pathways: targets IDO1 and other tryptophan dioxygenases for cancer immunotherapy. Clin Cancer Res. 2015;21:5427–33.
- Prieto PA, Yang JC, Sherry RM, Hughes MS, Kammula US, White DE, et al. CTLA-4 blockade with ipilimumab: long-term follow-up of 177 patients with metastatic melanoma. Clin Cancer Res. 2012;18:2039–47.
- Topalian SL, Taube JM, Anders RA, Pardoll DM. Mechanism-driven biomarkers to guide immune checkpoint blockade in cancer therapy. Nat Rev Cancer. 2016;16:275–87.
- Rizvi NA, Hellmann MD, Snyder A, Kvistborg P, Makarov V, Havel JJ, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science. 2015;348:124–8.
- Snyder A, Makarov V, Merghoub T, Yuan J, Zaretsky JM, Desrichard A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371:2189–99.
- Van Rooij N, van Buuren MM, Philips D, Velds A, Toebes M, Heemskerk B, et al. Tumor exome analysis reveals neoantigen-specific T-cell reactivity in an ipilimumab-responsive melanoma. J Clin Oncol. 2013;31:e439–42.
- Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA4 blockade in metastatic melanoma. Science. 2015;350:207–11.
- Hainaut P, Hollstein M. p53 and human cancer: the first ten thousand mutations. Adv Cancer Res. 2000;77:81–137.
- Denissenko MF, Pao A, Tang M, Pfeifer GP. Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in P53. Science. 1996;274:430–2.
- Alexandrov LB, Ju YS, Haase K, Van Loo P, Martincorena I, Nik-Zainal S, et al. Mutational signatures associated with tobacco smoking in human cancer. Science. 2016;354:618–22.
- Brash DE, Rudolph JA, Simon JA, Lin A, McKenna GJ, Baden HP, et al. A role for sunlight in skin cancer: UV-induced p53 mutations in squamous cell carcinoma. Proc Natl Acad Sci U S A. 1991;88:10124–8.
- Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–21.
- McMurray CT, Tainer JA. Cancer, cadmium, and genome integrity. Nat Genet. 2003;34:239–41.
- Jackson SP, Bartek J. The DNA-damage response in human biology and disease. Nature. 2009;461:1071–8.
- Martin SA, Lord CJ, Ashworth A. Therapeutic targeting of the DNA mismatch repair pathway. Clin Cancer Res. 2010;16:5107–13.
- Modrich P. Mismatch repair, genetic stability, and cancer. Science. 1994;266:1959–60.
- Prolla TA, Pang Q, Alani E, Kolodner RD, Liskay RM. MLH1, PMS1, and MSH2 interactions during the initiation of DNA mismatch repair in yeast. Science. 1994;265:1091–3.
- Gibson SL, Narayanan L, Hegan DC, Buermeyer AB, Liskay RM, Glazer PM. Overexpression of the DNA mismatch repair factor, PMS2, confers hypermutability and DNA damage tolerance. Cancer Lett. 2006;244:195–202.
- Qin X, Liu L, Gerson SL. Mice defective in the DNA mismatch gene PMS2 are hypersensitive to MNU induced thymic lymphoma and are partially protected by transgenic expression of human MGMT. Oncogene. 1999;18:4394–400.
- Thibodeau SN, French AJ, Roche PC, Cunningham JM, Tester DJ, Lindor NM. Altered expression of hMSH2 and hMLH1 in tumors with microsatellite instability and genetic alterations in mismatch repair genes. Cancer Res. 1996;56:4836–40.
- Duval A, Hamelin R. Mutations at coding repeat sequences in mismatch repair-deficient human cancers toward a new concept of target genes for instability. Cancer Res. 2002;62:2447–54.
- Peltomäki P. Role of DNA mismatch repair defects in the pathogenesis of human cancer. J Clin Oncol. 2003;21:1174–9.
- Zysman M, Saka A, Millar A, Knight J, Chapman W, Bapat B. Methylation of adenomatous polyposis coli in endometrial cancer occurs more frequently in tumors with microsatellite instability phenotype. Cancer Res. 2002;62:3663–6.
- Lynch HT, Lynch J. Lynch syndrome: genetics, natural history, genetic counseling, and prevention. J Clin Oncol. 2000;18:19S–31.
- Miyaki M, Nishio J, Konishi M, Kikuchi-Yanoshita R. Drastic genetic instability of tumors and normal tissues in Turcot syndrome. Oncogene. 1997;15:2877–81.
- Nagy R, Sweet K, Eng C. Highly penetrant hereditary cancer syndromes. Oncogene. 2004;23:6445–70.
- Mensenkamp AR, Vogelaar IP, van Zelst-Stams WA, Goossens M, Ouchene H, Hendriks-Cornelissen SJ, et al. Somatic mutations in MLH1 and MSH2 are a frequent cause of mismatch-repair deficiency in Lynch syndrome-like tumors. Gastroenterology. 2014;146:643–6.e8.
- Pursell ZF, Isoz I, Lundström EB, Johansson E, Kunkel TA. Yeast DNA polymerase ε participates in leading-strand DNA replication. Science. 2007;317:127–30.
- Church DN, Briggs SE, Palles C, Domingo E, Kearsey SJ, Grimes JM, et al. DNA polymerase ε and δ exonuclease domain mutations in endometrial cancer. Hum Mol Genet. 2013;22:2820–8.
- Palles C, Cazier JB, Howarth KM, Domingo E, Jones AM, Broderick P, et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet. 2013;45:136–44.
- Lange SS, Takata K, Wood RD. DNA polymerases and cancer. Nat Rev Cancer. 2011;11:96–110.
- Briggs S, Tomlinson I. Germline and somatic polymerase ε and δ mutations define a new class of hypermutated colorectal and endometrial cancers. J Pathol. 2013;230:148–53.
- Negroni M, Buc H. Retroviral recombination: what drives the switch? Nat Rev Mol Cell Biol. 2001;2:151–5.
- Petitjean A, Mathe E, Kato S, Ishioka C, Tavtigian SV, Hainaut P, et al. Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum Mutat. 2007;28:622–9.
- Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–8.
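- Campesato LF, Barroso-Sousa R, Jimenez L, Correa BR, Sabbaga J, Hoff PM, et al. Comprehensive cancer-gene panels can be used to estimate mutational load and predict clinical benefit to PD-1 blockade in clinical practice. Oncotarget. 2015;6:34221–7.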
- Johnson DB, Frampton GM, Rioth MJ, Yusko E, Xu Y, Guo X, et al. Targeted next generation sequencing identifies markers of response to PD-1 blockade. Cancer Immunol Res. 2016;4:959–67.
- Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–20.
- Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol. 2013;31:1023–31.
- He J, Abdel-Wahab O, Nahas MK, Rampal RK, Intlekofer AM, Patel J, et al. Integrated genomic DNA/RNA profiling of hematologic malignancies in the clinical setting. Blood. 2016;127:3004–14.
- Shlien A, Campbell BB, de Borja R, Alexandrov LB, Merico D, Wedge D, et al. Combined hereditary and somatic mutations of replication error repair genes result in rapid onset of ultra-hypermutated cancers. Nat Genet. 2015;47:257–62.
- Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
- Homer N, Nelson SF. Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA. Genome Biol. 2010;11:R99.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
- Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–9.
- Berger MF, Hodis E, Heffernan TP, Deribe YL, Lawrence MS, Protopopov A, et al. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature. 2012;485:502–6.
- Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355–8.
- Sun JX, Frampton G, Wang K, Ross JS, Miller VA, Stephens PJ, et al. A computational method for somatic versus germline variant status determination from targeted next-generation sequencing of clinical cancer specimens without a matched normal control. Cancer Res. 2014;74(19S):1893.
- Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–91.
- Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–5.
- Kane DP, Shcherbakova PV. A common cancer-associated DNA polymerase ε mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res. 2014;74:1895–901.
- Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics. 2015;31:1536–43.
- Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12:931–4.
- Huang YH, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet. 2017.
- Wilson MD, Benlekbir S, Fradet-Turcotte A, Sherker A, Julien JP, McEwan A, et al. The structural basis of modified nucleosome recognition by 53BP1. Nature. 2016;536:100–3.
- Ochs F, Somyajit K, Altmeyer M, Rask MB, Lukas J, Lukas C. 53BP1 fosters fidelity of homology-directed DNA repair. Nat Struct Mol Biol. 2016;23:714–21.
- Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway LA. Highly recurrent TERT promoter mutations in human melanoma. Science. 2013;339:957–9.
- Melton C, Reuter JA, Spacek DV, Snyder M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet. 2015;47:710–6.
- Kataoka K, Shiraishi Y, Takeda Y, Sakata S, Matsumoto M, Nagano S, et al. Aberrant PD-L1 expression through 3′-UTR disruption in multiple cancers. Nature. 2016;534:402–6.