The translation of cancer genomics: time for a revolution in clinical cancer care

The introduction of next-generation sequencing technologies has dramatically impacted the life sciences, perhaps most profoundly in the area of cancer genomics. Clinical applications of next-generation sequencing and associated methods are emerging from ongoing large-scale discovery projects that have catalogued hundreds of genes as having a role in cancer susceptibility, onset and progression. For example, discovery cancer genomics has confirmed that many of the same genes are altered by mutation, copy number gain or loss, or structural variation across multiple tumor types, resulting in a gain or loss of function that likely contributes to cancer development in these tissues. Beyond these frequently mutated genes, we now know there is a ‘long tail’ of less frequently mutated, but probably important, genes that play roles in cancer onset or progression. Here, I discuss some of the remaining barriers to clinical translation, and look forward to new applications of these technologies in cancer care.


Correspondence: emardis@wustl.edu. The Genome Institute, Washington University School of Medicine, 4444 Forest Park Blvd, St Louis, MO 63108, USA

Introduction
In 2004, following the completion of the Human Genome Project [1], a correlation was described between a specific set of mutations and patients whose lung cancers had dramatically responded to a new class of therapeutics called tyrosine kinase inhibitors (TKIs) [2]. Our group and two others [2][3][4] found that about 80% of patients with a response to a TKI carried somatic mutations in the gene encoding the epidermal growth factor receptor (EGFR) resulting in changes to the amino acid sequence. Those efforts involved low-throughput technology that combined PCR amplification of the EGFR exons with fluorescent capillary electrophoresis-based sequencing, and yet were revealing in that they linked a specific gene and its mutations to a specific therapy and to a patient's response. Exciting as these discoveries were just 8 years ago, today the first-line therapy and standard of care for non-small-cell lung cancer patients with mutated EGFR is to prescribe a TKI. While these mutations occur in only about 10% of lung adenocarcinoma patients, other genes with corresponding targeted therapies have subsequently been identified in lung cancer. Furthermore, a large number of driver alterations have been identified in lung adenocarcinoma, a significant proportion of which can be treated with specific drug-gene combinations, and there are many more drivers to be identified [5].
In the ensuing 8 years, DNA sequencing technology has taken a dramatic and welcome turn away from those expensive and slow methods of 2004, to so-called 'next-generation' or 'massively parallel' technologies that can deliver DNA sequencing data for a complete human genome overnight. This technology acceleration, and the accompanying sophisticated analyses that can be applied to cancer genome data, have transformed our understanding of the numbers and types of somatic alterations that drive cancer development [6]. Importantly, not only can rapid DNA sequencing now quickly identify mutated genes, but these can be matched to increasing numbers of protein-targeted therapies, creating a virtuous circle that is beginning to revolutionize clinical cancer care. However, there are issues that complicate the application of our new knowledge to the care of cancer patients. These issues revolve around the difficulties of adding new methods to already established standards of care in medicine, the need to train physicians across the cancer care discipline regarding how best to apply these new data to clinical practice, and how to make the enhanced information from genomics 'fit' into a drug approval process that has, until now, only dealt with drugs targeting single disease sites.

Debatable issues for translation
A multitude of questions remain unanswered in the clinical translation of next-generation sequencing (NGS) for therapeutic decision-making. Questions range from how comprehensively the cancer genome should be characterized to identify mutated genes that may result in response to a drug, to more social issues regarding physician education and sharing of results. One important question is whether to test only a small number of 'druggable' genes for mutations (a panel), or to take an unbiased look across the exome (the DNA sequence of the exons of all known genes in the genome), or to sequence the whole genome. Although the latter is most expensive and time-consuming, whole-genome data can be mined for the multitude of alterations that commonly occur in cancer genomes, including structural variants such as chromosomal translocations, large-scale rearrangements and deletions that often fuse exons to create new fusion proteins. Sequencing the exome or a smaller gene list provides a reduced view of the somatic landscape, has limited capability to capture structural variants, and omits regions such as known promoters or other regulatory regions (for example, the TERT gene promoter, a recently discovered mutated driver in melanoma and other tumor types [7,8]). Obviously, the fewer genes studied, the less expensive the assay and the more rapid the analysis. In this scenario, however, genes with potential therapeutic indications, such as those in the same biological pathway as a druggable gene that is not itself mutated, will be missed. Hence, a proportion of patients will not obtain a therapeutic match owing to the limited scope of a panel test.
Unfortunately, the complexity of whole-genome analysis, the cost of data generation, and the need for analysis and interpretation to happen in a clinically relevant timeframe have limited the scaling of this most comprehensive approach to high throughput. Our current approach is to sequence the whole genomes of tumor and normal tissue, and to perform exome sequencing of tumor and normal tissue, supplementing the exome capture reagent with additional probe sets that enhance the coverage of frequently mutated genes. This integrated approach improves our ability to detect known variants while not eliminating our ability to discover new variants as well.
A related issue that reflects the difference between PCR-based and massively parallel sequencing (MPS) involves the sequencing of matched normal tissue from the patient. In the existing clinical assay paradigm, only hotspot mutations (such as those discussed earlier in the EGFR tyrosine kinase domain) are sequenced from tumor tissue only. No matching normal tissue is sequenced because these mutations are typically only present in cancer cell genomes, and because the assays are quite focused in their scope, detecting mutations only in restricted, commonly mutated regions. By contrast, the broader scope and ease of obtaining MPS data have changed the focused nature of these assays, resulting in the detection of all variants, somatic and germline, from tumor DNA. Hence, the comparator normal sequence is critically important to define precisely which variants are somatic, but including the normal tissue doubles the cost of sequencing and disrupts the existing clinical assay paradigm. Further complications arise from the broad scope of MPS inquiries because novel mutations in cancer genes are identified (not just the hotspots), often with no clear indication of their pathogenicity (that is, gain or loss of function) or their likely drug response.
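The role of the matched normal can be illustrated with a minimal sketch. It assumes that variant calls have already been reduced to (chromosome, position, reference, alternate) records; the `Variant` type, the function name and the toy coordinates are hypothetical. Production somatic callers use statistical models of read evidence rather than the exact set comparison shown here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """A simplified variant record (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_somatic_germline(
    tumor: Iterable[Variant], normal: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Partition tumor calls by whether they also appear in the matched normal.

    Calls seen in both samples are treated as germline; calls seen only in the
    tumor are treated as candidate somatic variants.
    """
    normal_set = set(normal)
    tumor_list = list(tumor)
    somatic = [v for v in tumor_list if v not in normal_set]
    germline = [v for v in tumor_list if v in normal_set]
    return somatic, germline


# Toy calls with made-up coordinates, not real patient data.
tumor_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr3", 3000, "G", "A"),
]
normal_calls = [Variant("chr2", 2000, "C", "T")]

somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
print("candidate somatic:", somatic)
print("germline:", germline)
```

Without the normal sample, all three toy calls would be reported from the tumor with no way to tell which are inherited.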
Another issue involves a second data type that is proving to be critically informative but is not presently being utilized by most diagnostic testing assays: RNA sequencing of the tumor. RNA sequence data can provide an incredible range of information that identifies genes that are aberrantly overexpressed (often without any indication from DNA sequence data, such as copy number amplification), alleles that are mutated in DNA and potentially druggable, yet are silenced in RNA (again, often without DNA evidence for silencing), and fusion transcripts [9]. In the clinical setting, it is important to detect overexpression of a 'driver' RNA owing to the potential for drug targeting (for example, HER2 in breast cancer), and often the potential for overexpression cannot be detected simply through the examination of DNA (for example, when overexpression is due to altered methylation or histone binding). Furthermore, the support provided by RNA sequence data for analysis of detected DNA variants provides confirmation of potentially druggable mutations. One difficulty in interpreting overexpression lies with the fact that a comprehensive database of RNA expression from RNA sequencing of normal human tissues does not presently exist, although there are efforts to produce such a database (for example, GTEx [10]). When possible, adjacent non-malignant tissue may be sequenced to establish aberrant expression levels in the tumor by comparison. Ultimately, although RNA sequencing represents additional expense and effort, it should enhance our ability to find more answers for more patients [11]. We presently include RNA sequencing of the tumor in our sequencing-based approach to detecting druggable alterations.
Heterogeneity across the cancer cell genomes in each patient's tumor is just beginning to be understood at the level of DNA [12][13][14]. The sensitivity to detect heterogeneous cancer genomes is enhanced by the digital nature of NGS and may eventually provide prognostic information (for example, if more heterogeneous tumors are found to be more likely to acquire therapeutic resistance or to metastasize). Although some studies have predicted that heterogeneity may also imply that therapy in patients with multiple metastatic sites will be unsuccessful [12,15], it remains to be seen whether novel, metastasis-specific mutations are common in all tumor types, and indeed whether they are drivers of metastasis. Furthermore, an ultimate aim of translating cancer genomics to clinical care should be to include sequencing-based diagnostic assays at first diagnosis rather than in the advanced disease setting. In this way, current concerns about the complexity of metastatic presentation could well be obviated if precision cancer therapeutics proved clinically effective at primary disease presentation. This practice would also enable the detection of recurrent disease or therapeutic resistance with higher sensitivity than provided by available imaging technologies, using blood-based monitoring, as discussed below.
Educational issues remain and must be addressed if physicians are expected to order, understand and act upon diagnostic results obtained from NGS data. These issues are more readily addressed for physicians in training, and most top-tier medical schools are already offering first-year students an introduction to genomic medicine concepts, as well as specialized residency training programs in translational genomics. However, the larger number of practicing physicians who also want to use genomics-based diagnostics but feel alienated because they do not understand the approach and how best to interpret the results of data analysis will need effective continuing medical education (CME)-based training and user-friendly decision support tools to feel sufficiently educated and confident to order and interpret these tests. Furthermore, they will need evidence that specific gene-mutation-targeted therapy combinations will provide clinical benefits, which may be challenging to demonstrate with the current paradigm of one-size-fits-all clinical trials.
A final issue for debate is that no rapid and straightforward mechanism currently exists to communicate the results of what are essentially single-patient studies, often referred to as 'N of 1'. In essence, early efforts that use NGS diagnostics to aid in therapeutic decision-making will be experimental and will often involve single patients. Analytical and interpretational pipelines will determine which targeted therapies are most likely to benefit each patient, and in the cases where a patient is treated with a targeted therapy and is monitored to evaluate their response, we need a reporting mechanism that can serve to effectively communicate these results to all groups engaged in this enterprise. This type of reporting must include sharing of those cases for which all indications pointed to a specific therapy yet the patient did not achieve a response. Although the typical reporting of case studies occurs through a peer review process, the need for data sharing in this realm is too urgent to wait for the time required to write and publish a manuscript, and the numbers of cases too large to report each one in this manner. Reporting cases from N of 1 studies in an accessible, community-friendly database format will ensure that each patient's diagnosis and treatment will be as informed as possible. A bioRxiv-like pre-print server [16] may be worth considering as a vehicle for communication of results. The acquired data from such an exercise would still need to be organized for rapid analysis of new patient data; this is perhaps best funded by the National Institutes of Health (NIH) to ensure free and open access to data while maintaining precautions about data privacy. Those who report cases to the database would have an obligation to follow up on patient outcome data, ensuring maximum benefit. Only as a community willing to share openly will we effectively build the case for the clinical efficacy of genome-guided therapy decisions.

Practical barriers to translation
There are four main areas that are practical barriers to the successful translation of NGS-based diagnoses for cancer. These relate to tissue preparation methods and their impact on nucleic acids, the limited amount and availability of clinical samples for genomic study, and the clinical trials paradigm as it affects the approval and availability of targeted therapies. Chief among these is the established practice of preserving tissues in formalin and paraffin. Formalin is problematic because it causes nucleic acid degradation by crosslinking to both DNA and RNA backbones, an effect that is worsened by storage time. There are other preservatives besides formalin, such as HOPE or PAXgene, that are commercially available and are less damaging to nucleic acids while maintaining tissue morphology for microscopic assays. However, supplanting the standard pathology preparatory method of formalin fixation and paraffin embedding is unlikely in the near term because it has been the standard in pathology for over 100 years. Perhaps the best driver of this change would be the widespread use of NGS methods in the diagnostic laboratory. For the time being, methods to determine the extent of degradation of DNA or RNA isolated from formalin/paraffin-preserved samples of a tumor of interest are critical precursors to determine whether these nucleic acids are appropriate for NGS.
Often, in addition to the problems associated with formalin fixation and paraffin embedding, the amount of tissue that is available for nucleic acid isolation is limited. After all, the biopsy will also be needed for conventional diagnostic tests, so it needs to be shared across multiple tests for comprehensive diagnosis. The availability of only low amounts of input nucleic acid is common, and methods tailored to low-input library generation prior to NGS are required. In reality, the combination of formalin/paraffin preservation and low input is quite common, so robust protocols must be designed for the combination of these two limiting factors. Of course, the advantage of using NGS methods is that multiple genes can be studied in a single assay, in contrast to PCR-based assays that evaluate one gene hotspot at a time. With PCR, multiple assays are required to test multiple genes in hopes of identifying a therapeutic match, and if insufficient material is available to run multiple assays, a best guess at which genes are most likely mutated is required.
Radiation therapy and some chemotherapeutic agents damage DNA, and all therapies profoundly influence the genomic heterogeneity of any recurrent or metastatic tumor. As a result, for patients who have failed first- or second-line therapy for their tumor type, a new tumor biopsy must be obtained for genomic studies because the genomic profile of the tumor is likely to have been altered by therapeutic intervention. In a few tumor types, the repeat sampling of metastatic or recurrent disease is straightforward, as it is the standard of care and/or is covered by insurance. Far too often, however, repeat sampling is neither the standard of care nor straightforward (for example, for bone or brain metastases), thereby significantly complicating the ability to assay recent tumor samples to identify new therapeutic targets. New approaches to blood-based monitoring of tumor DNA or cells may help to address these issues, as discussed below.
The fourth practical issue is frequently encountered with NGS-based assays. NGS data often predict that a therapy not approved by the US Food and Drug Administration (FDA) or similar governmental bodies for use in that particular tumor type will be effective. This result reflects the growing data from large-scale genomic discovery projects that show that genes that are frequently mutated in cancers are largely not specific for the tissue type of the tumor [17,18]. Although tissue biology remains important for understanding tumor pathology and drug response, the same somatically altered genes can be initiators of oncogenesis in different tissue types. Hence, a new paradigm for clinical trials of antineoplastic drugs has emerged that hopefully will feed forward into the FDA approval process. One such type of trial is the 'basket' trial, in which the gene(s) targeted by a specific drug are tested across patients with multiple disease sites (for instance, lung, colon, breast) and only those patients with a mutational profile suggestive of a therapeutic response are included in the trial [19]. This approach can enable more rapid trial closure or acceleration than trials focusing on a specific disease site, depending upon patient response rates, and could permit FDA consideration of a drug for approval in multiple disease sites [20]. Another trial type is the 'umbrella' or 'master protocol' trial, in which a collection of targeted therapies and corresponding NGS-based molecular profiling companion diagnostic assays are combined to treat either a single cancer type or multiple cancer types [21]. There are numerous examples of basket and umbrella trials being conducted by pharmaceutical companies, the NIH (such as NCI-MATCH), and by hospital systems (for example, Princess Margaret Hospital's IMPACT trial [22]).

Promising new translational approaches
As frequently occurs with emerging technology, innovative approaches that address several of the aforementioned issues are rapidly coming onto the scene. For example, in the case of acquired resistance to therapy, earlier detection is critical to permit the oncologist to devise a new therapeutic regimen before metastatic disease is too advanced. However, metastatic disease is often detected too late by imaging technologies that require a tumor mass to achieve a minimal size before it can be detected. New efforts are pursuing the early detection of circulating tumor cells (CTCs) or circulating DNA or RNA [23], often using NGS to characterize the initial disease presentation, so that mutated genes are identified for that patient's disease and can then be detected by less expensive, more rapid assays using blood samples, taken at strategic time points after surgery to remove the tumor [24,25]. As the genomic alterations that emerge in resistance to targeted therapies are better understood [26][27][28], these also may be monitored by specific blood-based assays. Another example of an application for CTC- or circulating nucleic-acid-based monitoring is in prostate cancer. This approach is currently being used to monitor the progression to androgen-resistant disease, which typically forms metastases in bone [29,30]. However, the approach could also be used as a means of informing active surveillance for men with a prostate cancer diagnosis but low risk of developing aggressive disease.
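As a rough illustration of blood-based monitoring, the sketch below tracks the variant allele fraction (mutant reads divided by total reads) of one previously identified tumor mutation across serial blood draws. The function names, the read counts and the 0.1% reporting threshold are all assumptions made for the example; a real assay would calibrate its threshold against its own background error rate.

```python
from typing import List, Tuple


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the tumor-specific allele."""
    return alt_reads / total_reads if total_reads > 0 else 0.0


def flag_timepoints(
    samples: List[Tuple[str, int, int]], threshold: float = 0.001
) -> List[str]:
    """Return labels of blood draws whose allele fraction meets the threshold.

    `samples` holds (label, alt_reads, total_reads) per draw. The default
    threshold of 0.1% is an assumed value, not an established clinical cutoff.
    """
    return [
        label
        for label, alt, total in samples
        if variant_allele_fraction(alt, total) >= threshold
    ]


# Made-up read counts at one known mutation site, for illustration only.
samples = [
    ("week 4", 0, 10000),
    ("week 12", 5, 10000),
    ("week 24", 40, 10000),
]
flagged = flag_timepoints(samples)
print(flagged)
```

In this toy series only the last draw exceeds the assumed threshold, which is the kind of signal that might prompt follow-up before a lesion becomes visible by imaging.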
The basket and umbrella clinical trial designs will identify more cases of 'extreme responses' to targeted therapies, defined as those patients whose tumor burden is alleviated rapidly (complete remission) and, in some cases, with long duration [31]. Comprehensive genomic studies of these patients may add genes to the repertoire of diagnostic assays that predict the likelihood of extreme response, and thus better inform therapeutic decision-making. A related consequence of genomic testing for therapy selection is the likely transition of companion diagnostics to sequence-based tests, rather than slower, less sensitive and less specific assays such as fluorescent in situ hybridization (FISH) or cytogenetics. As a result, more patients will be correctly identified as candidates for the drugs that are most likely to help relieve their disease burden.
One interesting application of NGS may amplify the resurgence of immunotherapies, which is currently being driven by the antineoplastic properties of the new checkpoint blockade agents that invoke the immune system's interaction with cancer cells [32][33][34][35][36]. In a new paradigm, NGS identifies tumor-specific mutations and their RNA expression levels, and these data are used to predict immunoepitopes, ignoring the druggable targets and instead prioritizing those tumor-unique peptides that are most immune-stimulatory, for use in vaccine development [37,38]. An algorithmic method for prediction of class I human leukocyte antigen (HLA) binding affinity considers the highly expressed somatic missense (altering the amino acid sequence) mutations discovered by NGS analysis, and provides scores for the binding affinity of mutant neoantigens compared with their corresponding wildtype protein sequences [39]. Various in vitro tests using patient-derived T cells can further refine the list of candidates for use in personalized vaccine development, to provide a final list of the most immune-stimulatory peptide sequences that are unique to a patient's tumor. A variety of approaches can be considered for producing a patient-specific vaccine, such as a multivalent DNA vaccine that includes the prioritized peptide-encoding sequences in a vaccine vector, or a dendritic-cell vaccine that uses patient-specific peptides to stimulate the patient's own dendritic cells to induce T-cell memory [37]. Furthermore, the astounding sequencing capacity of NGS instruments now makes monitoring changes in a patient's T- and B-cell repertoires quite straightforward within a single experiment [40], and could provide a critically important component of patient monitoring during immunotherapy or other therapeutic interventions.
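The mutant-versus-wildtype comparison described above can be sketched as follows. This is not the method of reference [39]; it is a minimal illustration that assumes a single missense substitution in a protein sequence, enumerates every 9-residue peptide spanning the altered position, and scores mutant and wildtype peptides with a caller-supplied predictor. All function names are hypothetical, the protein string is arbitrary, and `toy_affinity` is a placeholder with no biological meaning that stands in for a real HLA class I binding predictor. Filtering for highly expressed mutations is assumed to have happened upstream.

```python
from typing import Callable, List, Tuple


def peptide_windows(protein: str, pos: int, k: int = 9) -> List[str]:
    """All k-mers of `protein` that overlap the 0-based residue index `pos`."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(protein) - k)
    return [protein[s:s + k] for s in range(first_start, last_start + 1)]


def candidate_pairs(
    wildtype: str, pos: int, alt: str, k: int = 9
) -> List[Tuple[str, str]]:
    """Pair each mutant k-mer with the wildtype k-mer from the same window."""
    mutant = wildtype[:pos] + alt + wildtype[pos + 1:]
    return list(zip(peptide_windows(mutant, pos, k),
                    peptide_windows(wildtype, pos, k)))


def rank_candidates(
    pairs: List[Tuple[str, str]], predict: Callable[[str], float]
) -> List[Tuple[str, str, float, float]]:
    """Score both peptides of each pair; sort by mutant score, lowest first.

    Lower scores are assumed to mean stronger predicted binding, as with
    IC50-style outputs.
    """
    scored = [(mut, wt, predict(mut), predict(wt)) for mut, wt in pairs]
    return sorted(scored, key=lambda row: row[2])


def toy_affinity(peptide: str) -> float:
    """Placeholder score only; swap in a real binding predictor."""
    return 500.0 - 50.0 * peptide.count("L")


# Arbitrary 20-residue sequence with a hypothetical M -> L change at index 10.
pairs = candidate_pairs("ACDEFGHIKLMNPQRSTVWY", pos=10, alt="L")
ranked = rank_candidates(pairs, toy_affinity)
for mut, wt, mut_score, wt_score in ranked:
    print(mut, wt, mut_score, wt_score)
```

Keeping the wildtype score beside the mutant score makes it easy to prioritize peptides whose predicted binding improves because of the mutation, before candidates move on to in vitro testing.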

Conclusions and future directions
Multiple technological innovations fueled by the application of NGS technology to cancer genomics are now making the clinical use of NGS platforms and assays increasingly common. Applications for therapeutic decision-making, including for new clinical trials of targeted therapies, are increasing the demand for NGS assays and improving cancer care and patient monitoring. Over coming years, this virtuous circle will provide important metrics that establish the sensitivity and specificity of NGS-based diagnostic assays in comparison to current clinical standards (for example, FISH or immunohistochemistry). These efforts will also consolidate the use of NGS-based diagnostics in cancer medicine by virtue of clinical efficacy, as the precision afforded by these tests results in patient relief from disease burden by prescribing the right drug for each patient's disease. The associated disruption caused by these diagnostic changes will affect current pathology-based diagnoses, challenge standards of care for most cancer types, and invoke the many other issues discussed here. Ultimately, this disruption and the resolution of these issues will benefit patients by ensuring more precise diagnostic and therapeutic results in their fight against cancer.