Database of Evidence for Precision Oncology
We utilized a repository of known variant/drug interactions, which we refer to as “Database of Evidence for Precision Oncology” or DEPO (Sun et al.[26], in revision), containing data from publically available datasets and papers [20, 23,24,25] (Fig. 1a).
In aggregate, 609 unique variants with known drug interactions currently reside in DEPO, and account for a total of ~ 800 unique variant/drug interactions (Fig. 1b). Approximately 70% of known variant/drug interactions result in increased sensitivity to therapy. Further, a substantial number (~ 25%) of sensitive variant/drug interactions are approved by the FDA for a particular cancer type or are based on late-stage clinical studies. Several genes account for a large proportion of variant/drug interactions (e.g., EGFR, KIT, ERBB2, BRCA1, PDGFRA), reflecting interest in therapeutically exploiting a relatively limited number of cancer driver genes [5] (Fig. 1c). Altogether, 168 genes are represented in the current version of DEPO.
Drug-associated mutations in pan-cancer cohort
We leveraged the genomic sequence data of 6570 tumor samples from TCGA representing 22 adult cancer types (Synapse ID, syn12618789). Mutations associated with drug sensitivity in DEPO were matched against the TCGA cohort. Our analysis reveals 2364 mutations across 2114 tumors that are associated with sensitivity to one or more drugs (mean = 1.12/tumor) (Additional file 2: Table S2). Three hundred sixty-two distinct mutations are represented across 40 genes. The low fraction of drug-associated mutations likely reflects the large number of passengers in cancer [42, 43]. Thirty-two percent of tumors had at least one drug-associated mutation, a percentage that is consistent with the 28% of screened patients that could be matched with a targeted therapy or trial [44].
Initially, we analyzed the percentage of potentially druggable tumors in a cancer-type-specific setting (Fig. 2), that is, tumors with mutations associated with a known drug response in the cancer type with the highest level of evidence. Only 3.3% of the samples contain a druggable mutation known to be FDA approved; however, if we consider less mature evidence: clinical trials, preclinical, and case reports, we could potentially increase the percentage of tumors with drug-associated mutations to 8.2, 8.5, and 10.5%, respectively. Here, skin cutaneous melanoma (SKCM) is the cancer type with the largest fraction of drug-associated mutations (78%). SKCM with a BRAF V600E/K mutation (40% of patients) can be treated with BRAF and MEK inhibitors based on FDA approval. The NRAS Q61 mutations found in 12% of SKCM patients are more challenging to treat, as is any RAS-mutant cancer due to activation of multiple signaling pathways. Early generation MEK-exclusive inhibition proved to be ineffective, with multiple failed clinical trials prompting exploration of newer generation MEK inhibitors and MEK inhibitor combinations with downstream targets of NRAS [45]. In colon and rectal carcinoma (COADREAD), glioblastoma multiforme (GBM), and lung adenocarcinoma (LUAD), 21, 14, and 40% of their respective tumors contain a drug-associated mutation in a cancer-type-specific setting. In COADREAD, drug-associated variants PIK3CA E542K, E545K, and H1047R are present in 2.1, 5.2, and 1.8% of tumors, respectively, and are associated with sensitivity to PI3K/AKT/mTOR pathway inhibitors in early-stage trials [46] and aspirin in observational studies [47, 48]. PIK3CA-mutant cancers are also an ongoing challenge to treat clinically; co-occurring drugs targeting the PI3K pathway have been more effective than single-agent PI3K inhibition in treating PIK3CA-mutant cancers, but efficacy varies with mutation profile [46]. In GBM, the EGFR extracellular mutations (A289V, G598V, and R108K) and IDH1 mutation R132H are present in 10 and 4.5% of tumors, respectively, and are associated with drug response based on preclinical data [49]. In non-small cell lung cancer, EGFR inhibitors (e.g., erlotinib) are FDA approved for tumors with activating EGFR mutations, which are present at 10 and 1% in our LUAD and lung squamous cell carcinoma (LUSC) cohorts, respectively.
Despite the promise of targeted therapy, only 10.5% of this pan-cancer cohort contains potential drug-associated mutations in a cancer-type-specific setting. With drug repurposing across cancer types, in which a drug used primarily in cancer type A with mutation X is repurposed for cancer type B with mutation X, we find that an additional 5.4% of patients may be treated with a FDA-approved drug-variant interaction (Figs. 2 and 3, Additional file 2: Table S12). This number can be increased to 22.8% if we consider repurposing of lower tier drug-variant pairs to other cancer types; however, these interactions will require clinical validation to be considered truly druggable. In this cancer-type-non-specific setting, cancer types in which at least 40% of tumors have drug-associated mutations include low-grade glioma (LGG, 76%), thyroid carcinoma (THCA, 70%), and colorectal adenocarcinoma (COADREAD, 42%). A small number of drug-associated mutations occur at high frequency in these cancer types. For example, in THCA, the BRAF V600E variant is found in 60% of tumors. Clinical trials have investigated the use of BRAF inhibitors combined with MEK inhibitors in THCA. However, BRAF V600E also occurs at a lower frequency in HNSC, KIRP, LGG, and GBM, indicating significant repurposing potential for BRAF inhibitors [50, 51] (Fig. 3).
COADREAD may also have potential for therapeutic intervention via repurposing (Fig. 2a). However, COADREAD has been difficult to treat due to a large presence of KRAS and BRAF mutations; EGFR inhibition as monotherapy is used for COADREAD, but only in tumors with wild-type KRAS [52, 53]. Repurposing drugs that inhibit downstream effectors of KRAS (e.g., MEK) is an alternative therapeutic strategy for KRAS-mutant COADREAD (23.8% of patients). The efficacy of MEK inhibition in combination with sorafenib has been tested in clinical trials for KRAS- or NRAS-mutant liver hepatocellular carcinoma (LIHC) [54] and has shown positive results. Co-targeting of MEK and AKT signaling showed some durable response in a phase I study [55], and most recently, a small trial showed some success combining an investigational MEK inhibitor with a CDK4/6 inhibitor in non-small cell lung cancer (NSCLC) (trial NCT number NCT02022982). COADREAD or other cancer types having RAS mutations, such as cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), acute myeloid leukemia (AML), stomach adenocarcinoma (STAD), and uterine corpus endometrial carcinoma (UCEC), could benefit from further exploration of combinatorial therapies targeting downstream targets of KRAS (Fig. 2b). BRAF-mutant COADREAD (7.6% of patients) presents a similar problem in that BRAF inhibitor monotherapy is ineffective unlike in BRAF-mutant melanoma and that triple drug combination targeting the EGFR, MAPK, and PI3K pathway has shown more positive results. Numerous clinical trials are underway to find the best combination therapies with BRAF inhibitors, including new drugs that are Wnt pathway and cyclin-dependent kinase inhibitors [56]. Together, cancer-type-specific and non-specific mutational analyses identified potential therapeutic targets in 2114 tumors (32%), some of which will be considered druggable only with further clinical development and FDA approval.
Protein structure-based clustering of drug-associated mutations
We applied a structure-based clustering tool, HotSpot3D [27], to the pan-cancer dataset to reveal putative functional mutations (Additional file 2: Table S13). HotSpot3D’s utility in predicting functional mutations is supported by experimental evidence using cell lines expressing one of several EGFR-mutant proteins [36]. HotSpot3D identifies mutations that, by clustering in protein space with mutations from DEPO associated with drug sensitivity or resistance, may themselves affect drug binding affinity and response. Out of 160 “sensitive” mutations from DEPO that mapped onto protein structures, we identified 134 “sensitive” mutations in HotSpot3D clusters, which in turn were clustered with 214 putative sensitive mutations that were not catalogued in DEPO. These mutations were found in 55 clusters from 24 genes (Fig. 4a). Among all genes in our analysis, EGFR contains the highest number of putative sensitive mutations, with 36 mutations that clustered with 19 mutations in DEPO from seven different clusters (Fig. 4a). This clustering analysis helps winnow down the mutation list to candidates likely to affect drug response and provides context for further experimental testing, but does not necessarily indicate the direction of drug response; in total, HotSpot3D analysis identified potential therapeutic targets in 458 tumors (7%).
We identified putative resistant mutations as those that clustered with “resistant” mutations from DEPO; further, to prevent contradictory annotation of putative mutations as both “sensitive” and “resistant,” we limited our analysis of clusters containing “resistant” mutations to those that did not overlap with clusters containing sensitive mutations. This procedure yielded four different clusters with a “resistant” mutation in AKT1, MAP2K1, and RAC1; these four clusters contained 14 putative resistant mutations clustering with four known resistant mutations (Additional file 2: Table S13). RAC1 yielded the largest cluster, with RAC1 P29S mediating resistance to BRAF inhibitors in BRAF-mutant SKCM [57]. Other mutations in this cluster that may affect binding affinity of BRAF inhibitors (or that may mediate resistance to BRAF inhibitors) are C18Y, E31D, A159V, P29L/T, and P34S.
To provide evidence in support of mutation clustering as a method for identifying putative druggable mutations, we first show that known drug-associated mutations in DEPO that affect binding affinity of drugs in the same drug class cluster spatially. Most clusters contain more than one known drug-associated mutation. For example, KIT has multiple clusters with known mutations; one of which has three known mutations (E490D, Y494C, S476G) in the same cluster, which are FDA approved as sensitive to combined therapy of imatinib, sunitinib, and regorafenib (KIT and angiogenesis inhibitor). In addition, this cluster contains two other unique mutations (D439H, I438L) not in DEPO that, based on our analysis using HotSpot3D, could also affect binding affinity and potentially tumor sensitivity to KIT combined with angiogenesis inhibitors (Additional file 2: Table S13). Second, we experimentally validated HotSpot3D as a tool for identifying functional mutations associated with drug response. To do this, we assessed the activity and drug sensitivity of a set of six BRAF mutations (F635I, G596D, K601E, W604L, L613F, G596R) in close spatial proximity to the well-studied V600E pathogenic mutation (Fig. 4b). A key function of BRAF is phosphorylating MEK1/2. Therefore, we transfected BRAF mutations, along with wild-type BRAF and BRAF V600E, into HEK293T cells in the presence or absence of BRAF inhibitor dabrafenib, and used phosphorylation changes in MEK1/2 as an indicator of BRAF activity. The undetectable level of endogenous BRAF in HEK293T cells eliminates potential ambiguity in interpreting the effects of transfected BRAF mutations. As expected, BRAF V600E caused drastically increased phosphorylation in MEK1/2 that is reduced by dabrafenib (Fig. 4c). Three (G596D, K601E, and W604L) out of six other transfected BRAF mutations also showed higher levels of MEK1/2 phosphorylation and sensitivity to dabrafenib than wild-type BRAF, suggesting that a high percentage of mutations identified by Hotspot3D in close spatial proximity to V600E are activated and similarly sensitive to dabrafenib. Notably, BRAF G596R-transfected cells appeared to have a much lower level of MEK1/2 phosphorylation when compared to those transfected with wild-type BRAF, supporting prior findings that G596R results in BRAF loss of function [58]. Our ongoing development of comprehensive computational tools combining spatial proximity with considerations of specific amino acid substitutions and other structural features will further improve the accuracy of identifying functional mutations. Overall, HotSpot3D, combined with experimental assays, can help identify functional mutations that are candidates for inclusion in DEPO and worth further clinical exploration.
Druggable gene and protein expression outliers in pan-cancer cohort
In addition to driver mutations in oncogenes, elevated expression of genes or gene products can also be used to select tumors for targeted therapy [59,60,61]. For example, in the case of breast cancer, elevated mRNA expression and copy number amplification of ESR1 correlate with elevated protein expression of ER [62, 63], as well as with sensitivity to hormonal therapy with tamoxifen [62, 64]. In general, tumors with elevated protein expression may respond to drugs that activate antibody-dependent cell-mediated cytotoxicity [65], suppress signaling pathways essential for tumor survival [66], or deliver cytotoxic agents via tumor-specific antigens [67].
Therefore, to further expand the set of tumors with potential drug-associated biomarkers, we sought transcriptomic and proteomic evidence of elevated gene/protein expression. For each gene in DEPO whose expression is associated with drug response, tumors with outliers were identified using the pan-cancer cohort as a reference. We defined outliers as expression values exceeding 1.5 interquartile ranges (IQR) above the third quartile of the cohort [68]. We applied this outlier detection strategy across mRNA, protein, and protein phosphorylation levels. RNA-seq and protein RPPA data are available for 5286 and 3877 tumors out of 6570 tumors in the TCGA cohort, respectively (Additional file 2: Table S14). DEPO has 50 genes whose expression is associated with drug response, 39 of which are associated with drug sensitivity. We identified elevated expression of druggable genes with drug sensitivity in 16 and 30% of the pan-cancer cohort of 6570 TCGA tumors at the mRNA and protein/phosphoprotein levels, respectively (Fig. 5). Interestingly, tumors with “druggable” gene fusions tend to express elevated levels of the corresponding druggable gene (Additional file 2: Table S15, Additional file 3: Figure S1) [69], suggesting that fusions may be one of several drivers of gene and protein expression.
To determine mRNA expression outliers in tumor samples, we used RNA-seq data from TCGA (Fig. 5a). Elevated DLL3 expression was identified in 161 tumors, including LGG, GBM, and SKCM tumors. DLL3 contributes to neuroendocrine tumorigenesis by inhibiting the Notch signaling pathway, whose role is to suppress tumor growth. A DLL3-targeted antibody-drug conjugate in phase II clinical trials effectively targets DLL3-expressing cells in high-grade pulmonary neuroendocrine tumors [70, 71]. This same therapy could potentially benefit GBM, LGG, and SKCM via repurposing due to shared levels of high DLL3 expression. Seventeen percent of BRCA and UCEC express PGR and 9.4% of BRCA express ERBB2 in our cohort, reflecting the FDA-approved use of anti-estrogen hormone therapy and HER-2 inhibitors, respectively, in these cancer types. ERBB2 is expressed in other cancer types, such as BLCA and CESC, which could benefit from repurposing and further exploration of HER2-inhibition; HER-2 inhibitors for COADREAD are currently being explored in late-stage clinical trials.
To examine tumors with potential drug-associated biomarkers based on protein expression and phosphosite levels, we used TCGA reverse phase protein array (RPPA) data (Fig. 5b). Compared to the pan-cancer cohort, 83% of prostate adenocarcinoma (PRAD) express elevated AR, reflecting their tissue of origin. Elevated AR is also present in 9% of breast adenocarcinoma (BRCA). These 9% of BRCA express higher levels of AR than 17% of PRAD, suggesting that androgen-deprivation therapy can potentially be repurposed for AR-positive BRCA [72] (Additional file 2: Table S16). Similarly, 26 and 52% of BRCA and UCEC, respectively, show elevated activity at ESR1’s p.S118 phosphosite. These only represent a fraction of druggable BRCA, as 77% of tumors in a large breast cancer registry are ER positive [73]. Elevated expression and activity of EGFR protein and its phosphosites across cancer types suggest that phosphoproteome analysis may inform treatment response. EGFR phosphosites p.Y1068 and p.Y1173 are active in GBM, head and neck squamous cell carcinoma (HNSC), KIRC, LUAD, and LUSC. Some evidence has shown that HNSC, LUAD, and LUSC are responsive to EGFR tyrosine kinase inhibitors (TKIs) [74, 75], perhaps because EGFR TKIs inhibit autophosphorylation rather than elevated protein expression [76]. In KIRC, EGFR inhibitors have negligible activity [77,78,79] despite active phosphosites in our analysis, possibly because EGFR is one of many growth factors expressed in KIRC or because EGFR inhibition is ineffective in the absence of functioning VHL [80].
Altogether, our results suggest that protein outlier analysis may require integration with mutational and/or mRNA expression analyses to better predict response to therapy. Additionally, mass spectrometry for protein expression can be valuable in validating RNA-seq and RPPA data as well as capturing new putative druggable events (Additional file 1, Additional file 3: Figure S2). mRNA and phosphoprotein expression outlier analysis identified potential therapeutic targets in 2559 tumors (39%).
Integrative omics analysis of druggability
Assessing alterations in multiple levels of data across genes may improve predictions of druggability. For example, with trastuzumab, a single testing method or biomarker (CNV, mRNA expression, protein expression, etc.) can be insufficient for stratifying patients into responders and non-responders [59]. Therefore, we assessed druggability using comprehensive mutational, RNA-seq, and RPPA data in 3121 tumors. Of these, 1003 tumors (32%) are potentially druggable based on two or more data types (genomic, transcriptomic, proteomic) (Fig. 6a, Additional file 2: Table S8), affording an opportunity for clinical or mechanistic analyses connecting drug-associated mutations with transcriptomic/proteomic expression events. Figure 6b and Additional file 2: Table S10 depict tumors with multiple levels of alterations associated with sensitivity to one of ten categories of FDA-approved cancer drugs (Additional file 2: Table S9). Seventy-two tumors had elevated mRNA and protein expression of HER2; these may be expected to have greater or more uniform sensitivity to HER2 inhibition than tumors with elevated mRNA or protein expression alone. Identifying mutations associated with drug resistance may further improve predictions of druggability. RAC1 P29S co-occurs with mutations in BRAF and MEK1 in four SKCM tumors (Additional file 2: Table S17, Additional file 3: Figure S3). RAC1 P29S renders SKCM resistant to BRAF/MEK inhibition [57]; testing for RAC1 P29S may identify patients with BRAF V600E SKCM unlikely to benefit from BRAF/MEK inhibitor. In this case, the single-gene paradigm of existing companion diagnostics may be insufficient to determine best treatment options; rather, comprehensive mutational profiling should be considered.
Multi-omics profiling also reveals opportunities for combinatorial therapy. AKT1 E17K co-occurs with BRAF V600E in five tumors (Additional file 2: Table S17, Additional file 3: Figure S3). Combining an AKT inhibitor with the current standard of treatment for BRAF V600E-positive SKCM (BRAF/MEK co-inhibition) may delay drug resistance [81]. Transcriptomic and proteomic expression profiling reveals 48 additional tumors with BRAF V600E/K and elevated AKT (AKT1/2/3) expression at the mRNA or protein/phosphoprotein levels; these may also benefit from BRAF/AKT inhibition (Fig. 6b, Additional file 2: Table S10). Similarly, Fig. 6b shows that 38 tumors contain biomarkers of response (i.e., mutational or expression based) for both EGFR and CDK inhibitors. Though both therapies are FDA approved, no clinical trials to date have examined combinatorial therapy with EGFR and CDK dual inhibition. Additionally, 105 tumors contain activating PIK3CA mutations co-occurring with elevated mRNA or protein expression of ESR1 or PGR. Given the success of mTOR and anti-estrogen therapy in ER-positive breast cancer [82], this combination may be useful in other cancer types that are dependent on hormonal or PI3K/mTOR signaling. By identifying tumors with biomarkers of response to multiple drugs, and by identifying variations in biomarkers across gender and ethnicity (Additional file 1, Additional file 2: Table S11, Additional file 3: Figure S4), multi-omics profiling can facilitate the rational design of clinical trials for combinatorial therapy.
Validation of druggability analyses with large-scale drug screening
We sought to provide support for our two hypotheses that our approaches relied upon: (1) a drug with evidence supporting use in a given cancer type can be repurposed to other cancer types that contain a shared genetic alteration; (2) gene/protein expression outlier score is a predictor of drug sensitivity. To test these hypotheses, we utilized the Genomics of Drug Sensitivity in Cancer (GDSC) database, which contains drug sensitivity data for around 75,000 experiments of 138 anticancer drugs (Additional file 2: Table S4) across 700 cancer cell lines [83]. We extracted tissue type, the mutational landscape (missense mutations and in-frame indels), gene expression, and drug sensitivity information for each cell line.
Twenty-six sensitive mutations from DEPO are found in GDSC cell lines paired with 44 drugs (Additional file 2: Table S5). BRAF V600E, PIK3CA H1047R, and KRAS G12D occur most frequently in GDSC cell lines. Overall, the mean LN(IC50) for cell lines that contain a sensitive mutation from DEPO was significantly lower than background LN(IC50) in both the cancer-type-specific and non-specific setting (Mann-Whitney U test, P = 1.1e−96 and P = 1.3e−109, respectively) (Fig. 7a). Individual variant/drug combinations from DEPO also performed well; 39 variant/drug combinations in the cell line data occurred in sufficient samples in both the cancer-type-specific and non-specific settings for statistical analysis (Additional file 2: Table S6). This represented 6 of 26 sensitive mutations. In both the cancer-type-specific and non-specific settings, 19 variant/drug combinations had significantly lower mean LN(IC50) than background LN(IC50) for the corresponding drug. Based on these 19 drug-variant combinations, 4 out of 6 sensitive mutations in DEPO (KRAS G12V, BRAF V600E, NRAS Q61K, and KRAS G12D) were significantly associated with sensitivity to at least one of their paired drugs in both the cancer-type-specific and non-specific settings. For example, cell lines with BRAF V600E were associated with sensitivity to BRAF inhibitors PLX4720 (1), PLX4720 (2), and dabrafenib in both the cancer-type-specific (SKCM) and non-specific settings (BRCA, COADREAD, GBM, LGG, LIHC, and THCA) (Fig. 7b). Two out of six mutations (PIK3CA H1047R and KRAS G12C) was associated with sensitivity in either the cancer-type-specific or the non-specific setting. Cell lines with PIK3CA H1047R had a significantly lower mean LN(IC50) in the cancer-type-non-specific setting; however, this category encompassed several cancer types, including BRCA, HNSC, and ovarian serous carcinoma (OV). Similarly, cell lines with KRAS G12C had a significant lower mean LN(IC50) in the cancer-type-specific setting, encompassing LIHC, LUAD, LUSC, and pancreatic adenocarcinoma (PAAD). Overall, our analyses provide some evidence to support our hypothesis that drugs can potentially be repurposed across several cancer types using shared mutational biomarkers of druggability. It must be noted, however, that sensitivity to drug response in cell lines does not necessarily translate over to clinical efficacy, and RAS- and PIK3CA-mutant cancers continue to be controversial.
To verify that gene expression outlier score was correlated with drug response, we conducted linear regression analysis for gene probe/drug combinations (Additional file 2: Table S18) using 116 different probes for 22 genes in DEPO. Forty-two probe/drug combinations corresponding to 10 genes had significant negative correlation (P < 0.05) between LN(IC50) and gene expression outlier score (Fig. 7c, Additional file 2: Table S7). For example, MDM2 expression correlates with sensitivity to nutlin-3a and EGFR expression correlates with sensitivity to erlotinib, lapatinib, and gefitinib (Fig. 7d, e). Similar trends are observed in CDK6 with palbociclib (PD-0332991: CDK4/6 inhibitor) and ERBB2 with lapatinib (Additional file 2: Table S7). Though cell line-based validation does not guarantee 100% drug response in patients, our analysis demonstrates that expression in 10 of 22 genes correlates with drug sensitivity in GDSC. Expression in other genes such as AKT2 and KIT did not correlate with drug sensitivity (Additional file 2: Table S7). However, this does not rule out the clinical utility of expression assays for these genes given that, for instance, KIT protein expression is an FDA-approved companion diagnostic for imatinib use. Overall, our analysis suggests that using gene expression outliers is a reasonable approach for predicting druggability in human cancers; however, some of these interactions still need to be validated in a clinical setting.