Rational design of cancer gene panels with OncoPaD

Background
Profiling the somatic mutations of genes that may inform about tumor evolution, prognosis and treatment is becoming a standard tool in clinical oncology. Commercially available cancer gene panels rely on manually gathered cancer-related genes, in a “one-size-fits-many” solution. The design of new panels requires a laborious search of the literature and of cancer genomics resources, and their performance on cohorts of patients is difficult to estimate.

Results
We present OncoPaD, to our knowledge the first tool aimed at the rational design of cancer gene panels. OncoPaD estimates the cost-effectiveness of the designed panel on a cohort of tumors. With a friendly interface and intuitive input, OncoPaD suggests to researchers relevant sets of genes to be included in the panel, because prior knowledge or analyses indicate that their mutations either drive tumorigenesis or function as biomarkers of drug response. OncoPaD also provides reports on the importance of individual mutations for tumorigenesis or therapy that support the interpretation of the results obtained with the designed panel. We demonstrate in silico that OncoPaD-designed panels are more cost-effective (i.e. they detect a maximum fraction of tumors in the cohort by sequencing a minimum quantity of DNA) than available panels.

Conclusions
With its unique features, OncoPaD will help clinicians and researchers design tailored next-generation sequencing (NGS) panels to profile circulating tumor DNA or biopsy specimens, thereby facilitating early and accurate detection of tumors, genomics-informed therapeutic decisions, patient follow-up and timely identification of resistance mechanisms to targeted agents. OncoPaD may be accessed through http://www.intogen.org/oncopad.
Electronic supplementary material The online version of this article (doi:10.1186/s13073-016-0349-1) contains supplementary material, which is available to authorized users.

(B) Table of the cost-effectiveness of panels. A detailed explanation of the columns can be found in Table S1.

Integrating lists of known cancer driver genes
We considered as input candidates for panel design the cancer driver genes identified by: The Cancer Gene Census (CGC). We only included genes identified through mutational evidence in any of the 28 tumor types of the pan-cancer cohort studied above. The Venn diagram on the right represents the overlap between the first three lists of cancer driver genes across all cancer types.

Prioritization of panel candidates
OncoPaD computes the cumulative mutational frequency (CMF) of the panel in the cohort of the tumor type(s) selected by the user. For each gene (or hotspot), it counts the tumors bearing protein affecting mutations (PAMs) in that element but no mutations in previously considered elements. We considered as PAMs the mutations with the following consequence types: stop gain or loss, missense and frameshift indels. Note that splice donor and acceptor variants are also protein affecting mutations, but they were excluded because they lie in non-exonic regions.
If there are more than 5 Tier 1 candidates, the selection can be fine-tuned by applying a more restrictive criterion for including genes in Tier 1, named the Tier 1 stringent classification. This starts from the aforementioned classification of genes in tiers and applies the same rationale of gene prioritization (the intersection of the cumulative distribution with its linear fit), but based only on the Tier 1 cumulative distribution. Thus, amongst Tier 1 genes it prioritizes those that increase the mutational coverage the most: the genes between the beginning of the distribution and the last intersection of the Tier 1 cumulative distribution with its linear fit remain in Tier 1, whereas the genes after that intersection are re-allocated to Tier 2.
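The CMF computation described at the start of this section can be illustrated with a minimal sketch. It is not the OncoPaD implementation: the function names, the input layout (tuples of tumor, element and consequence), the consequence label strings and the toy data are all assumptions made for illustration, and the order in which elements are considered is taken as given by the caller.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Consequence labels counted as protein affecting mutations (PAMs).
# The exact strings are an assumption (Sequence Ontology style); the text
# only names the categories: stop gain or loss, missense, frameshift indels.
PAM_CONSEQUENCES = {
    "stop_gained",
    "stop_lost",
    "missense_variant",
    "frameshift_variant",
}


def tumors_with_pams(
    mutations: Iterable[Tuple[str, str, str]]
) -> Dict[str, Set[str]]:
    """Group tumor ids by element (gene or hotspot), keeping only PAMs.

    `mutations` yields (tumor_id, element, consequence) tuples.
    """
    by_element: Dict[str, Set[str]] = {}
    for tumor_id, element, consequence in mutations:
        if consequence in PAM_CONSEQUENCES:
            by_element.setdefault(element, set()).add(tumor_id)
    return by_element


def cumulative_mutational_frequency(
    ordered_elements: List[str],
    by_element: Dict[str, Set[str]],
    cohort_size: int,
) -> List[Tuple[str, int, float]]:
    """Return (element, newly covered tumors, cumulative fraction) per element.

    A tumor counts for an element only if it has a PAM there and was not
    already covered by a previously considered element.
    """
    covered: Set[str] = set()
    rows: List[Tuple[str, int, float]] = []
    for element in ordered_elements:
        newly_covered = by_element.get(element, set()) - covered
        covered |= newly_covered
        rows.append((element, len(newly_covered), len(covered) / cohort_size))
    return rows


# Toy cohort of 5 tumors (T5 carries no mutation in the panel).
toy_mutations = [
    ("T1", "TP53", "missense_variant"),
    ("T2", "TP53", "stop_gained"),
    ("T2", "KRAS", "missense_variant"),
    ("T3", "KRAS", "missense_variant"),
    ("T4", "KRAS", "synonymous_variant"),  # not a PAM, ignored
    ("T4", "PIK3CA", "frameshift_variant"),
]
toy_rows = cumulative_mutational_frequency(
    ["TP53", "KRAS", "PIK3CA"], tumors_with_pams(toy_mutations), cohort_size=5
)
```

In this toy cohort, KRAS adds only one tumor (T3) to the coverage, because T2 was already counted under TP53 and the KRAS mutation in T4 is not a PAM.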

Resources used to annotate mutations and genes in the panel
We have retrieved information from the following sources: resistance. The information on each biomarker includes the cancer type where the drug-biomarker association has been found, along with the level of evidence of the association (i.e. whether it has been found in a clinical trial, a pre-clinical assay or reported from sporadic clinical cases). OncoPaD only reports information on mutational biomarkers. To associate the drug biomarkers with them, OncoPaD hotspots were mapped from genomic coordinates onto protein coordinates using CAVA(4).
At the gene level, OncoPaD adds information regarding the role of the gene in cancer (a prediction of whether it acts through loss of function or activation). These predictions were generated for the driver genes of the Cancer Drivers Database using OncodriveROLE(5), a random forest classifier-based tool trained with genomic data from the pan-cancer TCGA cohort. Cancer driver genes from the CGC were annotated as Activating if they were classified as Dominant and as Loss of function if they were classified as Recessive; ambiguous genes were classified as No class. OncoPaD also adds an annotation on the tendency of a gene to be clonal, in other words, to be mutated in the major clones of tumors from a certain cancer type. Major clones per cancer type were identified from the variant allele frequency data of the pan-cancer TCGA cohort. We also retrieved this information from the Cancer Drivers Database.
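The rule used to annotate CGC genes with a role can be written as a small lookup. This is only a sketch: the function name is hypothetical, and treating every value other than exactly Dominant or Recessive as ambiguous is an assumption, since the text does not define which CGC entries count as ambiguous.

```python
def role_from_cgc(classification: str) -> str:
    """Map a CGC classification to the role label reported for the gene.

    Assumption: any value other than exactly "Dominant" or "Recessive"
    (for instance a gene annotated as both) is treated as ambiguous.
    """
    label = classification.strip().lower()
    if label == "dominant":
        return "Activating"
    if label == "recessive":
        return "Loss of function"
    return "No class"


# Hypothetical classifications for three genes, for illustration only.
example_roles = {
    gene: role_from_cgc(value)
    for gene, value in {
        "GENE_A": "Dominant",
        "GENE_B": "Recessive",
        "GENE_C": "Dominant/Recessive",
    }.items()
}
```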