Table 2 Methods for prediction of driver mutations and genes

From: Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

| Objective | Data | Method | Description |
| --- | --- | --- | --- |
| Recurrent somatic mutation identification | SNV | MutSigCV[48] | Uses coverage information and genomic features (e.g. DNA replication timing) to estimate the background mutation rate of a gene. |
| | | MuSiC[49] | Uses a per-gene background mutation rate; allows for user-defined regions of interest. |
| | | Youn et al.[51] | Includes predicted impact on protein function in determining recurrent mutations. |
| | | Sjöblom et al.[52] | Defines a cancer mutation prevalence score for each gene. |
| | | DrGaP[139] | Uses a Bayesian approach to estimate the background mutation rate; helpful for cancer types with low mutation rates. |
| | CNA | GISTIC2[61], JISTIC[63] | Uses ‘peel-off’ techniques to find smaller recurrent aberrations inside larger aberrations. |
| | | CMDS[62] | Identifies recurrent CNAs from unsegmented data. |
| | | ADMIRE[65] | Multi-scale smoothing of copy number profiles. |
| Functional impact prediction | General | SIFT[72] | Uses conservation of amino acids to predict functional impact of a non-synonymous amino-acid change. |
| | | Polyphen-2[74] | Infers functional impact of non-synonymous amino-acid changes through alignments of related peptide sequences and a machine-learning-based probabilistic classifier. |
| | | MutationAssessor[75] | Uses protein homologs to calculate a score based on the divergence in conservation caused by an amino-acid change. |
| | | PROVEAN[73] | Benchmarks favorably against MutationAssessor, Polyphen-2 and SIFT. |
| | Cancer-specific | CHASM[77] | Uses a machine-learning approach to classify mutations as drivers or passengers based on sequence conservation, protein domains, and protein structure. |
| | | Oncodrive-FM[79] | Combines scores from SIFT, Polyphen-2, and MutationAssessor into a single ranking. |
| | Positional or structural clustering | NMC[83] | Finds clusters of non-synonymous mutations across patients. Typically used with missense mutations to detect so-called ‘activating’ mutations. |
| | | iPAC[84] | Extends the NMC approach to search for clusters of mutations in three-dimensional space using crystal structures of proteins. |
| Pathway analysis and combinations of mutations | Known pathways | GSEA[92] | A general technique for testing ranked lists of genes for enrichment in known gene sets. Can be used on rankings derived from significance of observed mutations. |
| | | PathScan[95] | Finds pathways (gene sets) with an excess of mutations by combining P-values of enrichment across samples. |
| | | Patient-oriented gene sets[94] | Tests known pathways using a binary indicator for a pathway in each patient. |
| | Interaction networks | NetBox[140] | Finds network modules in a user-provided list of genes. Significance depends only on the topology of the genes in the network, and not on mutation scores. |
| | | HotNet[102] | Finds subnetworks with significantly more aberrations than would be expected by chance, using both network topology and user-defined gene or protein scores. |
| | | MEMo[104] | Finds subnetworks whose interacting pairs of genes have mutually exclusive aberrations[105]; recommends including only recurrent SNVs and CNAs in the analysis. |
| | De novo | Dendrix[102] | Identifies groups of genes with mutually exclusive aberrations. |
| | | Multi-Dendrix[112] | Simultaneously finds multiple groups of genes with mutually exclusive aberrations. |
| | | RME[110] | Finds groups of genes with mutually exclusive aberrations by building from gene pairs; best results obtained when restricting to genes with high mutation frequencies (e.g. >10%). |

  1. CNA, copy number aberration; SNV, single-nucleotide variant.
  2. A representative list of software available to predict driver mutations or genes by detecting their recurrence across multiple samples, functional impact, or interactions with other mutations in pathways or combinations. Some methods fall into multiple categories but are listed only once for clarity.
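
Two ideas that recur in the table, testing a gene's mutation count against a background mutation rate and scoring a gene set for mutually exclusive aberrations, can be illustrated with a minimal sketch. It is not the implementation of any tool listed above. The function names, the Poisson model with a single constant per-base rate, and the coverage-minus-overlap score are all assumptions chosen for illustration; the listed methods estimate background rates and assess significance in considerably more elaborate ways.

```python
import math


def recurrence_pvalue(observed, n_samples, gene_length, background_rate):
    """Return P(X >= observed) under a Poisson background model.

    Assumption (hypothetical, for illustration only): mutations arise
    independently at one constant per-base, per-sample rate, so the
    expected count is n_samples * gene_length * background_rate.
    """
    if observed <= 0:
        return 1.0
    expected = n_samples * gene_length * background_rate
    if expected <= 0:
        return 0.0
    # Sum P(X = i) for i < observed in log space to avoid overflow.
    lower = sum(
        math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
        for i in range(observed)
    )
    return max(0.0, 1.0 - lower)


def exclusivity_score(mutated_samples):
    """Return coverage minus overlap for a candidate gene set.

    mutated_samples maps each gene name to the set of sample IDs in
    which that gene is aberrant. Coverage counts samples hit by at
    least one gene; overlap counts the extra hits in samples hit by
    more than one gene. A perfectly exclusive set has zero overlap.
    """
    covered = set().union(*mutated_samples.values())
    total_hits = sum(len(samples) for samples in mutated_samples.values())
    overlap = total_hits - len(covered)
    return len(covered) - overlap


if __name__ == "__main__":
    # Toy inputs; gene and sample names are made up.
    print(recurrence_pvalue(observed=3, n_samples=100,
                            gene_length=1000, background_rate=1e-6))
    print(exclusivity_score({
        "GENE_A": {"s1", "s2"},
        "GENE_B": {"s3"},
        "GENE_C": {"s2", "s4"},
    }))
```

A small p-value from the first function suggests more mutations than the assumed background would produce. A high score from the second suggests that the gene set covers many samples with few co-occurring aberrations. Neither result is evidence of a driver on its own, since both depend heavily on the simplifying assumptions stated above.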