Fig. 2 | Genome Medicine

Fig. 2

From: Pan-cancer detection of driver genes at the single-patient resolution

sysSVM optimisation on the simulated reference cohort. Model performances on the reference cohort using centred (left) and un-centred (right) data with all 25 systems-level features (a) or excluding protein length and number of protein domains (b). A sparse grid of 512 parameter combinations in the four kernels was tested. The performance of each model was measured using the area under the curve (AUC), comparing the ranks of canonical drivers to the rest of genes and false positives. Median AUC values across all samples were plotted. Red dotted lines represent the minimum AUC values. Correlation between model average sensitivity and AUCs of canonical drivers over the rest of genes (c) or false positives (d). The sensitivity of each kernel was measured on the training set over 100 three-fold cross-validation iterations. The median values over the four kernels are plotted. R and p values from Pearson’s correlation test are reported. Dotted red lines indicate the linear regression lines of best fit. e Distributions of sysSVM2 prediction scores for different types of damaged genes in the reference cohort. Whiskers extend to 1.5 times the interquartile range (IQR). Statistical significance was measured using two-sided Wilcoxon tests. The median values of the distributions are labelled. ****p < 2.2 × 10⁻¹⁶. f Receiver operating characteristic (ROC) curves, comparing canonical drivers to the rest of genes (green) and to false positives (brown). Recall rates were calculated for each sample separately and the median ROC curve across samples was plotted. Median areas under the curve (AUCs) for both comparisons are also indicated.
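
The per-sample AUC and its median across samples, as described in the caption, could be computed along the following lines. This is a minimal sketch and not the authors' implementation: the function names `rank_auc` and `median_auc` and the toy scores are hypothetical, and the AUC is obtained here from pairwise rank comparisons (the Mann-Whitney formulation), which the caption does not confirm as the method used.

```python
from statistics import median


def rank_auc(scores, is_driver):
    """AUC for one sample: the fraction of (driver, non-driver) pairs in which
    the driver has the higher score. Ties count as half a win."""
    pos = [s for s, d in zip(scores, is_driver) if d]
    neg = [s for s, d in zip(scores, is_driver) if not d]
    if not pos or not neg:
        raise ValueError("need at least one driver and one non-driver gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc(samples):
    """Median of the per-sample AUCs; `samples` is a list of (scores, labels) pairs."""
    return median(rank_auc(scores, labels) for scores, labels in samples)


# Hypothetical toy cohort: three samples, each with four damaged genes.
# True marks a canonical driver; False marks any other gene.
toy_samples = [
    ([0.9, 0.8, 0.3, 0.1], [True, False, False, False]),  # AUC = 1.0
    ([0.7, 0.6, 0.9, 0.2], [True, True, False, False]),   # AUC = 0.5
    ([0.5, 0.4, 0.6, 0.1], [True, False, False, False]),  # AUC = 2/3
]

print(round(median_auc(toy_samples), 3))  # prints 0.667
```

Taking the median rather than the mean keeps a single poorly ranked sample from dominating the summary, which fits the caption's choice of median AUC values across samples.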
