Fig. 3 | Genome Medicine

Fig. 3

From: Pan-cancer detection of driver genes at the single-patient resolution

Effect of cohort size on sysSVM2 performance. a Distributions of AUCs comparing the ranks of canonical drivers to the rest of genes (green) and to false positives (brown). For each cohort size (10, 100, 200 and 1000 samples), models were trained on ten simulated cohorts, for a total of 40 simulated cohorts. The resulting models were then used to predict on the same reference cohort of 1000 samples. The AUC was measured for each set of predictions in each sample. b Distributions of composition scores of the top five predictions in terms of canonical drivers, candidate cancer genes, false positives and rest of genes (Additional file 1: Supplementary Methods). The composition score was measured for each set of predictions in each sample. Six training cohorts of size 10 and two cohorts of size 100 gave negative composition scores in at least one sample, indicating highly ranked false positive genes. c Ratios between observed and expected numbers of canonical drivers and false positives in the top five predictions (O/E ratios). For each training cohort size, the percentages of samples with a false positive O/E ratio of zero and with canonical driver O/E ratios greater than 2, 5 and 10 are shown (Additional file 1: Supplementary Methods). d Rank-biased overlap (RBO) scores of the top five predictions in each sample (Additional file 1: Supplementary Methods). RBO scores measured the similarity between the predictions from every possible pair of models trained on cohorts of a given size. Statistical significance was measured using two-sided Wilcoxon tests. ****p < 2.2 × 10^-16. e Distribution of the number of top five predictions shared between models trained with the same cohort size. The overlap was calculated between each pair of predictions in each sample.
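
To make the comparison in panel d concrete, the following is a minimal sketch of a rank-biased overlap computed on two top-five lists and then averaged over every pair of models. It is not the authors' implementation: the exact RBO variant is defined in Additional file 1 (Supplementary Methods), and the truncated, rescaled form used here, the persistence parameter `p = 0.9`, the function names `rbo_truncated` and `pairwise_rbo`, and the example gene lists are all assumptions made for illustration.

```python
from itertools import combinations


def rbo_truncated(list_a, list_b, p=0.9, depth=5, normalise=True):
    """Rank-biased overlap of two ranked lists, truncated at `depth`.

    Assumed form: (1 - p) * sum_{d=1..depth} p^(d-1) * |A[:d] & B[:d]| / d,
    optionally divided by (1 - p^depth) so identical lists score 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        # Grow each prefix by one rank, if the list is long enough.
        if d <= len(list_a):
            seen_a.add(list_a[d - 1])
        if d <= len(list_b):
            seen_b.add(list_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two prefixes
        score += p ** (d - 1) * agreement     # earlier ranks weigh more
    score *= 1 - p
    if normalise:
        score /= 1 - p ** depth
    return score


def pairwise_rbo(predictions, **kwargs):
    """RBO for every unordered pair of models' top predictions in one sample."""
    return [rbo_truncated(a, b, **kwargs) for a, b in combinations(predictions, 2)]


# Hypothetical top-five lists from three models for a single sample.
model_1 = ["TP53", "KRAS", "PIK3CA", "APC", "PTEN"]
model_2 = ["TP53", "KRAS", "APC", "PIK3CA", "EGFR"]
model_3 = ["KRAS", "TP53", "PIK3CA", "BRAF", "PTEN"]

scores = pairwise_rbo([model_1, model_2, model_3])
print([round(s, 3) for s in scores])
```

Because the weights decay with rank, two lists that agree on the first prediction score higher than two lists that share the same gene only at rank five, which a plain count of shared genes (as in panel e) would not distinguish.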
