Table 2. Composition of the datasets used as proxies to compare the performance of transformed and original scores at assessing the functional impact of cancer somatic mutations

From: Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation

| Name | Source | Positives | Negatives | N positives | N negatives |
|---|---|---|---|---|---|
| COSMIC2+/1 | COSMIC | Mutations that appear in 2 or more samples | Mutations that appear in 1 sample | 4,012 | 39,854 |
| COSMIC5+/1 | COSMIC | Mutations that appear in 5 or more samples | Mutations that appear in 1 sample | 1,480 | 39,854 |
| COSMIC2+/Pol | COSMIC/HumVar [2] | Mutations that appear in 2 or more samples | Known polymorphisms | 4,012 | 8,257 |
| COSMIC5+/Pol | COSMIC/HumVar | Mutations that appear in 5 or more samples | Known polymorphisms | 1,480 | 8,257 |
| COSMICD/O | COSMIC | COSMIC mutations included in the manually curated list of drivers used to train CHASM [5] | COSMIC mutations without the positive subset | 2,185 | 41,681 |
| COSMICD/Pol | COSMIC/HumVar | Mutations included in the manually curated list of drivers used to train CHASM | Known polymorphisms | 2,185 | 8,257 |
| COSMICCGC/nonCGC | COSMIC | COSMIC mutations in genes included in the Cancer Gene Census [13] | Non-recurrent COSMIC mutations in genes not included in the Cancer Gene Census | 4,685 | 35,907 |
| WG2+/1 | Pooled cancer somatic mutations | Mutations that appear in 2 or more samples | Mutations that appear in 1 sample | 1,031 | 26,025 |
| WGCGC/nonCGC | Pooled cancer somatic mutations | Mutations in genes included in the Cancer Gene Census [13] | Non-recurrent mutations in genes not included in the Cancer Gene Census | 1,412 | 24,837 |
  1. HumVar is a dataset of disease-related SNVs and neutral polymorphisms [2]. WG (whole genome) is a dataset of somatic mutations pooled from different tumor exome-sequencing projects (Table 1).
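
The recurrence-based splits in the table (the "2+/1" and "5+/1" datasets) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `split_by_recurrence`, the `(mutation_id, sample_id)` input format, and the toy data are all assumptions made for illustration. The sketch assumes that a mutation seen in more than one sample but fewer than the positive threshold belongs to neither set, which is how the "5+/1" rows read.

```python
def split_by_recurrence(observations, min_positive_samples=2):
    """Split mutations into positives and negatives by recurrence.

    observations: iterable of (mutation_id, sample_id) pairs (hypothetical format).
    Positives appear in at least `min_positive_samples` distinct samples;
    negatives appear in exactly one sample.
    """
    # Collect the distinct samples in which each mutation was observed.
    samples_per_mutation = {}
    for mutation_id, sample_id in observations:
        samples_per_mutation.setdefault(mutation_id, set()).add(sample_id)

    positives, negatives = set(), set()
    for mutation_id, samples in samples_per_mutation.items():
        n_samples = len(samples)
        if n_samples >= min_positive_samples:
            positives.add(mutation_id)
        elif n_samples == 1:
            negatives.add(mutation_id)
        # Assumption: mutations with 1 < n_samples < threshold are left out.
    return positives, negatives


# Toy data (invented): m1 is seen in 2 samples, m2 in 1, m3 in 5.
toy = [
    ("m1", "s1"), ("m1", "s2"), ("m1", "s2"),  # duplicate pair counts once
    ("m2", "s1"),
    ("m3", "s1"), ("m3", "s2"), ("m3", "s3"), ("m3", "s4"), ("m3", "s5"),
]

pos2, neg2 = split_by_recurrence(toy, min_positive_samples=2)
pos5, neg5 = split_by_recurrence(toy, min_positive_samples=5)
```

With the default threshold, `m1` and `m3` are positives and `m2` is the only negative; with a threshold of 5, `m1` drops out of both sets. This matches the pattern in the table, where the negative count is identical for COSMIC2+/1 and COSMIC5+/1 (39,854) while the positive count shrinks from 4,012 to 1,480.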