Table 1 Characteristics of cancer datasets used for training and/or validation

From: ISOWN: accurate somatic mutation identification in the absence of normal tissue controls

| Dataset (source) | Number of samples | Number of samples used in testing | Mean read depth after filtering [95% CI] | Mutation calling pipeline | Total somatic/germline SNVs in all samples (a) | Mean somatic SNVs per sample [95% CI] (a) | Mean germline SNVs per sample [95% CI] (a) | Ratio somatic to germline (after collapsing) |
|---|---|---|---|---|---|---|---|---|
| UCEC (TCGA) | 251 | 151 | 88.84 [88.77, 88.92] | bambam_v1.4 | 38,012/504,241 | 147.015 [88.36, 205.66] | 2,008.92 [1,972.23, 2,045.62] | 2:1 |
| BRCA (TCGA) | 500 | 400 | 85.92 [85.87, 85.97] | bambam_v1.4 | 5,556/1,037,432 | 10.77 [9.05, 12.48] | 2,074.86 [2,051.26, 2,098.46] | 1:6 |
| COAD (TCGA) | 215 | 115 | 122.17 [122.06, 122.28] | carnac_v1.0 | 60,624/1,932,510 | 276.68 [191.78, 361.58] | 8,988.41 [8,826.01, 9,150.82] | 1:1 |
| KIRC (TCGA) | 304 | 204 | 177.59 [177.46, 177.73] | carnac_v1.0 | 10,489/2,416,155 | 33.56 [31.60, 35.51] | 7,947.87 [7,792.68, 8,103.07] | 1:7 |
| PAAD (TCGA) | 146 | 46 | 363.09 [362.80, 363.37] | carnac_v1.0 | 5,593/1,263,918 | 37.08 [33.59, 40.58] | 8,656.48 [8,587.71, 8,725.25] | 1:10.5 |
| ESO (dbGaP) | 145 | 45 | 58.39 [58.33, 58.44] | MuTect2 | 26,098/790,051 | 181.85 [150.65, 213.05] | 5,451.51 [5,307.16, 5,595.85] | 1:2.5 |
  1. All datasets were sequenced using Illumina technology.
  2. (a) Only non-silent variants in coding regions with read depth >10 and a PASS status from the somatic mutation caller's filtering were taken into account.
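Note (a) defines which variants enter the SNV counts in the table. The selection rule can be sketched in Python as below, assuming each call has already been annotated with coding-region and non-silent flags and carries a single per-variant read depth. The `Variant` record, its field names, and the helpers `passes_note_a` and `count_eligible` are hypothetical illustrations, not taken from the ISOWN code or from any of the listed calling pipelines.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical, pre-annotated variant record (field names are illustrative only).
@dataclass
class Variant:
    chrom: str
    pos: int
    in_coding_region: bool   # assumed: set by an upstream annotation step
    is_non_silent: bool      # assumed: e.g. missense, nonsense, splice-site
    read_depth: int          # assumed: one depth value per variant
    filter_status: str       # status reported by the somatic mutation caller

MIN_DEPTH = 10  # note (a): read depth must be strictly greater than 10

def passes_note_a(v: Variant) -> bool:
    """True if the variant meets all criteria stated in note (a)."""
    return (
        v.in_coding_region
        and v.is_non_silent
        and v.read_depth > MIN_DEPTH
        and v.filter_status == "PASS"
    )

def count_eligible(variants: Iterable[Variant]) -> int:
    """Count the variants that would be included in the table's SNV totals."""
    return sum(1 for v in variants if passes_note_a(v))

if __name__ == "__main__":
    calls = [
        Variant("1", 100, True, True, 25, "PASS"),     # kept
        Variant("1", 200, True, False, 30, "PASS"),    # silent: dropped
        Variant("2", 300, True, True, 10, "PASS"),     # depth not > 10: dropped
        Variant("3", 400, False, True, 50, "PASS"),    # non-coding: dropped
        Variant("4", 500, True, True, 40, "LowQual"),  # failed caller filter: dropped
    ]
    print(count_eligible(calls))  # → 1
```

The depth threshold is a strict inequality, so a variant at exactly 10 reads is excluded, matching the ">10" wording of the note.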