
Table 1 Characteristics of cancer datasets used for training and/or validation

From: ISOWN: accurate somatic mutation identification in the absence of normal tissue controls

| Dataset (source) | Number of samples | Number of samples used in testing | Mean read depth after filtering [95% CI] | Mutation calling pipeline | Total number of somatic/germline SNVs^a in all samples | Mean somatic SNVs per sample [95% CI]^a | Mean germline SNVs per sample [95% CI]^a | Ratio somatic to germline (after collapsing) |
|---|---|---|---|---|---|---|---|---|
| UCEC (TCGA) | 251 | 151 | 88.84 [88.77, 88.92] | bambam_v1.4 | 38,012/504,241 | 147.015 [88.36, 205.66] | 2,008.92 [1,972.23, 2,045.62] | 2:1 |
| BRCA (TCGA) | 500 | 400 | 85.92 [85.87, 85.97] | bambam_v1.4 | 5,556/1,037,432 | 10.77 [9.05, 12.48] | 2,074.86 [2,051.26, 2,098.46] | 1:6 |
| COAD (TCGA) | 215 | 115 | 122.17 [122.06, 122.28] | carnac_v1.0 | 60,624/1,932,510 | 276.68 [191.78, 361.58] | 8,988.41 [8,826.01, 9,150.82] | 1:1 |
| KIRC (TCGA) | 304 | 204 | 177.59 [177.46, 177.73] | carnac_v1.0 | 10,489/2,416,155 | 33.56 [31.60, 35.51] | 7,947.87 [7,792.68, 8,103.07] | 1:7 |
| PAAD (TCGA) | 146 | 46 | 363.09 [362.80, 363.37] | carnac_v1.0 | 5,593/1,263,918 | 37.08 [33.59, 40.58] | 8,656.48 [8,587.71, 8,725.25] | 1:10.5 |
| ESO (dbGaP) | 145 | 45 | 58.39 [58.33, 58.44] | MuTect2 | 26,098/790,051 | 181.85 [150.65, 213.05] | 5,451.51 [5,307.16, 5,595.85] | 1:2.5 |

  1. All datasets were sequenced using Illumina technology.
  2. ^a Only non-silent variants in coding regions with read depth >10 that received a PASS flag from the somatic mutation caller's filters were taken into account.
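Footnote a defines which variants enter the SNV counts in the table. As a rough illustration only, the sketch below applies those criteria to a list of variant records and summarizes per-sample counts as a mean with a 95% CI. The `Variant` record, the `NON_SILENT` label set, and the normal-approximation interval (mean ± 1.96·s/√n) are hypothetical assumptions: the table states neither how consequences were annotated nor how the intervals were computed, and the collapsing step behind the ratio column is not modeled.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev

# Hypothetical label set: the table only says "non-silent"; the exact
# consequence classes used in the study are not given here.
NON_SILENT = {"missense", "nonsense", "nonstop", "splice_site",
              "frameshift_indel", "inframe_indel"}


@dataclass
class Variant:
    sample: str
    consequence: str      # hypothetical annotation label
    coding: bool          # True if the variant lies in a coding region
    depth: int            # read depth at the site
    caller_filter: str    # FILTER value reported by the somatic caller
    somatic: bool         # True = somatic, False = germline


def passes_footnote_a(v: Variant) -> bool:
    """Non-silent, coding, read depth > 10, and PASS from the caller."""
    return (v.coding
            and v.consequence in NON_SILENT
            and v.depth > 10
            and v.caller_filter == "PASS")


def mean_ci95(values: list[float]) -> tuple[float, float, float]:
    """Mean with a normal-approximation 95% CI (assumed method, needs n >= 2)."""
    m = mean(values)
    half = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half, m + half


def summarize(variants: list[Variant], samples: list[str]) -> dict:
    """Dataset totals and per-sample means, restricted to footnote-a variants."""
    somatic = {s: 0 for s in samples}
    germline = {s: 0 for s in samples}
    for v in variants:
        if passes_footnote_a(v):
            (somatic if v.somatic else germline)[v.sample] += 1
    return {
        "total": f"{sum(somatic.values()):,}/{sum(germline.values()):,}",
        "somatic_mean_ci": mean_ci95(list(somatic.values())),
        "germline_mean_ci": mean_ci95(list(germline.values())),
    }


# Toy usage with made-up records (not data from the table).
calls = [
    Variant("S1", "missense", True, 40, "PASS", True),
    Variant("S1", "silent", True, 40, "PASS", True),      # excluded: silent
    Variant("S1", "missense", True, 8, "PASS", False),    # excluded: depth <= 10
    Variant("S2", "nonsense", True, 25, "PASS", False),
    Variant("S2", "missense", True, 30, "PASS", False),
    Variant("S2", "missense", False, 30, "PASS", True),   # excluded: non-coding
]
print(summarize(calls, ["S1", "S2"])["total"])  # prints "1/2"
```

The totals string mirrors the "somatic/germline" format of the table's total-count column; samples with no qualifying variants still contribute a zero to the per-sample means.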