Fig. 1 | Genome Medicine

Fig. 1

From: ISOWN: accurate somatic mutation identification in the absence of normal tissue controls

ISOWN framework for somatic mutation prediction. Variants, retrieved either directly from the TCGA portal as VCF files or through the GATK/MuTect2 pipeline (see the “Implementation” section for details), were annotated using a series of external databases. Low-quality calls were removed by applying a standard set of filters. Only coding, non-silent variants were considered unless otherwise indicated. After flanking regions and variant allele frequencies were calculated for each variant and the data were collapsed into a unique set of variants (see the “Implementation” section), some variants were pre-labeled: as germline if they were present in dbSNP/common_all but not in COSMIC, or as somatic if more than one hundred samples carrying that mutation had been submitted to COSMIC (CNT > 100). The best machine learning algorithm was selected using tenfold cross-validation. One hundred randomly selected samples from each dataset were used for classifier training, and final accuracies were calculated on the remaining samples.
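
The pre-labeling rule described in the caption can be illustrated with a minimal Python sketch. It is not the ISOWN implementation: the `Variant` record, its field names, and the `pre_label` function are hypothetical, and only the two conditions (present in dbSNP/common_all but absent from COSMIC; COSMIC CNT > 100) are taken from the caption. Variants matching neither condition are assumed to be left unlabeled for the classifier.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold stated in the caption: more than 100 COSMIC samples (CNT > 100).
COSMIC_CNT_THRESHOLD = 100


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one collapsed (unique) variant."""
    key: str                    # illustrative identifier, e.g. "chr:pos:ref>alt"
    in_dbsnp_common: bool       # present in dbSNP/common_all
    cosmic_cnt: Optional[int]   # COSMIC sample count; None if absent from COSMIC


def pre_label(variant: Variant) -> Optional[str]:
    """Return 'germline', 'somatic', or None (left for the classifier)."""
    in_cosmic = variant.cosmic_cnt is not None
    # Germline: seen in dbSNP/common_all but not in COSMIC.
    if variant.in_dbsnp_common and not in_cosmic:
        return "germline"
    # Somatic: reported in more than 100 COSMIC samples.
    if in_cosmic and variant.cosmic_cnt > COSMIC_CNT_THRESHOLD:
        return "somatic"
    # Assumption: everything else stays unlabeled.
    return None


if __name__ == "__main__":
    examples = [
        Variant("v1", in_dbsnp_common=True, cosmic_cnt=None),
        Variant("v2", in_dbsnp_common=False, cosmic_cnt=250),
        Variant("v3", in_dbsnp_common=True, cosmic_cnt=5),
    ]
    for v in examples:
        print(v.key, pre_label(v))
```

Note that the two conditions cannot both hold for the same variant, since the germline rule requires absence from COSMIC, so the order of the checks does not affect the result.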
