From: Assessing the reproducibility of exome copy number variations predictions
CNV caller | Pre-processing quality control | Approach to discovering CNVs | Published validation rate |
---|---|---|---|
CoNIFER [9] | RPKM per target (targets with median RPKM <1 are filtered), ZRPKM, SVD-PCA transformation. Samples with SVD-ZRPKM SD >0.5 are filtered | ±1.5 SVD-ZRPKM thresholds | 94 % PPV |
CONTRA [4] | Removes bases with coverage <10; library-size correction by removing the linear dependency between log-coverage and log-ratio | Base-level log-ratios from adjusted coverage, then region-level log-ratios as the mean of base-level log-ratios. Two-tailed p value from a normal distribution of region-level log-ratios. Heuristic use of alternative segmentation solutions for large CNVs | 86.8 % SPE, 95.4 % SEN |
EXCAVATOR [14] | Median-based correction of exon mean read counts with respect to GC content, mappability, and exon size; log-transformed ratios; LOWESS scatter-plot normalization | HMM with five copy-number states (double loss, loss, neutral, gain, multiple gain) | ~50 % PPV |
XHMM [10] | Filters out extreme GC content (<0.1 or >0.9), low complexity (>10 %), target size (<10 bp or >10 kb), samples (mean RD <25 or >500), and targets (mean RD <10 or >500). SVD-PCA normalization removing the top K components, each with relative variance >0.7/n_s | Z-scores used as input to a three-state HMM | 67–92 % SEN |
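The CoNIFER row can be illustrated with a minimal sketch of RPKM, per-target standardization (ZRPKM), and the ±1.5 calling threshold. This is a simplified illustration, not CoNIFER's code: the SVD-PCA denoising step is omitted, so thresholds are applied to ZRPKM directly, and the function names are hypothetical. It assumes ZRPKM is computed by centring each target on its median RPKM across samples and scaling by the standard deviation.

```python
from statistics import median, pstdev

def rpkm(read_counts, target_lengths, total_reads):
    """Reads per kilobase of target per million mapped reads, one value per target."""
    return [c * 1e9 / (length * total_reads)
            for c, length in zip(read_counts, target_lengths)]

def conifer_like_calls(rpkm_matrix, min_median_rpkm=1.0, threshold=1.5):
    """rpkm_matrix[s][t] is the RPKM of sample s at target t.

    Returns {sample_index: [(target_index, "gain" | "loss"), ...]}.
    Hypothetical helper: SVD-PCA denoising is deliberately left out.
    """
    n_samples = len(rpkm_matrix)
    n_targets = len(rpkm_matrix[0])
    calls = {s: [] for s in range(n_samples)}
    for t in range(n_targets):
        column = [rpkm_matrix[s][t] for s in range(n_samples)]
        med = median(column)
        if med < min_median_rpkm:
            continue  # filter poorly covered targets (median RPKM < 1)
        sd = pstdev(column)
        if sd == 0:
            continue  # no variation across samples, nothing to call
        for s, value in enumerate(column):
            z = (value - med) / sd  # ZRPKM for sample s at target t
            if z >= threshold:
                calls[s].append((t, "gain"))
            elif z <= -threshold:
                calls[s].append((t, "loss"))
    return calls
```

For example, with five samples where one has a fifth of the typical RPKM at a well-covered target, only that sample receives a "loss" call at that target; a target whose median RPKM is below 1 is skipped entirely.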
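The CONTRA row describes base-level log-ratios, region-level averaging, and a two-tailed test against a normal distribution. The sketch below follows that outline under stated assumptions: library-size correction is swapped for plain scaling by total library size (CONTRA itself removes the linear dependency between log-coverage and log-ratio), the coverage cut-off is applied to both samples, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def base_log_ratios(test_cov, ctrl_cov, test_total, ctrl_total, min_cov=10):
    """Per-base log2(test/control) after scaling by library size.

    Simplification: total-size scaling stands in for CONTRA's regression-based
    correction. Bases below min_cov in either sample are dropped.
    """
    return [math.log2((t / test_total) / (c / ctrl_total))
            for t, c in zip(test_cov, ctrl_cov)
            if t >= min_cov and c >= min_cov]

def region_log_ratio(test_cov, ctrl_cov, test_total, ctrl_total, min_cov=10):
    """Region-level log-ratio as the mean of its base-level log-ratios."""
    ratios = base_log_ratios(test_cov, ctrl_cov, test_total, ctrl_total, min_cov)
    return fmean(ratios) if ratios else None

def two_tailed_p_values(region_ratios):
    """Two-tailed p values from a normal distribution fitted to all region log-ratios."""
    mu, sd = fmean(region_ratios), stdev(region_ratios)
    std_normal = NormalDist()
    return [2 * (1 - std_normal.cdf(abs(r - mu) / sd)) for r in region_ratios]
```

Passing library totals explicitly keeps the scaling genome-wide; normalizing within each region would erase the very copy-number signal being tested.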
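The EXCAVATOR row mentions an HMM over five copy-number states. The following sketch decodes the most likely state path with the Viterbi algorithm over per-exon log2 ratios. It is a plain homogeneous Gaussian HMM and a simplification rather than EXCAVATOR's actual model; the emission means, standard deviation, and stay probability are hypothetical values chosen for illustration.

```python
import math

STATES = ["double loss", "loss", "neutral", "gain", "multiple gain"]
MEANS = [-2.0, -1.0, 0.0, 0.58, 1.0]  # hypothetical log2-ratio means per state
SD = 0.3                              # hypothetical shared emission SD
STAY = 0.99                           # hypothetical probability of keeping the same state

def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution, used as the emission log-probability."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def viterbi(log_ratios):
    """Most likely sequence of copy-number states for consecutive exons."""
    n = len(STATES)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (n - 1))
    log_start = math.log(1 / n)
    score = [log_start + log_normal_pdf(log_ratios[0], MEANS[k], SD) for k in range(n)]
    back = []  # back[t][k]: best previous state when in state k at step t + 1
    for x in log_ratios[1:]:
        prev = score
        pointers, score = [], []
        for k in range(n):
            best_j = max(range(n),
                         key=lambda j: prev[j] + (log_stay if j == k else log_move))
            pointers.append(best_j)
            score.append(prev[best_j] + (log_stay if best_j == k else log_move)
                         + log_normal_pdf(x, MEANS[k], SD))
        back.append(pointers)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[k] for k in path]
```

The stay probability acts as a smoothing penalty: a single noisy exon rarely triggers a state change, while a run of consistently shifted exons does.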
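The XHMM row combines fixed QC thresholds, removal of high-variance principal components, and per-sample z-scores. The sketch below encodes the table's thresholds and reads the 0.7/n rule as "remove components whose share of total variance exceeds 0.7 divided by the number of components", which is an interpretation of the table rather than a quote from XHMM. The SVD itself is not computed here (component variances are taken as input), boundary values are assumed to be kept, and the helper names are hypothetical.

```python
from statistics import fmean, pstdev

def keep_target(gc, low_complexity_frac, size_bp, mean_rd):
    """Target-level filters from the XHMM row; values on the boundaries are kept."""
    return (0.1 <= gc <= 0.9
            and low_complexity_frac <= 0.10
            and 10 <= size_bp <= 10_000
            and 10 <= mean_rd <= 500)

def keep_sample(mean_rd):
    """Sample-level filter on mean read depth."""
    return 25 <= mean_rd <= 500

def components_to_remove(component_variances, factor=0.7):
    """Indices of components whose share of total variance exceeds factor / n."""
    total = sum(component_variances)
    n = len(component_variances)
    return [i for i, v in enumerate(component_variances) if v / total > factor / n]

def sample_z_scores(depths):
    """Centre and scale one sample's normalized depths across targets (HMM input)."""
    mu, sd = fmean(depths), pstdev(depths)
    return [(d - mu) / sd for d in depths]
```

With component variances of 50, 30, 10, 5, and 5, the cut-off is 0.7/5 = 0.14 of the total, so only the first two components would be removed before z-scoring.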