Assessing the reproducibility of exome copy number variations predictions

Table 1 Summary of methods used in CNV callers

CNV caller	Pre-processing quality control	Approach to discovering CNVs	Published validation rate
CoNIFER [9]	RPKM for each target (filter targets with median RPKM <1), ZRPKM, SVD-PCA transformation. Filter samples >0.5 SVD-ZRPKM	±1.5 SVD-ZRPKM threshold values	94 % PPV
CONTRA [4]	Removes base coverage <10, library-size correction by removing linear dependency between log-coverage and log-ratio	Base-level log-ratios using adjusted coverage, followed by region-level log-ratios using the mean of base-level log-ratios. Two-tailed p value on normal distribution of region-level log-ratio. Heuristic approach of using different segmentation solutions for large CNVs	86.8 % SPE, 95.4 % SEN
EXCAVATOR [14]	Data correction using the medians of exon mean read count values with respect to GC content, mappability, and exon size. Log-transformed ratio, LOWESS scatter plot normalization	HMM to discover five states of CNVs (double loss, loss, neutral, gain, or multiple gain)	~50 % PPV
XHMM [10]	Filter extreme GC content (<0.1 or >0.9), low complexity (>10 %), target size (<10 bp or >10 kb), samples (mean RD <25 or >500), targets (mean RD <10 or >500). SVD-PCA normalization, remove K components = 0.7/n_s	Z-score calculation as input for three-state HMM	67–92 % SEN
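
The read-depth normalization summarized for CoNIFER in Table 1 (RPKM, ZRPKM, SVD-based component removal, and a ±1.5 threshold) can be illustrated with a minimal Python sketch. This is not CoNIFER's code. The function names, the use of the per-sample sum of target counts as the mapped-read total, the median-and-standard-deviation form of the z-score, and the number of components removed are all assumptions made for illustration; only the median RPKM <1 filter and the ±1.5 threshold come from the table.

```python
import numpy as np

def rpkm(counts, target_lengths):
    """Reads per kilobase per million mapped reads (samples x targets).

    Assumption: the per-sample total is the sum of counts over all targets.
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(target_lengths, dtype=float)
    total = counts.sum(axis=1, keepdims=True)
    return counts * 1e9 / (lengths * total)

def zrpkm(rpkm_matrix, min_median=1.0):
    """Drop targets with median RPKM < min_median, then z-score each target.

    Assumption: centering on the median and scaling by the standard deviation.
    """
    rpkm_matrix = np.asarray(rpkm_matrix, dtype=float)
    keep = np.median(rpkm_matrix, axis=0) >= min_median
    kept = rpkm_matrix[:, keep]
    sd = kept.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero on flat targets
    return (kept - np.median(kept, axis=0)) / sd, keep

def remove_top_components(z, n_remove):
    """Zero the n_remove largest singular values and rebuild the matrix."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s = s.copy()
    s[:n_remove] = 0.0
    return (u * s) @ vt

def call_targets(svd_z, threshold=1.5):
    """Label each value: +1 (gain), -1 (loss), or 0 (no call)."""
    svd_z = np.asarray(svd_z, dtype=float)
    calls = np.zeros(svd_z.shape, dtype=int)
    calls[svd_z >= threshold] = 1
    calls[svd_z <= -threshold] = -1
    return calls
```

In practice the number of components to remove is chosen per data set, and a caller would combine per-target labels into multi-exon segments rather than reporting single targets.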

ISSN: 1756-994X