Epigenetic variable outliers for risk prediction analysis. (a) Flowchart describing the EVORA model. (i) Age-associated DNAm variation and age-independent differentially variable DNAm are both correlated with the risk of prospective neoplasia (CIN2+). aCpGs undergoing age-associated hypermethylation (hyperM) overlap strongly with differential variable CpGs (DVCs) that exhibit increased variance in future CIN2+ cases. This overlap defines, for a given training set, the pool of candidate risk CpGs. (ii) Multiple training/test set partitions in a ten-fold internal cross-validation on COPA-transformed (Methods) methylation profiles is used to optimize the COPA threshold and the set of risk CpGs. (iii) Risk prediction using EVORA: for an independent sample its risk score is estimated as the fraction of risk CpGs with a β-value larger than the optimal threshold, as evaluated in the COPA-basis. (b) EVORA receiver operating characteristic (ROC) curve, AUC and its 95% confidence interval in the ARTISTIC cohort (152 normal samples: 75 future CIN2+, 77 normals). (c) Comparison of C-index (AUC) values obtained using EVORA with a classification algorithm based on detecting differences in mean methylation levels (mean) in the ARTISTIC cohort. Boxplots are over 100 distinct training-test set partitions and P-values are from a Wilcoxon test detecting deviation from the expected null (C-index = 0.5) as well as between the two classification algorithms. (d) EVORA ROC curve in set 1 (48 liquid-based cytology samples: 18 CIN2+, 30 normals). (e) EVORA ROC curve in set 2 (63 cervical tissue samples: 48 cancers, 15 normals). In all ROC curves, AUC values and 95% confidence intervals shown. FPR: false positive rate; Se: sensitivity.