Pipeline for tumor purification and subsequent identification and testing of gene expression-based prognostic models. (A) Gene signature identification stage. First, profiles from tumor samples and healthy tissue are co-normalized using the robust multi-array average (RMA) method and then input into ISOpure to estimate the purified cancer profiles c. The purified cancer profiles are used as covariates to train an elastic net-regularized Cox proportional hazards (CPH) model (the gene signature) to predict the survival data associated with each tumor sample. The trained parameters of the CPH model are used later in model testing. (B) Gene signature testing stage. First, new (test-set) tumor profiles are co-normalized with healthy tissue profiles and purified using ISOpure. Each purified cancer profile is then used to compute a risk score for the corresponding patient, using the CPH model parameters learned in the training stage. Patients in a test cohort are then divided into low-risk and high-risk groups based on their risk scores, and the hazard ratio between the two groups is calculated to evaluate the low- and high-risk classification.
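The scoring and grouping steps of the testing stage (B) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the risk score is the linear predictor of the CPH model (the weighted sum of purified expression values) and that patients are split at the median risk score of the training cohort; the caption does not state the threshold. The function names, coefficients, and toy profiles are hypothetical.

```python
from statistics import median

def risk_score(profile, beta):
    """Linear predictor of a Cox model: sum over genes of beta_g * expression_g."""
    return sum(b * x for b, x in zip(beta, profile))

def split_by_risk(profiles, beta, threshold):
    """Score each purified profile and label it 'high' or 'low' risk."""
    scores = [risk_score(p, beta) for p in profiles]
    groups = ["high" if s > threshold else "low" for s in scores]
    return scores, groups

# Hypothetical CPH coefficients; a zero coefficient mimics a gene that
# elastic net regularization removed from the signature.
beta = [0.5, -1.0, 0.0]

# Hypothetical purified training profiles (rows: patients, columns: genes).
train_profiles = [[2.0, 1.0, 3.0], [4.0, 0.5, 1.0], [1.0, 2.0, 2.0]]

# Assumption: the cutoff is the median training-set risk score.
threshold = median(risk_score(p, beta) for p in train_profiles)

# Hypothetical purified test-set profiles, scored with the trained parameters.
test_profiles = [[3.0, 0.0, 5.0], [1.0, 3.0, 0.0]]
scores, groups = split_by_risk(test_profiles, beta, threshold)
print(scores, groups)
```

Fixing the threshold on the training cohort keeps the test cohort untouched until evaluation. The hazard ratio between the resulting groups would then be estimated by fitting a Cox model with group membership as the only covariate, which is omitted here.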