Documentation for algorithms, their parameters and usage in the analysis together with all results are available in Additional file 1.
Glioblastoma multiforme data set
The glioblastoma data set was originally released in 2008 and has been updated online since then. An updated revision was used in the present work: comparative genomic hybridization array (aCGH), single nucleotide polymorphism (SNP), exon, gene expression and microRNA (miRNA) data were accessed from May to August 2009, while methylation and clinical data were accessed from October to November 2009. The data set consists of 338 primary glioblastoma patients with clinical annotations. Data were analyzed from the following microarray platforms: Affymetrix HU133A (269 GBM samples, 10 control samples), Affymetrix Human Exon 1.0 (298 GBM samples, 10 control samples), Agilent 244K aCGH (238 GBM samples), Affymetrix SNP Array 6.0 (214 GBM blood samples), Illumina GoldenGate methylation array (243 GBM samples) and Agilent miRNA array (251 GBM samples, 10 control samples). Pre-normalized data (level 2) were used for gene, exon and miRNA expression and methylation arrays. Raw data (level 1) were used for aCGH and SNP platforms. Clinical annotations were used to compute the duration of patient survival in months from the initial diagnosis to death or to the last follow-up. The publicly available results in the present work do not reveal protected patient information.
Gene expression analyses
The gene and exon expression platforms include ten control samples from brain tissue extracted from non-cancer patients in addition to the glioblastoma samples. Transcript level expressions are calculated from the exon level expression data by considering the problem of transforming the exon-level data to transcripts as a least squares problem. For the ith gene having m exons and n transcripts in Ensembl (v.58) we define a vector $e_i$ of length m that denotes the measured exon expressions, and an $m \times n$ matrix $A_i$, where the values in each column denote if the exon belongs to the transcript (1) or not (0). Transcript expression values $t_i$ are solved from the equation $A_i t_i = e_i$ using the QR decomposition to ensure numerical stability. The gene level expression values for the exon array platform were computed by taking a median of the intensity of all the exons linked with the gene in Ensembl.
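The exon-to-transcript least squares step can be illustrated with a small sketch. It is not the pipeline's actual implementation (that is documented in Additional file 1): the function name `qr_least_squares` is hypothetical, the QR factorisation is done with modified Gram-Schmidt for brevity, and the exon-transcript matrix is assumed to have full column rank.

```python
import math


def qr_least_squares(A, e):
    """Least squares solution t of A t = e via a thin QR factorisation.

    A: m x n list of rows with 0/1 exon membership (hypothetical input layout).
    e: measured exon expressions, length m.
    Assumes A has full column rank; raises ValueError otherwise.
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[r][c]) for r in range(m)] for c in range(n)]
    Q = []                                  # orthonormal columns of Q
    R = [[0.0] * n for _ in range(n)]       # upper triangular factor
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Modified Gram-Schmidt: project out each earlier direction in turn.
            R[k][j] = sum(q * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, Q[k])]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < 1e-12:
            raise ValueError("exon-transcript matrix is rank deficient")
        R[j][j] = norm
        Q.append([x / norm for x in v])
    # Solve R t = Q^T e by back substitution.
    y = [sum(q * x for q, x in zip(Q[j], e)) for j in range(n)]
    t = [0.0] * n
    for j in reversed(range(n)):
        s = y[j] - sum(R[j][k] * t[k] for k in range(j + 1, n))
        t[j] = s / R[j][j]
    return t


# Toy gene: 3 exons, 2 transcripts (transcript 1 = exons 1-2, transcript 2 = exons 2-3).
A = [[1, 0], [1, 1], [0, 1]]
e = [2.0, 5.0, 3.0]
t = qr_least_squares(A, e)
```

In the toy gene the shared middle exon carries the sum of both transcript levels, so the solver recovers one expression value per transcript. Genes whose transcripts cannot be distinguished by their exon sets would need separate handling, which this sketch does not attempt.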
Differential expression is determined by computing fold changes and applying a t-test between glioblastoma and control groups, followed by multiple hypotheses correction. Fold changes are computed by dividing the mean of glioblastoma expression values by the mean of control expression values.
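As a rough illustration of the fold change and correction steps, the sketch below computes a ratio of group means and adjusts a list of P-values. The text does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, as are the function names. The t-test itself is omitted and the P-values are taken as given.

```python
from statistics import mean


def fold_change(tumor_values, control_values):
    """Mean glioblastoma expression divided by mean control expression."""
    return mean(tumor_values) / mean(control_values)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values.

    Assumption: the source only says "multiple hypotheses correction";
    this particular procedure is chosen for illustration.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


fc = fold_change([4.0, 6.0], [2.0, 3.0])
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```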
Transcriptome survival analysis
Differentially expressed splice variants were selected as the basis of expression survival analysis. Of a total of 75,083 splice variants, 8,887 were differentially expressed, with an absolute fold change > 2 and a multiple hypothesis corrected P-value < 0.05. For these splice variants we computed sample-specific fold changes by dividing the sample expression value by the mean of control expression values. These fold changes (FC) were discretized into classes denoted by '-1' (underexpression, FC < 0.5), '1' (overexpression, FC > 2) and '0' (stable expression), and the samples were divided into three groups accordingly. This grouping was used in Kaplan-Meier survival analysis, and groups with < 20 patients were excluded. A log-rank test was computed for each differentially expressed splice variant.
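The discretization and grouping described above can be sketched as follows. The function names and the dictionary-based input are hypothetical, and the Kaplan-Meier and log-rank steps are left out.

```python
from collections import defaultdict


def discretize_fold_change(fc, low=0.5, high=2.0):
    """Map a sample-specific fold change to -1, 0 or 1 (thresholds from the text)."""
    if fc < low:
        return -1
    if fc > high:
        return 1
    return 0


def survival_groups(sample_expression, control_mean, min_group_size=20):
    """Group sample ids by expression class and drop groups that are too small.

    sample_expression: {sample_id: expression value} (hypothetical layout).
    """
    groups = defaultdict(list)
    for sample_id, value in sample_expression.items():
        groups[discretize_fold_change(value / control_mean)].append(sample_id)
    return {label: ids for label, ids in groups.items() if len(ids) >= min_group_size}


# Toy call with a reduced minimum group size so the small example keeps two groups.
expr = {"s1": 10.0, "s2": 1.0, "s3": 4.0, "s4": 12.0, "s5": 5.0}
groups = survival_groups(expr, control_mean=4.0, min_group_size=2)
```

The surviving groups would then be passed to a survival analysis routine, one splice variant at a time.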
SNP survival analysis
Affymetrix SNP 6.0 genotypes were called with the CRLMM algorithm. Samples with a signal-to-noise ratio below five and markers with call probabilities below 0.95 were discarded. We restricted our analysis to a genetically homogeneous pool of samples by using only ethnically similar samples. Markers with a relative minor allele frequency below 0.1 were excluded from the survival analysis. The study time in the survival analysis was 36 months. If the patient group with the rare homozygote genotype for a marker had fewer than 15 members, or its frequency was less than 0.1, then the rare homozygote group was combined with the heterozygote group. The uncorrected P-value limit was set to 0.0001.
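A minimal sketch of the marker filter and genotype grouping follows. It assumes genotypes are coded as minor-allele counts (0, 1 or 2) and reads "frequency" as the fraction of samples in the rare homozygote group. Both are interpretations of the text, and the function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from per-sample minor-allele counts (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def survival_genotype_groups(genotypes, min_maf=0.1, min_rare_count=15, min_rare_freq=0.1):
    """Return one group label per sample, or None if the marker is filtered out."""
    if minor_allele_frequency(genotypes) < min_maf:
        return None
    n_rare = sum(1 for g in genotypes if g == 2)
    # Merge rare homozygotes with heterozygotes when the group is too small.
    merge = n_rare < min_rare_count or n_rare / len(genotypes) < min_rare_freq
    names = {0: "common_hom", 1: "het", 2: "rare_hom"}
    if merge:
        names[1] = names[2] = "het_or_rare_hom"
    return [names[g] for g in genotypes]


labels = survival_genotype_groups([0] * 80 + [1] * 15 + [2] * 5)
```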
Copy number and expression integration
Normalized aCGH data from tumor samples were segmented using circular binary segmentation. A segment was called aberrant if its mean was over 0.632 or below -0.632. These thresholds were estimated from the 64 blood versus blood controls as two standard deviations from the mean of normalized probe intensities.
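The thresholding rule can be expressed in a few lines. The sketch below is illustrative only: the function names are hypothetical, the segmentation itself is not reproduced, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def two_sd_thresholds(control_values):
    """Lower and upper thresholds at two standard deviations from the control mean.

    Assumption: sample standard deviation; the text does not say which estimator was used.
    """
    mu, sd = mean(control_values), stdev(control_values)
    return mu - 2 * sd, mu + 2 * sd


def call_segment(segment_mean, loss=-0.632, gain=0.632):
    """Classify a segment mean using the thresholds reported in the text."""
    if segment_mean > gain:
        return "gain"
    if segment_mean < loss:
        return "loss"
    return "neutral"


calls = [call_segment(x) for x in (0.8, -0.7, 0.1)]
```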
Based on gain and loss frequencies for each splice variant, aCGH and splice variant expression data were integrated with the statistical method originally applied to breast cancer [11, 12]. Briefly, the samples are first divided into amplified and non-amplified groups. The difference of the expression means in these groups is divided by the sum of their standard deviations, resulting in a weight value. Then statistical significance for the weight value is computed by randomly permuting the samples into amplified and non-amplified groups and comparing the permuted weight value to the original.
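The weight value and its permutation test might look like the following sketch. The function names, the number of permutations, the one-sided comparison and the add-one P-value estimate are all assumptions made for illustration. The cited method [11, 12] should be consulted for the exact procedure.

```python
import random
from statistics import mean, stdev


def weight(expression, amplified):
    """Difference of group means divided by the sum of group standard deviations."""
    amp = [x for x, a in zip(expression, amplified) if a]
    non = [x for x, a in zip(expression, amplified) if not a]
    return (mean(amp) - mean(non)) / (stdev(amp) + stdev(non))


def permutation_p_value(expression, amplified, n_perm=1000, seed=0):
    """Observed weight and a one-sided permutation P-value.

    Assumptions: labels are shuffled with group sizes kept fixed, each group
    has at least two samples, and larger weights are treated as more extreme.
    """
    rng = random.Random(seed)
    observed = weight(expression, amplified)
    labels = list(amplified)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if weight(expression, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


expr = [1.0, 1.2, 0.9, 1.1, 1.05, 5.0, 5.5, 4.8, 5.2]
amp = [False] * 5 + [True] * 4
w, p = permutation_p_value(expr, amp)
```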
miRNA expression analysis
Differentially expressed miRNA genes were determined using the same procedure as for gene expression platforms. Annotations for target sites of miRNAs were obtained from the miRBase::Targets database. Only target sites with a P-value < $10^{-5}$ were included. MiRBase::Targets version 4 was used to match the annotations used in constructing the Agilent human miRNA array (G4470A).
DNA methylation arrays
Illumina DNA Methylation Cancer Panel I (808 gene promoters) and a custom Illumina GoldenGate array (1,498 gene promoters) were used in the methylation analysis. Processed beta values were used as provided by the TCGA. The beta value is defined as M/(M + U), where M and U are signal levels of methylation and unmethylation, respectively. The range of beta is 0 to 1, with 0 indicating hypomethylation and 1 indicating hypermethylation. Probes that target the same gene promoter were combined by taking the median of beta values so that each gene has a unique combined beta.
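The beta value definition and the per-gene median can be sketched as below. The function names, probe identifiers and gene labels are hypothetical. The processed beta values in the study were taken as provided, so `beta_value` only restates the definition.

```python
from collections import defaultdict
from statistics import median


def beta_value(m_signal, u_signal):
    """Beta = M / (M + U), with M the methylated and U the unmethylated signal."""
    return m_signal / (m_signal + u_signal)


def combine_by_gene(probe_betas, probe_to_gene):
    """Median beta per gene across all probes targeting that gene's promoter."""
    per_gene = defaultdict(list)
    for probe, beta in probe_betas.items():
        per_gene[probe_to_gene[probe]].append(beta)
    return {gene: median(values) for gene, values in per_gene.items()}


betas = {"probe1": 0.2, "probe2": 0.6, "probe3": 0.9}
genes = {"probe1": "GENE_A", "probe2": "GENE_A", "probe3": "GENE_B"}
combined = combine_by_gene(betas, genes)
```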
Small interfering RNA assays
Cell lines A172 and U87MG were obtained from the European Collection of Cell Cultures (ECACC, Salisbury, UK), LN405 from Deutsche Sammlung von Microorganismen und Zellkulturen GmbH (DSMZ, Braunschweig, Germany) and SVGp12 from American Type Culture Collection (ATCC, Manassas, VA, USA). Cells were cultured in medium conditions recommended by the providers.
The small interfering RNAs (siRNAs) were purchased from Qiagen (Qiagen GmbH, Germany) and include AllStars Hs Cell Death Control siRNA and AllStars Negative Control siRNA; siRNA sequences for the other 11 genes are given in Additional file 2. Each siRNA was assayed as three replicate wells, and for each gene four siRNAs were used in reverse transfection. Briefly, the siRNAs were printed robotically to 384-well white, clear-bottom assay plates (Greiner Bio-One GmbH, Frickenhausen, Germany). SilentFect transfection agent (Bio-Rad Laboratories, Hercules, CA, USA) or Lipofectamine RNAiMax (Invitrogen, Carlsbad, CA, USA) diluted into OptiMEM (Gibco Invitrogen, Carlsbad, CA, USA) was aliquoted into each 384-plate well using a Multidrop 384 Microplate Dispenser (Thermo Fisher Scientific Inc, Waltham, MA, USA), and the plates were incubated for 1 h at room temperature. Subsequently, 35 μl of cell suspension (1,500 cells of A172, U87MG and SVGp12 or 1,200 LN405 cells) was added on top of the siRNA-lipid complexes (13 nM final siRNA concentration) and the plates were incubated for 48 h or 72 h at +37°C with 5% CO2.
Proliferation assay and analysis of caspase-3 and -7 activities
Cell proliferation was assayed 72 h after transfection with CellTiter-Glo Cell Viability assay (Promega, Madison, WI, USA) and induction of caspase-3 and -7 activities was detected 48 h after transfection either with homogeneous Caspase-Glo 3/7 assay or Apo-ONE assay (Promega). All assays were performed according to the manufacturer's instructions. The signals were quantified by using an Envision Multilabel Plate Reader (Perkin-Elmer, Massachusetts, MA, USA). Both assays were repeated twice from independent transfections. Signals from the proliferation and caspase-3/7 assays were calculated and presented as relative signal to the mean of negative control siRNA replicate wells that was given a value of one. The values for each siRNA were then transformed into robust z-scores using median of the replicates and the median absolute deviation (MAD). A t-test (two-tailed, unequal variances) was calculated for each siRNA treatment and P-values < 0.05, < 0.01 and < 0.001 were taken as significant.
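The normalization to negative controls and the robust z-score can be illustrated as follows. The text does not state which wells form the reference distribution or whether the MAD was rescaled, so the choice of reference and the 1.4826 consistency factor are assumptions, and the function names are hypothetical.

```python
from statistics import mean, median


def relative_signal(raw_signals, negative_control_wells):
    """Express signals relative to the mean of the negative control wells (= 1)."""
    reference = mean(negative_control_wells)
    return [x / reference for x in raw_signals]


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(x - centre) for x in values])


def robust_z(replicates, reference, scale=1.4826):
    """Robust z-score of the replicate median against a reference distribution.

    Assumptions: `reference` holds the relative signals used as background,
    and the MAD is scaled by 1.4826 to be comparable to a standard deviation.
    """
    return (median(replicates) - median(reference)) / (scale * mad(reference))


reference = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
z = robust_z([0.4, 0.5, 0.45], reference)
```

A strongly negative score in the proliferation assay would indicate reduced viability relative to the reference wells.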
Data for CDKN2A and MSN are from an earlier siRNA screen and the values have been normalized to the background signal of each plate. The values were normalized using a LOESS method similar to the one implemented in the cellHTS2 R package. Briefly, statistical outliers were down-weighted when a polynomial surface was fitted to the intensities within each assay plate using local regression. This ensured a robust fit even if plates differed in hit rate. The fit, representing a systematic background signal, was then subtracted from the values. A span of 0.35 and a polynomial degree of two were used. Robust z-scores were then calculated from the corrected data.