Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme
- Kristian Ovaska1,
- Marko Laakso†1,
- Saija Haapa-Paananen†2,
- Riku Louhimo1,
- Ping Chen1,
- Viljami Aittomäki1,
- Erkka Valo1,
- Javier Núñez-Fontarnau1,
- Ville Rantanen1,
- Sirkku Karinen1,
- Kari Nousiainen1,
- Anna-Maria Lahesmaa-Korpinen1,
- Minna Miettinen1,
- Lilli Saarinen1,
- Pekka Kohonen2,
- Jianmin Wu1,
- Jukka Westermarck3, 4 and
- Sampsa Hautaniemi1Email author
© Ovaska et al.; licensee BioMed Central Ltd. 2010
Received: 15 July 2010
Accepted: 7 September 2010
Published: 7 September 2010
Coordinated efforts to collect large-scale data sets provide a basis for systems level understanding of complex diseases. In order to translate these fragmented and heterogeneous data sets into knowledge and medical benefits, advanced computational methods for data analysis, integration and visualization are needed.
We introduce a novel data integration framework, Anduril, for translating fragmented large-scale data into testable predictions. The Anduril framework allows rapid integration of heterogeneous data with state-of-the-art computational methods and existing knowledge in bio-databases. Anduril automatically generates thorough summary reports and a website that shows the most relevant features of each gene at a glance, allows sorting of data based on different parameters, and provides direct links to more detailed data on genes, transcripts or genomic regions. Anduril is open-source; all methods and documentation are freely available.
We have integrated multidimensional molecular and clinical data from 338 subjects having glioblastoma multiforme, one of the deadliest and most poorly understood cancers, using Anduril. The central objective of our approach is to identify genetic loci and genes that have significant survival effect. Our results suggest several novel genetic alterations linked to glioblastoma multiforme progression and, more specifically, reveal Moesin as a novel glioblastoma multiforme-associated gene that has a strong survival effect and whose depletion in vitro significantly inhibited cell proliferation. All analysis results are available as a comprehensive website.
Our results demonstrate that integrated analysis and visualization of multidimensional and heterogeneous data by Anduril enables drawing conclusions on functional consequences of large-scale molecular data. Many of the identified genetic loci and genes having significant survival effect have not been reported earlier in the context of glioblastoma multiforme. Thus, in addition to generally applicable novel methodology, our results provide several glioblastoma multiforme candidate genes for further studies.
Anduril is available at http://csbi.ltdk.helsinki.fi/anduril/
The glioblastoma multiforme analysis results are available at http://csbi.ltdk.helsinki.fi/anduril/tcga-gbm/
Comprehensive characterization of complex diseases calls for coordinated efforts to collect and share genome-scale data from large patient cohorts. A prime example of such a coordinated effort is The Cancer Genome Atlas (TCGA), which currently provides more than five billion data points on glioblastoma multiforme (GBM) with the aim of improving diagnosis, treatment and prevention of GBM .
Translating genome-scale data into knowledge and further to effective diagnosis, treatment and prevention strategies requires computational tools that are designed for large-scale data analysis as well as for the integration of multidimensional data with clinical parameters and knowledge available in bio-databases. In addition, it is evident that until data integration tools are developed to the level that experimental scientists can independently interpret the vast amounts of data generated by genome-scale technologies, most of the potential of the generated data will be severely underexploited. In order to address these challenges, we have developed a data analysis and integration framework, Anduril, which facilitates the integration of various data formats, bio-databases and analysis techniques. Anduril manages and automates analysis workflows from importing raw data to reporting and visualizing the results. In order to facilitate interpretation of the large-scale data analysis results, Anduril generates a website that shows the most relevant features of each gene at a glance, allows sorting of data based on different parameters, and provides direct links to more detailed views of genes, transcripts, genomic regions, protein-protein interactions and pathways.
We demonstrate the utility of the Anduril framework by analyzing heterogeneous and multidimensional data from 338 GBM patients . GBM is an aggressive brain cancer having a median survival of one year and is remarkably resistant to all current anti-cancer therapeutic regimens . In order to understand the complex molecular mechanisms behind GBM, earlier efforts have analyzed data from one or two platforms, such as mutations, copy number and gene expression profiles and methylation patterns [3–7]. In contrast, we have analyzed all TCGA provided GBM data sets and collected the results into a comprehensive website that facilitates the interpretation of the data and allows an advanced view of genes and genomic regions crucial to GBM progression. Most importantly, Anduril can be applied to data from any accessible source.
Materials and methods
Documentation for algorithms, their parameters and usage in the analysis together with all results are available in Additional file 1.
Glioblastoma multiforme data set
The glioblastoma data set was originally released in 2008  and has been updated online since then. An updated revision was used in the present work: comparative genomic hybridization array (aCGH), single nucleotide polymorphism (SNP), exon, gene expression and microRNA (miRNA) data were accessed May to August 2009, while methylation and clinical data were accessed October to November 2009. The data set consists of 338 primary glioblastoma patients with clinical annotations. Data were analyzed from the following microarray platforms: Affymetrix HU133A (269 GBM samples, 10 control samples), Affymetrix Human Exon 1.0 (298 GBM samples, 10 control samples), Agilent 244 k aCGH (238 GBM samples), Affymetrix SNP Array 6.0 (214 GBM blood samples), Illumina GoldenGate methylation array (243 GBM samples) and Agilent miRNA array (251 GBM samples, 10 control samples). Pre-normalized data (level 2) were used for gene, exon and miRNA expression and methylation arrays. Raw data (level 1) were used for aCGH and SNP platforms. Clinical annotations were used to compute the duration of patient survival in months from the initial diagnosis to death or to the last follow-up. The publicly available results in the present work do not reveal protected patient information.
Gene expression analyses
The gene and exon expression platforms include ten control samples from brain tissue extracted from non-cancer patients in addition to the glioblastoma samples. Transcript level expressions are calculated from the exon level expression data by considering the problem of transforming the exon-level data to transcripts as a least squares problem. For ith gene having m exons and n transcripts in Ensembl (v.58) we define a vector e i of length m that denotes the measured exon expressions, and an m times n matrix A i , where the values in each column denote if the exon belongs to the transcript (1) or not (0). Transcript expression values t i are solved from the equation A i t i = e i using the QR decomposition to ensure numerical stability. The gene level expression values for the exon array platform were computed by taking a median of the intensity of all the exons linked with the gene in Ensembl.
Differential expression is determined by computing fold changes and applying a t-test between glioblastoma and control groups, followed by multiple hypotheses correction . Fold changes are computed by dividing the mean of glioblastoma expression values by the mean of control expression values.
Transcriptome survival analysis
Differentially expressed splice variants were selected as the basis of expression survival analysis. There were 8,887 splice variants (out of a total 75,083) that were differentially expressed having absolute fold change >2 and a multiple hypothesis corrected P-value < 0.05. For these splice variants we computed sample-specific fold changes by dividing the sample expression value by the mean of control expression values. These fold changes (FC) were discretized into classes denoted by '-1' (underexpression, FC < 0.5), '1' (overexpression, FC >2) and '0' (stable expression), and the samples were divided into three groups accordingly. This grouping was used in Kaplan-Meier survival analysis and groups with <20 patients were excluded. A log-rank test was computed for each differentially expressed splice variant.
SNP survival analysis
Affymetrix SNP 6.0 genotypes were called with the CRLMM algorithm . Samples with a signal-to-noise ratio below five and markers with call probabilities below 0.95 were discarded. We restricted our analysis to a genetically homogeneous pool of samples by using only ethnically similar samples. Markers with a relative minor allele frequency below 0.1 were excluded from the survival analysis. The study time in the survival analysis was 36 months. If the size of the patient group with the rare homozygote genotype in a marker was less than 15, or its frequency was less than 0.1, then the rare homozygote group was combined with the heterozygote group. The uncorrected P-value limit was set to 0.0001.
Copy number and expression integration
Normalized aCGH data from tumor samples were segmented using circular binary segmentation . A segment was called aberrated if its mean was over 0.632 or below -0.632. These thresholds were estimated from the 64 blood versus blood controls as two standard deviations from the mean of normalized probe intensities.
Based on gain and loss frequencies for each splice variant, aCGH and splice variant expression data were integrated with the statistical method originally applied to breast cancer [11, 12]. Briefly, the samples are first divided into amplified and non-amplified groups. The difference of the expression means in these groups is divided by the sum of their standard deviation, resulting in a weight value. Then statistical significance for the weight value is computed by randomly permuting the samples into amplified and non-amplified groups and comparing the permuted weight value to the original.
miRNA expression analysis
Differentially expressed miRNA genes were determined using the same procedure as for gene expression platforms. Annotations for target sites of miRNAs were obtained from the miRBase::Targets database . Only target sites with a P-value < 10-5 were included. MiRBase::Targets version 4 was used to match the annotations used in constructing the Agilent human miRNA array (G4470A).
DNA methylation arrays
Illumina DNA Methylation Cancer Panel I (808 gene promoters) and a custom Illumina GoldenGate array (1,498 gene promoters) were used in the methylation analysis. Processed beta values were used as provided by the TCGA. The beta value is defined as M/(M + U), where M and U are signal levels of methylation and unmethylation, respectively. The range of beta is 0 to 1, with 0 indicating hypomethylation and 1 indicating hypermethylation. Probes that target the same gene promoter were combined by taking the median of beta values so that each gene has a unique combined beta.
Small interfering RNA assays
Cell lines A172 and U87MG were obtained from the European Collection of Cell Cultures (ECACC, Salisbury, UK), LN405 from Deutsche Sammlung von Microorganismen und Zellkulturen GmbH (DSMZ, Braunschweig, Germany) and SVGp12 from American Type Culture Collection (ATCC, Manassas, VA, USA). Cells were cultured in medium conditions recommended by the providers.
The small interfering RNAs (siRNAs) were purchased from Qiagen (Qiagen GmbH, Germany) and include AllStars Hs Cell Death Control siRNA and AllStars Negative Control siRNA; siRNA sequences for the other 11 genes are given in Additional file 2. Each siRNA was assayed as three replicate wells, and for each gene four siRNAs were used in reverse transfection. Briefly, the siRNAs were printed robotically to 384-well white, clear-bottom assay plates (Greiner Bio-One GmbH, Frickenhausen, Germany). SilentFect transfection agent (Bio-Rad Laboratories, Hercules, CA, USA) or Lipofectamine RNAiMax (Invitrogen, Carlsbad, CA, USA) diluted into OptiMEM (Gibco Invitrogen, Carlsbad, CA, USA) was aliquoted into each 384-plate well using a Multidrop 384 Microplate Dispenser (Thermo Fisher Scientific Inc, Waltham, MA, USA), and the plates were incubated for 1 h at room temperature. Subsequently, 35 μl of cell suspension (1,500 cells of A172, U87MG and SVGp12 or 1,200 LN405 cells) was added on top of the siRNA-lipid complexes (13 nM final siRNA concentration) and the plates were incubated for 48 h or 72 h at +37°C with 5% CO2.
Proliferation assay and analysis of caspase-3 and -7 activities
Cell proliferation was assayed 72 h after transfection with CellTiter-Glo Cell Viability assay (Promega, Madison, WI, USA) and induction of caspase-3 and -7 activities was detected 48 h after transfection either with homogeneous Caspase-Glo 3/7 assay or Apo-ONE assay (Promega). All assays were performed according to the manufacturer's instructions. The signals were quantified by using an Envision Multilabel Plate Reader (Perkin-Elmer, Massachusetts, MA, USA). Both assays were repeated twice from independent transfections. Signals from the proliferation and caspase-3/7 assays were calculated and presented as relative signal to the mean of negative control siRNA replicate wells that was given a value of one. The values for each siRNA were then transformed into robust z-scores using median of the replicates and the median absolute deviation (MAD). A t-test (two-tailed, unequal variances) was calculated for each siRNA treatment and P-values < 0.05, < 0.01 and < 0.001 were taken as significant.
Data for CDKN2A and MSN are from an earlier siRNA screen and the values have been normalized to the background signal of each plate. The values were normalized using a LOESS method similar to the one implemented in the cellHTS2 R-package . Briefly, the statistical outliers were down-weighted when a polynomial surface was fitted to the intensities within each assay plate using local regression . This ensured a robust fit even if plates differ in hit-rate. The fit, representing a systematic background signal, was then subtracted from the values. A span of 0.35 and a degree of two for polynomial kernel were used. Robust z-scores were then calculated from the corrected data.
Workflows are constructed using a custom workflow language called AndurilScript that resembles traditional programming languages and is designed to enable rapid construction of complex workflows. The elementary processing steps in a workflow are implemented by Anduril components, which are reusable software packages written in various programming languages, for instance, R, Java, MATLAB, Octave, Python and Perl. Components are executable processes that communicate with the workflow through files. The component model is programming language independent since the only requirement is the ability to read and write files. At the AndurilScript level, components are accessed using their external interfaces, which hides implementation details. The components can use software libraries, such as Bioconductor  and Weka , to bring well-tested libraries to the workflow environment. It is also possible to invoke command-line programs from workflows. Currently, the Anduril core repository consists of more than a hundred components, and new components are added regularly. For instance, we designed a computational platform to generate networks from a list of genes by integrating pathway and protein-protein interaction data in Anduril . This represents a component bundle that uses the Anduril framework but is distributed independently from the Anduril core.
Anduril includes advanced features for working with complex workflows. Large workflows can be divided into nested subworkflows, so that each hierarchical level is simple to maintain. When a workflow is executed several times, Anduril caches results of components and only executes the components whose configuration has changed since the last run, which reduces execution time significantly. Selected parts of workflows can be enabled based on dynamic conditions, which increases the flexibility of the workflows.
Compared to traditional programming environments, for instance, R coupled with Bioconductor, the advantages of Anduril are the use of workflows and the support for several programming languages. Workflows have a higher level of abstraction than R code, which increases productivity and enables visualization of analysis configuration. Compared to workflow frameworks GenePattern , Ergatis  and Taverna , Anduril provides several novel features, such as efficient programming-like workflow construction with an advanced workflow engine, algorithms specifically designed for large-scale data analysis and automated result website construction, that enable efficient analysis and visualization of large-scale data sets (see  for details).
Anduril-generated result report and website for GBM data interpretation
Analyses performed and corresponding TCGA glioblastoma data sets
Differentially expressed genes
Differentially expressed transcripts (DETs)
Differentially expressed miRNAs
Survival affecting germline SNPs
Blood SNP arrays, clinical data
Survival affecting DETs
Exon expression, clinical data
Survival affecting differentially expressed miRNAs
miRNA expression, clinical data
Chromosomal aberrations (amplification and deletion)
Integration of differential expression and chromosomal aberrations
aCGH, exon expression
Integration of copy number and transcript expression GBM data
We identified genes that are frequently amplified or deleted in GBM samples and integrated these results with expression data in order to identify genes whose altered expression activity can potentially be explained by chromosomal aberrations. Genomic regions with significant amplifications include 7p11.2 (amplified in up to 54% of patients, housing EGFR), 12q13-12q15 (14%) and 4q12 (14%).
Integration of aCGH and exon expression data reveals 16 genes for which amplification is an explanatory factor for overexpression (P < 0.01 and gain frequency >5%). Of these, EGFR is amplified on the aCGH platform and overexpressed on both gene expression platforms (fold change 2.8 to 6.2; Additional file 3, panel A). EGFR is also hypomethylated (beta = 0.03), which may be an additional explanatory mechanism for its overexpression. However, not all genes located in the amplified region 7p11.2 show marked overexpression in the total patient population (Additional file 3, panel A). For example, LANCL2 (the closest annotated gene to EGFR in the 7p11.2 region) is amplified in 24% of patients but shows underexpression in the exon platform and only slight over-expression in the gene expression platform. Similar differential expression is seen also between METTL1 (overexpressed) and AGAP2 (underexpressed) in the amplified chromosomal location 12q14.1 (Additional file 3, panel B).
Gene deletions are generally thought to result in downregulation of the expression of genes coded by the deleted genomic region. Interestingly, Anduril-based analysis of the two most frequently deleted genes at 9p21.3, MTAP and CDKN2A, shows that even though the gene deletion is an explanatory factor for lower expression of these genes in patients with deletion, in total GBM patient material the MTAP expression is not inhibited and CDKN2A is overexpressed compared to normal tissue (Additional file 4). The seemingly contradictory correlation between gene deletion and overexpression suggests activation of MTAP and CDKN2 promoters, and thereby increased gene expression levels in patients who have not yet lost one or two copies of these genes. This hypothesis is supported by the observation that in patients with remaining MTAP and CDKN2 alleles, both MTAP and CDKN2A are hypomethylated. On the other hand, another gene at 9p21.3 (ELAVL2) shows classical behavior of a deleted gene; its expression correlates with deletion, and it is also significantly downregulated in both expression platforms.
These examples illustrate that Anduril allows researchers to detect critical parameters affecting expression levels of the gene of interest at a glance. Our results demonstrate that integrated data analysis combining amplification, expression, and methylation status is integral in order to draw conclusions about functional consequences of gene amplifications or deletions detected by aCGH microarrays.
Survival analysis of GBM data
Probably the most important feature of the Anduril analysis of the GBM data is the integration of patient survival information with both expression and SNP data, thereby allowing the user to sort the genomic alterations according to their clinical relevance.
In order to examine the relevance of gene expression levels to patient survival in GBM, we first searched for genes whose overexpression correlated significantly with poor survival (P < 0.01). Among the 100 most upregulated genes, only 15 genes showed significant correlation with poor survival. On the other hand, out of the top ten survival affecting genes, only one gene (MSN, encoding Moesin) showed consistent overexpression in the gene and exon expression platforms (Figure 2a). All the other genes affecting survival in this group were underexpressed. Three of the top ten genes affecting survival (ADAM22, SCRIB, WAC) had at least one transcript that was overexpressed when analyzed on the exon array platform. However, survival effects of these genes are related to underexpressed splice variants instead of the overexpressed variants. Together these results show that gene repression is a common mode for gene regulation among the genes that have the most significant survival effect in GBM. These results challenge the general assumption that the level of gene overexpression is the major determinant to separate between clinically relevant and non-relevant genes.
In order to test the association between genetic alterations in GBM and their relevance to patient survival, we linked gene amplifications, expression profiles and survival data. Among the 300 most amplified genes, only filamin C gamma (FLNC; 7q32.1) is amplified (9% of the patients) with consistent overexpression in the gene and exon arrays and significant survival effect (P < 0.01). Together these results indicate that there is unexpectedly poor concordance between gene amplification, overexpression of the genes from the amplicons, and patient survival in GBM.
In general, individual miRNA survival effects in GBM were much smaller than expression survival effects, which may be explained by their indirect mechanism of action. The highest expressed miRNA in the GBM data was hsa-miR-21 (fold change 15.5), which has been shown to increase apoptotic activity and reduce tumor size in vivo [30–32]. Some of the most downregulated miRNAs according to our analysis were hsa-miR-124a, hsa-miR-137, hsa-miR-7, hsa-miR-128a and hsa-miR-128b. All of these have been connected functionally to glioblastoma, either via neuronal differentiation or growth regulation .
Finally, we correlated 550,000 SNPs on the SNP arrays to survival using Kaplan-Meier and log-rank methods. This analysis identified 50 genes that contain survival-associated SNPs. Of these genes, KIAA0040 is also overexpressed (fold change 1.7 to 2.6) and associated with poor survival in exon array data (P < 8.7 × 10-4). The role of KIAA0040 in cancer progression is also supported by a recent study where KIAA0040 overexpression was shown to correlate with poor prognosis in breast cancer . Another example of a gene showing a significant survival-affecting SNP is rs17258085 of ODZ3. In contrast to KIAA0040, this gene is significantly underexpressed in the GBM samples.
Functional analysis of survival-affecting genes in vitro
Functional siRNA screening data for 11 GBM survival-associated genes in the A172 glioblastoma cell line
AllStars Cell Death Vontrol siRNA (Cell death ctrl)
AllStars Negative Control siRNA (siNEG)
Kinesin family member 11
Polo-like kinase 1
Filamin C, gamma
Histone cluster 1, H4l
Cyclin-dependent kinase inhibitor 2A
NA (gene deleted in A172)
Large-scale data gathering efforts require software and computational tools to facilitate interpretation of the data. We have developed Anduril, an efficient and systematic data integration framework, to conduct large-scale data analysis that necessarily requires a number of processing steps before the data can be interpreted. In the GBM analysis here, the workflow contained approximately 350 processing steps, demonstrating the efficiency of workflows - more code would be needed when working with traditional programming languages - as well as highlighting the need for complexity management in workflow software. The structure of the analysis is automatically documented together with all execution parameters of the participating components, which enables reproduction of the results. Anduril supports modular and programming-like workflow construction, which together with automated component testing and a version control system allows a team of bioinformaticians to work on the project simultaneously and to seamlessly integrate the analysis results.
We have demonstrated the utility of the Anduril framework with the GBM data from TCGA, one of the largest multidimensional cancer data sets currently available. We focused on the integration of mRNA expression, SNPs and copy number data to clinical parameters as these results can provide evidence of potential molecular markers with impact on GBM progression. This also facilitates the sorting of the genomic alterations according to their clinical relevance and further helps to focus future mechanistic studies on genetic alterations that have evidence of clinical relevance. While TCGA GBM data sources, such as The Cancer Genome Atlas Portal and the Cancer Molecular Analysis Portal, provide box-plots for single genes and genome-wide heatmaps, Anduril offers a significant step forward. It enables a comprehensive view of the most critical parameters influencing expression, miRNA, SNP and copy number levels, as well as correlation of these data to survival at a glance. In addition, Anduril provides a number of direct links to external databases, and is thus an easy access point for interpreting the vast amounts of heterogeneous data from multiple sources. These characteristics of Anduril facilitate scientists without bioinformatics training to interpret complex data sets, such as TCGA.
Analysis of the GBM data demonstrates the utility of Anduril in translating fragmented data to testable predictions. For example, detection of amplified genomic regions has traditionally been used to identify genes with potential causal roles in oncogenesis . However, whether genomic amplification generally results in clinically relevant changes in gene expression from the amplicon has been difficult to assess because of the lack of Anduril-type websites combining gene expression, patient survival and aCGH amplification data. Our results show surprisingly poor concordance between gene amplification, overexpression of the genes in the amplicons, and patient survival. For example, even though EGFR is the most often amplified gene in GBM (54% of patients), and this amplification has been considered as a hallmark of the disease, EGFR overexpression does not correlate well with overall patient survival (P < 0.122). This result is supported by a recent study demonstrating that EGFR amplification does not determine patient survival in primary GBM . Instead, our results demonstrate that gene repression, rather than activation, is a common mode for gene regulation among the genes that have the most significant effect on survival in GBM.
Interestingly, many of the most survival-affecting genes have not been previously implicated in GBM pathogenesis. An example of such a gene is ZRANB1 (encoding ubiquitin thioesterase), which is downregulated in exon arrays and has a strong survival effect (P < 3.2 × 10-5). It has been shown in Drosophila and in human cancer cell lines to function as a positive regulator of Wnt-signaling . Another interesting survival-affecting gene revealed by our analysis is MSN (encoding Moesin). We have functionally demonstrated that Moesin depletion by siRNA significantly inhibited cell proliferation and induced apoptosis. Moesin is functionally involved in regulation of actin cytoskeleton and cell migration, which indicates that in GBM it may promote, in addition to proliferation, the highly invasive behavior of GBM cells.
The different analysis approaches described herein demonstrate the ability of Anduril to integrate several types of genomic information and above all its capacity to determine which of the observed genetic alterations have an impact on patient survival. In this regard, Anduril clearly facilitates scientists to focus future functional analysis on those cancer-related genes that have already been verified to have clinical significance. Interestingly, each of the survival analyses described above (SNP, expression level, copy number changes) identified clinically relevant genomic alterations in genes for which cancer relevance is not presently established. It is anticipated that further studies of genes (for example, MSN and ZRANB1) and clinically relevant SNPs (for example, rs2285218 in KIAA0040) will produce interesting novel mechanistic insights into GBM progression and oncogenesis.
comparative genomic hybridization array
small interfering RNA
single nucleotide polymorphism
The Cancer Genome Atlas.
The authors are grateful to Dr Outi Monni for critically commenting on the manuscript, and Drs Olli Tynninen and Petri Bono for fruitful discussions. This work was supported by Academy of Finland (projects: 125826, 128416 and 1121413; Center of Excellence in Translational Genome-Scale Biology (SHP, PK)), Sigrid Jusélius Foundation, Foundation for the Finnish Cancer Institute, Finnish Cancer Associations, Biocentrum Helsinki, Helsinki University Funds, Finnish Graduate School in Computational Sciences (ML, PC, SK) and Helsinki Biomedical Graduate School (KO, AMLK).
- Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008, 455: 1061-1068. 10.1038/nature07385.View Article
- Furnari FB, Fenton T, Bachoo RM, Mukasa A, Stommel JM, Stegh A, Hahn WC, Ligon KL, Louis DN, Brennan C, Chin L, DePinho RA, Cavenee WK: Malignant astrocytic glioma: genetics, biology, and paths to treatment. Genes Dev. 2007, 21: 2683-2710. 10.1101/gad.1596707.PubMedView Article
- Bredel M, Scholtens DM, Harsh GR, Bredel C, Chandler JP, Renfrow JJ, Yadav AK, Vogel H, Scheck AC, Tibshirani R, Sikic BI: A network model of a cooperative genetic landscape in brain tumors. JAMA. 2009, 302: 261-275. 10.1001/jama.2009.997.PubMedPubMed CentralView Article
- Brennan C, Momota H, Hambardzumyan D, Ozawa T, Tandon A, Pedraza A, Holland E: Glioblastoma subclasses can be defined by activity among signal transduction pathways and associated genomic alterations. PLoS One. 2009, 4: e7752-10.1371/journal.pone.0007752.PubMedPubMed CentralView Article
- Cerami E, Demir E, Schultz N, Taylor BS, Sander C: Automated network analysis identifies core pathways in glioblastoma. PLoS One. 2010, 5: e8918-10.1371/journal.pone.0008918.PubMedPubMed CentralView Article
- Gaire RK, Bailey J, Bearfoot J, Campbell IG, Stuckey PJ, Haviv I: MIRAGAA - a methodology for finding coordinated effects of microRNA expression changes and genome aberrations in cancer. Bioinformatics. 2009, 26: 161-167. 10.1093/bioinformatics/btp654.PubMedView Article
- Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O'Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, et al: Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 2010, 17: 98-110. 10.1016/j.ccr.2009.12.020.PubMedPubMed CentralView Article
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995, 57: 289-300.
- Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007, 8: 485-499. 10.1093/biostatistics/kxl042.PubMedView Article
- Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008.PubMedView Article
- Hautaniemi S, Ringner M, Kauraniemi P, Autio R, Edgren H, Yli-Harja O, Astola J, Kallioniemi A, Kallioniemi OP: A strategy for identifying putative causes of gene expression variation in human cancer. J Jefferson Institute. 2004, 341: 77-88.
- Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringnér M, Sauter G, Monni O, Elkahloun A, Kallioniemi O-P, Kallioniemi A: Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res. 2002, 62: 6240-6245.PubMed
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36: D154-158. 10.1093/nar/gkm952.PubMedPubMed CentralView Article
- Boutros M, Bras LP, Huber W: Analysis of cell-based RNAi screens. Genome Biol. 2006, 7: R66-10.1186/gb-2006-7-7-r66.PubMedPubMed CentralView Article
- Cleveland W: Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc. 1979, 74: 829-836. 10.2307/2286407.View Article
- Des Rivi J, Wiegand J: Eclipse: a platform for integrating development tools. IBM Syst J. 2004, 43: 371-383. 10.1147/sj.432.0371.View Article
- Anduril. [http://csbi.ltdk.helsinki.fi/anduril/]
- Anduril User Guide. [http://csbi.ltdk.helsinki.fi/anduril/userguide.pdf]
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMedPubMed CentralView Article
- Frank E, Hall M, Trigg L, Holmes G, Witten IH: Data mining in bioinformatics using Weka. Bioinformatics. 2004, 20: 2479-2481. 10.1093/bioinformatics/bth261.PubMedView Article
- Laakso M, Hautaniemi S: Integrative platform to translate gene sets to networks. Bioinformatics. 2010, 26: 1802-1803. 10.1093/bioinformatics/btq277.PubMedView Article
- Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet. 2006, 38: 500-501. 10.1038/ng0506-500.PubMedView Article
- Orvis J, Crabtree J, Galens K, Gussman A, Inman JM, Lee E, Nampally S, Riley D, Sundaram JP, Felix V, Whitty B, Mahurkar A, Wortman J, White O, Angiuoli SV: Ergatis: a web interface and scalable software system for bioinformatics workflows. Bioinformatics. 2010, 26: 1488-1492. 10.1093/bioinformatics/btq167.PubMedPubMed CentralView Article
- Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004, 20: 3045-3054. 10.1093/bioinformatics/bth361.PubMedView Article
- Anduril generated glioblastoma multiforme result website. [http://csbi.ltdk.helsinki.fi/anduril/tcga-gbm/]
- Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36: D480-484. 10.1093/nar/gkm882.PubMedPubMed CentralView Article
- Wu J, Vallenius T, Ovaska K, Westermarck J, Makela TP, Hautaniemi S: Integrated network analysis platform for protein-protein interactions. Nat Methods. 2009, 6: 75-77. 10.1038/nmeth.1282.PubMedView Article
- Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D: GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics. 1998, 14: 656-664. 10.1093/bioinformatics/14.8.656.PubMedView Article
- Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, et al: Ensembl 2009. Nucleic Acids Res. 2009, 37: D690-697. 10.1093/nar/gkn828.PubMedPubMed CentralView Article
- Chan JA, Krichevsky AM, Kosik KS: MicroRNA-21 is an antiapoptotic factor in human glioblastoma cells. Cancer Res. 2005, 65: 6029-6033. 10.1158/0008-5472.CAN-05-0137.PubMedView Article
- Corsten MF, Miranda R, Kasmieh R, Krichevsky AM, Weissleder R, Shah K: MicroRNA-21 knockdown disrupts glioma growth in vivo and displays synergistic cytotoxicity with neural precursor cell delivered S-TRAIL in human gliomas. Cancer Res. 2007, 67: 8994-9000. 10.1158/0008-5472.CAN-07-1045.PubMedView Article
- Krichevsky AM, Gabriely G: miR-21: a small multi-faceted RNA. J Cell Mol Med. 2009, 13: 39-53. 10.1111/j.1582-4934.2008.00556.x.PubMedPubMed CentralView Article
- Lawler S, Chiocca EA: Emerging functions of microRNAs in glioblastoma. J Neurooncol. 2009, 92: 297-306. 10.1007/s11060-009-9843-2.PubMedView Article
- Abba MC, Sun H, Hawkins KA, Drake JA, Hu Y, Nunez MI, Gaddis S, Shi T, Horvath S, Sahin A, Aldaz CM: Breast cancer molecular signatures as determined by SAGE: correlation with lymph node status. Mol Cancer Res. 2007, 5: 881-890. 10.1158/1541-7786.MCR-07-0055.PubMedPubMed CentralView Article
- Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm JS, Dobson J, Urashima M, Mc Henry KT, Pinchback RM, Ligon AH, Cho YJ, Haery L, Greulich H, Reich M, Winckler W, Lawrence MS, Weir BA, Tanaka KE, Chiang DY, Bass AJ, Loo A, Hoffman C, Prensner J, Liefeld T, Gao Q, Yecies D, Signoretti S, et al: The landscape of somatic copy-number alteration across human cancers. Nature. 2010, 463: 899-905. 10.1038/nature08822.PubMedPubMed CentralView Article
- Benito R, Gil-Benso R, Quilis V, Perez M, Gregori-Romero M, Roldan P, Gonzalez-Darder J, Cerda-Nicolas M, Lopez-Gines C: Primary glioblastomas with and without EGFR amplification: Relationship to genetic alterations and clinicopathological features. Neuropathology. 2009, 30: 392-400. 10.1111/j.1440-1789.2009.01081.x.PubMedView Article
- Tran H, Hamada F, Schwarz-Romond T, Bienz M: Trabid, a new positive regulator of Wnt-induced transcription with preference for binding and cleaving K63-linked ubiquitin chains. Genes Dev. 2008, 22: 528-542. 10.1101/gad.463208.PubMedPubMed CentralView Article
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.