Digital transcriptome profiling of normal and glioblastoma-derived neural stem cells identifies genes associated with patient survival

Background Glioblastoma multiforme, the most common type of primary brain tumor in adults, is driven by cells with neural stem (NS) cell characteristics. Using derivation methods developed for NS cells, it is possible to expand tumorigenic stem cells continuously in vitro. Although these glioblastoma-derived neural stem (GNS) cells are highly similar to normal NS cells, they harbor mutations typical of gliomas and initiate authentic tumors following orthotopic xenotransplantation. Here, we analyzed GNS and NS cell transcriptomes to identify gene expression alterations underlying the disease phenotype.
Methods Sensitive measurements of gene expression were obtained by high-throughput sequencing of transcript tags (Tag-seq) on adherent GNS cell lines from three glioblastoma cases and two normal NS cell lines. Validation by quantitative real-time PCR was performed on 82 differentially expressed genes across a panel of 16 GNS and 6 NS cell lines. The molecular basis and prognostic relevance of expression differences were investigated by genetic characterization of GNS cells and comparison with public data for 867 glioma biopsies.
Results Transcriptome analysis revealed major differences correlated with glioma histological grade, and identified misregulated genes of known significance in glioblastoma as well as novel candidates, including genes associated with other malignancies or glioma-related pathways. This analysis further detected several long non-coding RNAs with expression profiles similar to neighboring genes implicated in cancer. Quantitative PCR validation showed excellent agreement with Tag-seq data (median Pearson r = 0.91) and discerned a gene set robustly distinguishing GNS from NS cells across the 22 lines. These expression alterations include oncogene and tumor suppressor changes not detected by microarray profiling of tumor tissue samples, and facilitated the identification of a GNS expression signature strongly associated with patient survival (P = 1e-6, Cox model).
Conclusions These results support the utility of GNS cell cultures as a model system for studying the molecular processes driving glioblastoma and the use of NS cells as reference controls. The association between a GNS expression signature and survival is consistent with the hypothesis that a cancer stem cell component drives tumor growth. We anticipate that analysis of normal and malignant stem cells will be an important complement to large-scale profiling of primary tumors.


Background
Glioblastoma (grade IV astrocytoma) is the most common and severe type of primary brain tumor in adults. The prognosis is poor, with a median survival time of 15 months despite aggressive treatment [1]. Glioblastomas display extensive cellular heterogeneity and contain a population of cells with properties characteristic of neural stem (NS) cells [2]. It has been proposed that such corrupted stem cell populations are responsible for maintaining cancers, and give rise to differentiated progeny that contribute to the cellular diversity apparent in many neoplasias. Data supporting this hypothesis have been obtained for several types of malignancies, including a variety of brain cancers [2]. Importantly, a recent study using a mouse model of glioblastoma demonstrated that tumor recurrence after chemotherapy originates from a malignant cell population with NS cell features [3]. Characterizing human glioblastoma cancer stem cells to understand how they differ from normal tissue stem cell counterparts may therefore provide key insights toward the identification of new therapeutic opportunities.
Fetal and adult NS cells can be isolated and maintained as untransformed adherent cell lines in serum-free medium supplemented with growth factors [4,5]. Using similar protocols, it is possible to expand NS cells from gliomas [6]. These glioma-derived NS (GNS) cells are very similar in morphology to normal NS cells, propagate continuously in culture and share expression of many stem and progenitor cell markers, such as SOX2 and Nestin. Like normal progenitor cells of the central nervous system, they can also differentiate into neurons, astrocytes and oligodendrocytes to varying degrees [5,6]. In contrast to NS cells, however, GNS cells harbor extensive genetic abnormalities characteristic of the disease and form tumors that recapitulate human gliomas when injected into mouse brain regions corresponding to sites of occurrence in patients.
In this study, we compare gene expression patterns of GNS and NS cells to discover transcriptional anomalies that may underlie tumorigenesis. To obtain sensitive and genome-wide measurements of RNA levels, we conducted high-throughput sequencing of transcript tags (Tag-seq) on GNS cell lines from three glioblastoma cases and on two normal NS cell lines, followed by quantitative reverse transcription PCR (qRT-PCR) validation in a large panel of GNS and NS cell lines. Tag-seq is an adaptation of serial analysis of gene expression (SAGE) to high-throughput sequencing and has considerable sensitivity and reproducibility advantages over microarrays [7,8]. Compared to transcriptome shotgun sequencing (RNA-seq), Tag-seq does not reveal full transcript sequences, but has the advantages of being strand-specific and unbiased with respect to transcript length.
A large body of microarray expression data for glioblastoma biopsies has been generated through multiple studies [9][10][11][12][13]. These data have been extensively analyzed to detect gene expression differences among samples, with the aim to identify outliers indicative of aberrant expression [11,14,15], discover associations between gene expression and prognosis [12,16] or classify samples into clinically relevant molecular subtypes [9,10,13,17]. However, expression profiling of tumor specimens is limited by the inherent cellular heterogeneity of malignant tissue and a lack of reference samples with similar compositions of corresponding normal cell types. GNS cells represent a tractable alternative for such analyses, as they constitute a homogeneous and self-renewing cell population that can be studied in a wide range of experimental contexts and contrasted with genetically normal NS cells. By combining the sensitive Tag-seq method with the GNS/NS model system we obtain a highly robust partitioning of malignant and normal cell populations, and identify candidate oncogenes and tumor suppressors not previously associated with glioma.

Materials and methods
Cell culture and sample preparation
GNS and NS cells were cultured in N2B27 serum-free medium [18], a 1:1 mixture of DMEM/F-12 and Neurobasal media (Invitrogen, Paisley, UK) augmented with N2 (Stem Cell Sciences, Cambridge, UK) and B27 (Gibco, Paisley, UK) supplements. Self-renewal was supported by the addition of 10 ng/ml epidermal growth factor and 20 ng/ml fibroblast growth factor 2 to the complete medium. Cells were plated at 20,000/cm² in laminin-coated vessels (10 μg/ml laminin-1 (Sigma, Dorset, UK) in phosphate-buffered saline for 6 to 12 h), passaged near confluence using Accutase dissociation reagent (Sigma) and were typically split at 1:3 for NS cells and 1:3 to 1:6 for GNS cells. For expression analysis, cells were dissociated with Accutase and RNA was extracted using RNeasy (Qiagen, West Sussex, UK), including a DNase digestion step. RNA quality was assessed on the 2100 Bioanalyzer (Agilent, Berkshire, UK).

Transcriptome tag sequencing
Tag-seq entails the capture of polyadenylated RNA followed by extraction of a 17-nucleotide (nt) sequence immediately downstream of the 3′-most NlaIII site in each transcript. These 17 nt 'tags' are sequenced in a high-throughput manner and the number of occurrences of each unique tag is counted, resulting in digital gene expression profiles where tag counts reflect expression levels of corresponding transcripts [8].
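As an illustration of the tag definition above, the following minimal Python sketch derives the expected tag from a transcript sequence by locating the 3′-most NlaIII recognition site (CATG) and taking the 17 nt that follow it. The function name `virtual_tag` is hypothetical, and the sketch assumes the transcript is given 5′ to 3′ without its poly(A) tail; it simply returns `None` when fewer than 17 nt follow the site, which may differ from how real annotation pipelines handle such cases.

```python
from typing import Optional

NLAIII_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 17       # tag length stated in the text


def virtual_tag(transcript: str) -> Optional[str]:
    """Return the 17 nt downstream of the 3'-most CATG, or None.

    Assumption: `transcript` is the sense-strand sequence (5' to 3')
    without the poly(A) tail.
    """
    seq = transcript.upper()
    pos = seq.rfind(NLAIII_SITE)  # 3'-most occurrence of the site
    if pos == -1:
        return None  # transcript has no NlaIII site
    start = pos + len(NLAIII_SITE)
    tag = seq[start:start + TAG_LENGTH]
    # Require a full-length tag; shorter remnants are discarded here.
    return tag if len(tag) == TAG_LENGTH else None
```

For a transcript with two CATG sites, only the sequence after the second (3′-most) site is returned.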
Tag-seq libraries were prepared using the Illumina NlaIII DGE protocol. Briefly, polyadenylated RNA was isolated from 2 µg total RNA using Sera-Mag oligo(dT) beads (Thermo Scientific, Leicestershire, UK). First-strand cDNA was synthesized with SuperScript II reverse transcriptase (Invitrogen) for 1 h at 42°C, followed by second-strand synthesis by DNA polymerase I for 2.5 h at 16°C in the presence of RNase H. cDNA products were digested with NlaIII for 1 h at 37°C and purified to retain only the 3′-most fragments bound to the oligo(dT) beads. Double-stranded GEX adapter 1 oligonucleotides, containing an MmeI restriction site, were ligated to NlaIII digestion products with T4 DNA ligase for 2 h at 20°C. Ligation products were then digested with MmeI at the adapter-cDNA junction site, thereby creating 17 bp tags free in solution. GEX adapter 2 oligos were ligated to the MmeI cleavage site by T4 DNA ligase for 2 h at 20°C, and the resulting library constructs were PCR-amplified for 15 cycles with Phusion DNA polymerase (Finnzymes, Essex, UK).
Libraries were sequenced at Canada's Michael Smith Genome Sciences Centre, Vancouver BC on the Illumina platform. Transcript tags were extracted as the first 17 nt of each sequencing read and raw counts obtained by summing the number of reads for each observed tag. To correct for potential sequencing errors, we used the Recount program [19], setting the Hamming distance parameter to 1. Recount uses an expectation maximization algorithm to estimate true tag counts (that is, counts in the absence of error) based on observed tag counts and base-calling quality scores. Tags matching adapters or primers used in library construction and sequencing were identified and excluded using TagDust [20] with a target false discovery rate (FDR) of 1%. Tags derived from mitochondrial or ribosomal RNA were identified and excluded by running the bowtie short-read aligner [21] against a database consisting of all ribosomal RNA genes from Ensembl [22], all ribosomal repeats in the UCSC Genome Browser RepeatMasker track for genome assembly GRCh37 [23], and the mitochondrial DNA sequence; only perfect matches to the extended 21 nt tag sequence (consisting of the NlaIII site CATG followed by the observed 17 nt tag) were accepted. Remaining tags were assigned to genes using a hierarchical strategy based on the expectation that tags are most likely to originate from the 3′-most NlaIII site in known transcripts (Additional files 1 and 2). To this end, expected tag sequences (virtual tags) were extracted from the SAGE Genie database [24] and Ensembl transcript sequences. In addition, bowtie was applied to determine unique, perfect matches for sequenced tags to the reference genome.
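The raw counting step described above (first 17 nt of each read, summed per unique tag) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `count_tags` and `tags_per_million` are hypothetical, and the sketch performs no error correction (Recount), adapter filtering (TagDust) or rRNA/mitochondrial filtering (bowtie), all of which were applied in the actual analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def count_tags(reads: Iterable[str], tag_length: int = 17) -> Counter:
    """Count occurrences of each unique tag (first `tag_length` nt of a read)."""
    counts: Counter = Counter()
    for read in reads:
        if len(read) >= tag_length:  # skip reads too short to hold a tag
            counts[read[:tag_length].upper()] += 1
    return counts


def tags_per_million(counts: Counter) -> Dict[str, float]:
    """Scale raw tag counts to tags per million of the library total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1e6 / total for tag, n in counts.items()}
```

Scaling to tags per million is a simple library-size normalization shown here for illustration; the study itself normalized counts with DESeq.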
The Bioconductor package DESeq [25] was used to normalize tag counts, call differentially expressed genes and obtain variance-stabilized expression values for correlation calculations. Tests for enrichment of Gene Ontology and InterPro terms were performed in R, using Gene Ontology annotation from the core Bioconductor package org.Hs.eg and InterPro annotation from Ensembl. Each term associated with a gene detected by Tag-seq was tested. Signaling pathway impact analysis was carried out using the Bioconductor package SPIA [26]. To identify major differences common to the GNS cell lines investigated, we filtered the set of genes called differentially expressed at 1% FDR, further requiring (i) two-fold or greater change in each GNS cell line compared to each NS cell line, with the direction of change being consistent among them; and (ii) expression above 30 tags per million in each GNS cell line (if upregulated in GNS cells) or each NS cell line (if downregulated in GNS cells). Sequencing data and derived gene expression profiles are available from ArrayExpress [27] under accession E-MTAB-971.
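The two filtering criteria for major expression differences can be written compactly. The Python sketch below is a hedged illustration, not the authors' implementation: the function name `core_direction` is hypothetical, inputs are assumed to be normalized expression values in tags per million, and the preceding 1% FDR call from DESeq is assumed to have been applied separately.

```python
from typing import Optional, Sequence


def core_direction(gns: Sequence[float], ns: Sequence[float],
                   min_fold: float = 2.0,
                   min_tpm: float = 30.0) -> Optional[str]:
    """Classify a gene as 'up', 'down' or None (fails the filter).

    (i)  Every GNS line differs from every NS line by >= min_fold,
         in a consistent direction.
    (ii) Expression exceeds min_tpm in each GNS line (if upregulated)
         or in each NS line (if downregulated).
    """
    # Upregulated: lowest GNS value is at least min_fold x highest NS value.
    if min(gns) >= min_fold * max(ns) and min(gns) > min_tpm:
        return "up"
    # Downregulated: lowest NS value is at least min_fold x highest GNS value.
    if min(ns) >= min_fold * max(gns) and min(ns) > min_tpm:
        return "down"
    return None
```

Comparing the extreme values of each group is equivalent to checking all pairwise GNS-versus-NS fold changes, and using multiplication rather than division avoids problems with zero counts.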

Quantitative RT-PCR validation
Custom-designed TaqMan low-density array microfluidic cards (Applied Biosystems, Paisley, UK) were used to measure the expression of 93 genes in 22 cell lines by qRT-PCR. This gene set comprises 82 validation targets from Tag-seq analysis, eight glioma and developmental markers, and three endogenous control genes (18S ribosomal RNA, TUBB and NDUFB10). The 93 genes were interrogated using 96 different TaqMan assays (three of the validation targets required two different primer and probe sets to cover all known transcript isoforms matching differentially expressed tags). A full assay list with raw and normalized threshold cycle (Ct) values is provided in Additional file 3. To capture biological variability within cell lines, we measured up to four independent RNA samples per line. cDNA was generated using SuperScript III (Invitrogen) and real-time PCR carried out using TaqMan fast universal PCR master mix. Ct values were normalized to the average of the three control genes using the Bioconductor package HTqPCR [28]. Differentially expressed genes were identified by the Wilcoxon rank sum test after averaging replicates.
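Normalization to the average of the control genes amounts to a delta-Ct calculation. The following Python sketch illustrates the idea for a single sample; it is not the HTqPCR implementation, and the function name `delta_ct` and the dictionary keys used for the control genes are assumptions made for the example.

```python
from statistics import mean
from typing import Dict, Sequence

# Control gene labels as they might appear in a results table (assumed).
CONTROL_GENES = ("18S", "TUBB", "NDUFB10")


def delta_ct(sample_ct: Dict[str, float],
             controls: Sequence[str] = CONTROL_GENES) -> Dict[str, float]:
    """Subtract the mean control-gene Ct from each target gene's Ct.

    Lower normalized values indicate higher expression, because Ct
    decreases as template abundance increases.
    """
    reference = mean(sample_ct[g] for g in controls)
    return {gene: ct - reference
            for gene, ct in sample_ct.items()
            if gene not in controls}
```

For example, with control Ct values of 10, 20 and 24 (mean 18), a target gene with Ct 25 receives a normalized value of 7.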

Tumor gene expression analysis
Public microarray data, survival information and other associated metadata were obtained from The Cancer Genome Atlas (TCGA) and four independent studies (Table 1). All tumor microarray data were from samples obtained upon initial histologic diagnosis. We used processed (level 3) data from TCGA, consisting of one expression value per gene and sample (Additional file 4). For the other data sets, we processed the raw microarray data with the RMA method in the Bioconductor package affy [29] and retrieved probe-gene mappings from Ensembl 68 [22]. For genes represented by multiple probesets, expression values were averaged across probesets for randomization tests, heatmap visualization and GNS signature score calculation. Differential expression was computed using limma [30]. Randomization tests were conducted with the limma function geneSetTest, comparing log2 fold-change for core up- or downregulated genes against the distribution of log2 fold-change for randomly sampled gene sets of the same size.
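The logic of such a randomization test can be illustrated with a short Python sketch. This is a simplified analog written for exposition and is not the limma implementation: the function name `randomization_p`, the use of the mean as the set statistic, the one-sided "greater" alternative and the add-one P-value estimate are all assumptions of this example.

```python
import random
from statistics import mean
from typing import Dict, Iterable


def randomization_p(log_fc: Dict[str, float], gene_set: Iterable[str],
                    n_iter: int = 10000, seed: int = 0) -> float:
    """One-sided randomization P-value for a gene set.

    Tests whether the mean log2 fold-change of `gene_set` is higher than
    that of randomly sampled gene sets of the same size.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    genes = sorted(log_fc)
    members = [g for g in gene_set if g in log_fc]
    observed = mean(log_fc[g] for g in members)
    hits = 0
    for _ in range(n_iter):
        sampled = rng.sample(genes, len(members))
        if mean(log_fc[g] for g in sampled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)
```

To test a set of downregulated genes in the same way, the sign of the log2 fold-changes can be flipped before calling the function.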
Survival analysis was carried out with the R library survival. To combine expression values of multiple genes for survival prediction, we took an approach inspired by Colman et al. [16]. The normalized expression values $x_{ij}$, where i represents the gene and j the sample, were first standardized to be comparable between genes by subtracting the mean across samples and dividing by the standard deviation, thus creating a matrix of z-scores:
$$z_{ij} = \frac{x_{ij} - \bar{x}_i}{\sigma_i}$$
where $\bar{x}_i$ and $\sigma_i$ denote the mean and standard deviation of gene i across samples. Using a set U of $n_U$ genes upregulated in GNS cell lines and a set D of $n_D$ genes downregulated in these cells, we then computed a GNS signature score $s_j$ for each sample j by subtracting the mean expression of the downregulated genes from the mean expression of the upregulated genes:
$$s_j = \frac{1}{n_U} \sum_{i \in U} z_{ij} - \frac{1}{n_D} \sum_{i \in D} z_{ij}$$
IDH1 mutation calls for TCGA samples were obtained from Firehose data run version 2012-07-07 [31] and data files from the study by Verhaak et al. updated 2011-11-28 [32].
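The signature score calculation described above (per-gene standardization across samples, then mean of upregulated minus mean of downregulated genes) can be sketched in Python as follows. The function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def z_scores(expr: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across samples (mean 0, sample SD 1)."""
    z = {}
    for gene, values in expr.items():
        m, s = mean(values), stdev(values)  # assumption: sample SD
        z[gene] = [(v - m) / s for v in values]
    return z


def signature_scores(expr: Dict[str, Sequence[float]],
                     up: Sequence[str],
                     down: Sequence[str]) -> List[float]:
    """Per-sample score: mean z of `up` genes minus mean z of `down` genes."""
    z = z_scores(expr)
    n_samples = len(next(iter(expr.values())))
    return [mean(z[g][j] for g in up) - mean(z[g][j] for g in down)
            for j in range(n_samples)]
```

With one upregulated gene measured as 1, 2, 3 and one downregulated gene measured as 3, 2, 1 across three samples, the scores are -2, 0 and 2; higher scores indicate greater similarity to the GNS expression pattern.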

Array comparative genomic hybridization
We re-analyzed the array comparative genomic hybridization (CGH) data described by Pollard et al. [6]. CGH was performed with Human Genome CGH Microarray 4x44K arrays (Agilent), using genomic DNA from each cell line hybridized in duplicate (dye swap) and normal human female DNA as reference (Promega, Southampton, UK). Log2 ratios were computed from processed Cy3 and Cy5 intensities reported by the software CGH Analytics (Agilent). We corrected for effects related to GC content and restriction fragment size using a modified version of the waves array CGH correction algorithm [33]. Briefly, log2 ratios were adjusted by sequential loess normalization on three factors: fragment GC content, fragment size, and probe GC content. These were selected after investigating dependence of log ratio on multiple factors, including GC content in windows of up to 500 kb centered around each probe. The Bioconductor package CGHnormaliter [34] was then used to correct for intensity dependence, and log2 ratios were scaled to be comparable between arrays using the 'scale' method in the package limma [35]. Replicate arrays were averaged and the genome (GRCh37) segmented into regions with different copy number using the circular binary segmentation algorithm in the Bioconductor package DNAcopy [36], with the option undo.SD set to 1. Aberrations were called using the package CGHcall [37] with the option nclass set to 4. CGH data are available from ArrayExpress [27] under accession E-MTAB-972.

Results

Transcriptome analysis highlights pathways affected in glioma
We applied Tag-seq to four GNS cell lines (G144, G144ED, G166 and G179) and two human fetal NS cell lines (CB541 and CB660), all previously described [5,6]. G144 and G144ED were independently established from the same parental tumor in different laboratories. Tag-seq gene expression values were strongly correlated between these two lines (Pearson r = 0.94), demonstrating that the experimental procedure, including cell line establishment, library construction and sequencing, is highly reproducible. The two NS cell transcriptome profiles were also well correlated (r = 0.87), but there were greater differences among G144, G166 and G179 (r ranging from 0.78 to 0.82). This is expected, as G144, G166 and G179 originate from different and histologically distinct glioblastoma cases.
We used the Tag-seq data to identify differences in gene expression between the three GNS cell lines G144, G166 and G179 and the two normal NS cell lines CB541 and CB660. At an FDR of 10%, this analysis revealed 485 genes to be expressed at a higher average level in GNS cells (upregulated) and 254 genes to be downregulated (Additional file 5). GNS cells display transcriptional alterations common in glioblastoma, including upregulation of the epidermal growth factor receptor (EGFR) gene and downregulation of the tumor suppressor PTEN [11]. Enrichment analysis using Gene Ontology and the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database confirmed the set of 739 differentially expressed genes to be enriched for pathways related to brain development, glioma and cancer (Tables 2 and 3). We also observed enrichment of regulatory and inflammatory genes, such as signal transduction components, cytokines, growth factors and DNA-binding factors. Several genes related to antigen presentation on MHC class I and II molecules were upregulated in GNS cells, consistent with the documented expression of their corresponding proteins in glioma tumors and cell lines [38,39]. In addition, we detected 25 differentially expressed long non-coding RNAs (Additional file 6). Several of these display an expression pattern similar to a neighboring protein-coding gene, including cancer-associated genes DKK1 and CTSC [40,41] (Figure 1) and developmental regulators IRX2, SIX3 and ZNF536 [42], suggesting that they may be functional RNAs regulating nearby genes [43] or represent transcription from active enhancers [44].
To visualize gene expression differences in a pathway context, we compiled an integrated pathway map that includes the pathways most commonly affected in glioblastoma, as well as pathways related to antigen processing and presentation, apoptosis, angiogenesis and invasion (Additional file 1). The map contains 182 genes, of which 66 were differentially expressed between GNS and NS cells at 10% FDR (Additional file 7). Figure 2 depicts a condensed version focused on the pathways most frequently affected in glioblastoma. This approach allowed us to identify differentially expressed genes that participate in glioma-related pathways, but have not been directly implicated in glioma. These include several genes associated with other neoplasms (Table 4). Our comparison between GNS and NS cells thus highlights genes and pathways that are known to be affected in glioma as well as novel candidates, and suggests that the GNS/NS comparison is a compelling model for investigating the molecular attributes of glioma.

Core expression changes in GNS lines are mirrored in glioma tumors and correlate with histological grade
To capture major gene expression changes common to G144, G166 and G179, we set strict criteria on fold changes and tag counts (see Materials and methods). This approach yielded 32 upregulated and 60 downregulated genes, in the following referred to as 'core' differentially expressed genes (Additional file 8). This set includes genes with established roles in glioblastoma (for example, PTEN [11] and CEBPB [45]), as well as others not previously implicated in the disease (see Discussion).
To investigate whether these core differentially expressed genes have similar expression patterns in GNS cells and primary tumors, we made use of public microarray data (Table 1). Perfect agreement between tissue- and cell-based results would not be expected, as tissues comprise a heterogeneous mixture of cell types. Nevertheless, analysis of microarray expression data from TCGA [11,46] for 397 glioblastoma cases (Additional file 4) revealed a clear trend for core upregulated GNS genes to be more highly expressed in glioblastoma tumors than in non-neoplastic brain tissue (P = 0.02, randomization test; Figure 3a) and an opposite trend for core downregulated genes (P = 3 × 10⁻⁵; Figure 3c). We hypothesized that the expression of these genes might also differ between glioblastoma and less severe astrocytomas. We therefore examined their expression patterns in microarray data from the studies of Phillips et al. [9] and Freije et al. [10], which both profiled grade III astrocytoma cases in addition to glioblastomas (Table 1). The result was similar to the comparison with non-neoplastic brain tissue above; there was a propensity for core upregulated genes to be more highly expressed in glioblastoma than in the lower-grade tumor class (P = 10⁻⁶; Figure 3b), while core downregulated genes showed the opposite pattern (P = 10⁻⁴; Figure 3d). The set of core differentially expressed genes identified by Tag-seq thus defines an expression signature characteristic of glioblastoma and related to astrocytoma histological grade.

Large-scale qRT-PCR validates Tag-seq results and identifies a robust gene set distinguishing GNS from NS cells
To assess the accuracy of Tag-seq expression level estimates and investigate gene activity in a larger panel of cell lines, we assayed 82 core differentially expressed genes in 16 GNS cell lines (derived from independent patient tumors) and six normal NS cell lines by qRT-PCR using custom-designed TaqMan microfluidic arrays. The 82 validation targets (Additional file 3) were selected from the 92 core differentially expressed genes based on the availability of TaqMan probes and considering prior knowledge of gene functions. For the cell lines assayed by both Tag-seq and qRT-PCR, measurements agreed remarkably well between the two technologies: the median Pearson correlation for expression profiles of individual genes was 0.91 and the differential expression calls were corroborated for all 82 genes (Figure 4a). Across the entire panel of cell lines, 29 of the 82 genes showed statistically significant differences between GNS and NS cells at an FDR of 5% (Figure 4b,c). This set of 29 genes generally distinguishes GNS cells from normal NS cell counterparts, and may therefore have broad relevance for elucidating properties specific to tumor-initiating cells.

A GNS cell expression signature is associated with patient survival
To further explore the relevance to glioma of these recurrent differences between GNS and NS cell transcriptomes, we integrated clinical information with tumor expression data. We first tested for associations between gene expression and survival time using the TCGA data set consisting of 397 glioblastoma cases (Table 1). For each gene, we fitted a Cox proportional hazards model with gene expression as a continuous explanatory variable and computed a P-value by the score test. The set of 29 genes that distinguished GNS from NS cells across the cell lines assayed by qRT-PCR was enriched for low P-values compared to the complete set of 18,632 genes quantified in the TCGA data set (P = 0.02, one-sided Kolmogorov-Smirnov test), demonstrating that expression analysis of GNS and NS cell lines had enriched for genes associated with patient survival. Seven of the 29 genes had a P-value below 0.05 and, for six of these, the direction of the survival trend was concordant with the expression in GNS cells, such that greater similarity to the GNS cell expression pattern indicated poor survival. Specifically, DDIT3, HOXD10, PDE1C and PLS3 were upregulated in GNS cells and expressed at higher levels in glioblastomas with poor prognosis, while PTEN and TUSC3 were downregulated in GNS cells and expressed at lower levels in gliomas with poor prognosis.
We reasoned that, if a cancer stem cell subpopulation in glioblastoma tumors underlies these survival trends, it may be possible to obtain a stronger and more robust association with survival by integrating expression information for multiple genes up- or downregulated in GNS cells. We therefore combined the expression values for the genes identified above (DDIT3, HOXD10, PDE1C, PLS3, PTEN and TUSC3) into a single value per tumor sample, termed 'GNS signature score' (see Materials and methods). This score was more strongly associated with survival (P = 10⁻⁶) than were the expression levels of any of the six individual genes (P ranging from 0.005 to 0.04; Table 5).
To test whether these findings generalize to independent clinical sample groups, we examined the glioblastoma data sets described by Gravendeel et al. [13] and Murat et al. [12], consisting of 141 and 70 cases, respectively (Table 1). The GNS signature score was correlated with patient survival in both of these data sets (P = 3 × 10⁻⁵ and 0.006, respectively; Figure 5a; Additional file 9). At the level of individual GNS signature genes, five were significantly associated with survival (P < 0.05) in both of the two largest glioblastoma data sets we investigated (TCGA and Gravendeel): HOXD10, PDE1C, PLS3, PTEN and TUSC3 (Table 5). In addition to glioblastoma (grade IV) tumors, Gravendeel et al. also characterized 109 grade I to III glioma cases (Table 1). Inclusion of these data in survival analyses made the association with the GNS signature even more apparent (Figure 5b). This is consistent with the above observation that core transcriptional alterations in GNS cells correlate with histological grade of primary tumors. Analysis of data from the studies of Phillips et al. [9] and Freije et al. [10], which profiled both grade III and IV gliomas (Table 1), further confirmed the correlation between GNS signature and survival (Figure 5b). In summary, the association between GNS signature and patient survival was reproducible in five independent data sets comprising 867 glioma cases in total (Table 1).
We controlled for a range of potential confounding factors; these did not explain the survival trends observed (Additional file 10). Investigating a relationship to known predictors of survival in glioma, we noted that the GNS signature score correlates with patient age at diagnosis, suggesting that the GNS cell-related expression changes are associated with the more severe form of the disease observed in older patients (Figure 6a). Of the genes contributing to the GNS signature, HOXD10, PLS3, PTEN and TUSC3 correlated with age both in the TCGA and Gravendeel data sets (Additional file 11).
Most grade III astrocytomas and a minority of glioblastomas carry a mutation affecting codon 132 of the IDH1 gene resulting in an amino acid change (R132H, R132S, R132C, R132G, or R132L). The presence of this mutation is associated with lower age at disease onset and better prognosis [47,48]. All 16 GNS cell lines profiled in this study were derived from glioblastoma tumors, and the IDH1 locus was sequenced in each cell line (data not shown); none of them harbor the mutation. We therefore investigated whether the GNS signature is characteristic of IDH1 wild-type glioblastomas. IDH1 status has been determined for most cases in the TCGA and Gravendeel data sets (Table 6) [11,13,17]. As expected, we found that gliomas with the IDH1 mutation tend to have lower GNS signature scores than IDH1 wild-type gliomas of the same histological grade (Figure 6b). However, we also found the GNS signature to have a stronger survival association than IDH1 status (Table 6). The signature remained a significant predictor of patient survival when controlling for IDH1 status (Table 6), demonstrating that it contributes independent information to the survival model and does not simply represent a transcriptional state of IDH1 wild-type tumors. This was evident in glioblastomas as well as grade I to III gliomas; the effect is thus not limited to grade IV tumors.
Figure 3 Bars depict average fold-change between glioblastoma and non-neoplastic brain tissue (a,c) (TCGA data set) and between glioblastoma and grade III astrocytoma (b,d) (Phillips and Freije data sets combined). Black bars indicate genes with significant differential expression in the microarray data (P < 0.01). Heatmaps show expression in individual samples relative to the average in non-neoplastic brain (a,c) or grade III astrocytoma (b,d). One gene (CHCHD10) not quantified in the TCGA data set is omitted from (a).
To investigate whether the correlation between GNS signature and age could be explained by the higher proportion of cases with IDH1 mutation among younger patients, we repeated the correlation analysis described above (Figure 6a), limiting the data to glioblastoma cases without IDH1 mutation. For the TCGA data set, the correlation was decreased somewhat (Pearson r = 0.25 compared to 0.36 for the full data set) but still highly significant (P = 6 × 10⁻⁵), demonstrating that the correlation with age is only partially explained by IDH1 status. This result was confirmed in the Gravendeel data set, where the effect of controlling for IDH1 status and grade was negligible (r = 0.38 compared to 0.39 for the full data set including grade I to III samples). Among the individual signature genes, both HOXD10 and TUSC3 remained correlated with age in both data sets when limiting the analysis to IDH1 wild-type glioblastoma cases (Additional file 11).

Influence of copy number alterations on the GNS transcriptome
Previous analysis of chromosomal aberrations in G144, G166 and G179 by spectral karyotyping and array CGH detected genetic variants characteristic of glioblastoma [6]. To assess the influence of copy-number changes on the GNS transcriptome, we compared CGH profiles (Figure 7) with Tag-seq data. On a global level, there was an apparent correlation between chromosomal aberrations and gene expression levels (Figure 8a,b), demonstrating that copy-number changes are a significant cause of the observed expression differences. Among the 29 genes differentially expressed between GNS and NS cells in the larger panel assayed by qRT-PCR, there was a tendency for downregulated genes to be lost: 10 out of 15 downregulated genes were in regions of lower than average copy number in one or more GNS cell lines, compared to 4 out of 14 upregulated genes (P = 0.046, one-sided Fisher's exact test).
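The one-sided Fisher's exact test quoted above can be reproduced from the stated counts (10 of 15 downregulated genes versus 4 of 14 upregulated genes in regions of reduced copy number). The Python sketch below computes the hypergeometric tail probability directly; the function name `fisher_one_sided` is hypothetical, and the arrangement of the 2x2 table is our reading of the text.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    count in the top-left cell given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest possible top-left count
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Rows: downregulated, upregulated; columns: in lost region, not in lost region.
p_value = fisher_one_sided(10, 5, 4, 10)
print(round(p_value, 3))  # 0.046
```

The result agrees with the reported P = 0.046.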
Despite the global correlation between gene expression and copy number, many individual expression changes could not be explained by structural alterations. For example, only a minority of upregulated genes (21%) were located in regions of increased copy number, including whole-chromosome gains (Figure 8b); the survival-associated genes HOXD10, PLS3 and TUSC3 lacked copy-number aberrations consistent with their expression changes; and the survival-associated gene DDIT3 was genetically gained only in G144, although highly expressed in all three GNS cell lines (Figure 8c). In general, the 29 genes that robustly distinguish GNS from NS cells did not show a consistent pattern of aberrations: only three genes (PDE1C, NDN and SYNM) were located in regions similarly affected by genetic lesions in all lines. Thus, in addition to copy-number alterations, other factors are important in shaping the GNS transcriptome, and regulatory mechanisms may differ among GNS cell lines yet produce similar changes in gene expression.
Figure 5 Association between GNS signature score and patient survival. (a,b) Kaplan-Meier plots illustrate the association between signature score and survival for three independent glioblastoma data sets (a) and three data sets that include gliomas of lower grade (b) (Table 1). Higher scores indicate greater similarity to the GNS cell expression profile. Hazard ratios and log-rank P-values were computed by fitting a Cox proportional hazards model to the data. Percentile thresholds were chosen for illustration; the association with survival is statistically significant across a wide range of thresholds (Additional file 9) and the P-values given in the text and Table 6 were computed without thresholding, using the score as a continuous variable.

Discussion
To reveal transcriptional changes that underlie glioblastoma, we performed an in-depth analysis of gene expression in malignant stem cells derived from patient tumors in relation to untransformed, karyotypically normal NS cells. These cell types are closely related and it has been hypothesized that gliomas arise by mutations in NS cells or in glial cells that have reacquired stem cell features [2]. We measured gene expression by high-throughput RNA tag sequencing (Tag-seq), a method that features high sensitivity and reproducibility compared to microarrays [7]. qRT-PCR validation further demonstrates that Tag-seq expression values are highly accurate. Other cancer samples and cell lines have recently been profiled with the same method [8,47] and it should be feasible to directly compare those results to the data presented here.
Through Tag-seq expression profiling of normal and cancer stem cells followed by qRT-PCR validation in a wider panel of 22 cell lines, we identified 29 genes strongly discriminating GNS from NS cells. Some of these genes have previously been implicated in glioma, including four with a role in adhesion and/or migration, CD9, ST6GALNAC5, SYNM and TES [49-52], and two transcriptional regulators, FOXG1 and CEBPB. FOXG1, which has been proposed to act as an oncogene in glioblastoma by suppressing growth-inhibitory effects of transforming growth factor β [53], showed remarkably strong expression in all 16 GNS cell lines assayed by qRT-PCR. CEBPB was recently identified as a master regulator of a mesenchymal gene expression signature associated with poor prognosis in glioblastoma [45]. Studies in hepatoma and pheochromocytoma cell lines have shown that the transcription factor encoded by CEBPB (C/EBPβ) promotes expression of DDIT3 [54], another transcriptional regulator that we found to be upregulated in GNS cells. DDIT3 encodes the protein CHOP, which in turn can inhibit C/EBPβ by dimerizing with it and acting as a dominant negative [54]. This interplay between CEBPB and DDIT3 may be relevant for glioma therapy development, as DDIT3 induction in response to a range of compounds sensitizes glioma cells to apoptosis (see, for example, [55]).
Figure 7 CGH profiles for GNS lines. Dots indicate log2 ratios for array CGH probes along the genome, comparing each GNS cell line to normal female DNA. Colored segments indicate gain (red) and loss (green) calls, with color intensity proportional to mean log2 ratio over the segment. Aberrations known to be common in glioblastoma [11,79] were identified, including gain of chromosome 7 and losses of large parts of chromosomes 10, 13, 14 and 19 in more than one GNS cell line, as well as focal gain of CDK4 in G144 (arrow, chromosome 12) and focal loss of the CDKN2A-CDKN2B locus in G179 (arrow, chromosome 9). The X chromosome was called as lost in G144 and G179 because these two cell lines are from male patients; sex-linked genes were excluded from downstream analyses of aberration calls.
Our results also corroborate a role in glioma for several other genes with limited prior links to the disease. This list includes PLA2G4A, HMGA2, TAGLN and TUSC3, all of which have been implicated in other neoplasias (Additional file 12). PLA2G4A encodes a phospholipase that functions in the production of lipid signaling molecules with mitogenic and pro-inflammatory effects. In a subcutaneous xenograft model of glioblastoma, expression of PLA2G4A by the host mice was required for tumor growth [56]. For HMGA2, a transcriptional regulator downregulated in most GNS cell lines, low or absent protein expression has been observed in glioblastoma compared to low-grade gliomas [57], and HMGA2 polymorphisms have been associated with survival time in glioblastoma [58]. The set of 29 genes found to generally distinguish GNS from NS cells also includes multiple genes implicated in other neoplasias, but without direct links to glioma (Additional file 12). Of these, the transcriptional regulator LMO4 may be of particular interest, as it is well studied as an oncogene in breast cancer and regulated through the phosphoinositide 3-kinase pathway [59], which is commonly affected in glioblastoma [11].
Five of these 29 genes have not been directly implicated in cancer. This list comprises one gene downregulated in GNS cells (PLCH1) and four upregulated (ADD2, LYST, PDE1C and PRSS12). PLCH1 is involved in phosphoinositol signaling [60], like the frequently mutated phosphoinositide 3-kinase complex [11]. ADD2 encodes a cytoskeletal protein that interacts with FYN, a tyrosine kinase promoting cancer cell migration [61,62]. For PDE1C, a cyclic nucleotide phosphodiesterase gene, we found higher expression to correlate with shorter survival after surgery. Upregulation of PDE1C has been associated with proliferation in other cell types through hydrolysis of cAMP and cGMP [63,64]. PRSS12 encodes a protease that can activate tissue plasminogen activator (tPA) [65], an enzyme that is highly expressed by glioma cells and has been suggested to promote invasion [66].
By considering expression changes in a pathway context, we identified additional candidate glioblastoma genes, such as the putative cell adhesion gene ITGBL1 [67], the orphan nuclear receptor NR0B1, which is strongly upregulated in G179 and is known to be upregulated and mediate tumor growth in Ewing's sarcoma [68], and the genes PARP3 and PARP12, which belong to the poly(ADP-ribose) polymerase (PARP) family of ADP-ribosyl transferase genes involved in DNA repair (Table 4). The upregulation of these PARP genes in GNS cells may have therapeutic relevance, as inhibitors of their homolog PARP1 are in clinical trials for brain tumors [69].
Transcriptome analysis thus identified multiple genes of known significance in glioma pathology as well as several novel candidate genes and pathways. These results are further corroborated by survival analysis, which revealed a GNS expression signature associated with patient survival time in five independent data sets. This finding is compatible with the notion that gliomas contain a GNS component of relevance for prognosis. Five individual GNS signature genes were significantly associated with survival of glioblastoma patients in both of the two largest data sets: PLS3, HOXD10, TUSC3, PDE1C and the well-studied tumor suppressor PTEN. PLS3 (T-plastin) regulates actin organization and its overexpression in the CV-1 cell line resulted in partial loss of adherence [70]. Elevated PLS3 expression in GNS cells may thus be relevant for the invasive phenotype. The association between transcriptional upregulation of HOXD10 and poor survival is surprising, because HOXD10 protein levels are suppressed by a microRNA (miR-10b) highly expressed in gliomas, and it has been suggested that HOXD10 suppression by miR-10b promotes invasion [71]. Notably, the HOXD10 mRNA upregulation we observe in GNS cells also occurs in glioblastoma tumors, as shown by comparison with grade III astrocytoma (Figure 3b). Similarly, miR-10b is present at higher levels in glioblastoma compared to gliomas of lower grade [71]. It is conceivable that the combination of HOXD10 transcriptional upregulation and post-transcriptional suppression is indicative of a regulatory program associated with poor prognosis in glioma.
Tumors from older patients featured an expression pattern more similar to the GNS signature. One of the genes contributing to this trend, TUSC3, is known to be silenced by promoter methylation in glioblastoma, particularly in patients aged over 40 years [72]. Loss or downregulation of TUSC3 has been found in other cancers, such as of the colon, where its promoter becomes increasingly methylated with age in the healthy mucosa [73]. Taken together, these data suggest that transcriptional changes in healthy aging tissue, such as TUSC3 silencing, may contribute to the more severe form of glioma in older patients. Thus, the molecular mechanisms underlying the expression changes described here are likely to be complex and varied. To capture these effects and elucidate their causes, transcriptome analysis of cancer samples will benefit from integration of diverse genomic data, including structural and nucleotide-level genetic alterations, as well as DNA methylation and other chromatin modifications.
To identify expression alterations common to most glioblastoma cases, other studies have profiled tumor resections in relation to non-neoplastic brain tissue [47,74,75]. While such comparisons have been revealing, their power is constrained by discrepancies between reference and tumor samples, for instance the higher neuronal content of normal brain tissue compared to tumors. Gene expression profiling of tumor tissue further suffers from mixed signal due to a stromal cell component and heterogeneous populations of cancer cells, only some of which contribute to tumor progression and maintenance [2]. Part of a recent study bearing a closer relationship to our analysis examined gene expression in another panel of glioma-derived and normal NS cells [76], but included neurosphere cultures, which often contain a heterogeneous mixture of self-renewing and differentiating cells.
Here, we have circumvented these issues by profiling uniform cultures of primary malignant stem cell lines that can reconstitute the tumor in vivo [6], in direct comparison to normal counterparts of the same fundamental cell type [4,5]. While the resulting expression patterns largely agree with those obtained from glioblastoma tissues, there are notable differences. For example, we found the breast cancer oncogene LMO4 (discussed above) to be upregulated in most GNS cell lines, although its average expression in glioblastoma tumors is low relative to normal brain tissue (Figure 3a). Similarly, TAGLN and TES were absent or low in most GNS cell lines, but displayed the opposite trend in glioblastoma tissue compared to normal brain (Figure 3c) or grade III astrocytoma (Figure 3d). Importantly, both TAGLN and TES have been characterized as tumor suppressors in malignancies outside the brain and the latter is often silenced by promoter hypermethylation in glioblastoma [77,78].

Conclusions
Our results support the use of GNS cells as a relevant model for investigating the molecular basis of glioblastoma, and the use of NS cell lines as controls in this setting. Transcriptome sequencing revealed aberrant gene expression patterns in GNS cells and defined a molecular signature of the proliferating cell population that drives malignant brain cancers. These transcriptional alterations correlate with several prognostic indicators and are strongly associated with patient survival in both glioblastoma and lower-grade gliomas, suggesting that a greater GNS cell component contributes to poorer prognosis. Several genes observed to be consistently altered in GNS cells have not previously been implicated in glioma, but are known to play a role in other neoplasias or in cellular processes related to malignancy. Such alterations include changes in oncogene and tumor suppressor expression not detectable by microarray profiling of postsurgical glioma biopsies. These findings demonstrate the utility of cancer stem cell models for advancing the molecular understanding of tumorigenesis.

Additional material
Additional file 1: Supplemental methods. Detailed method descriptions for (1) assignment of tags to genes, (2) differential expression analysis of Tag-seq data, and (3) construction of the integrated glioma pathway map. Format: PDF.
Additional file 2: Classification of sequenced tags. Table listing the number of sequenced tags in each sample and the proportion of tags assigned to different categories by the Tag-seq data processing pipeline. Format: XLS.
Additional file 3: qRT-PCR data.
Additional file 5: Differentially expressed genes at 10% FDR. Table with expression values, fold-changes and P-values for the genes found to be expressed at a higher or lower average level in GNS cells compared to NS cells by Tag-seq (10% FDR). Format: XLS.
Additional file 6: Differentially expressed non-coding RNAs. Table of non-coding RNAs found to be differentially expressed between GNS and NS cells by Tag-seq. Format: XLS.
Additional file 7: Integrated pathway map. Network diagram of the integrated glioma pathway, with differentially expressed genes colored according to fold-change between GNS and NS cells. Format: PDF.
Additional file 8: Core differentially expressed genes. Table of genes with large expression changes common to the GNS cell lines G144, G166 and G179, relative to the normal NS cell lines CB541 and CB660. Format: XLS.
Additional file 9: Kaplan-Meier plots for multiple GNS signature score thresholds. Survival curves illustrating the association between GNS signature and patient survival for three independent glioblastoma data sets and a range of percentile thresholds on GNS signature score. Format: PDF.
Additional file 10: Controls for survival tests on GNS expression signature. Text and table detailing how confounding factors were controlled for when testing for an association between the GNS expression signature and patient survival. Format: XLS.
Additional file 11: Correlation between age at diagnosis and GNS signature gene expression. Scatter plots demonstrating the correlation between age at diagnosis and expression of GNS signature genes. Format: PDF.
Additional file 12: Disease association of GNS cell-specific genes. Literature survey for the set of 29 genes found to distinguish GNS from NS cells across a panel of 22 different lines. The table details whether each gene has previously been implicated in glioma or other neoplasias, and includes references to relevant publications. Format: XLS.