Post-mortem molecular profiling of three psychiatric disorders

Background Psychiatric disorders are multigenic diseases with complex etiology that contribute significantly to human morbidity and mortality. Although clinically distinct, several disorders share many symptoms, suggesting common underlying molecular changes exist that may implicate important regulators of pathogenesis and provide new therapeutic targets. Methods We performed RNA sequencing on tissue from the anterior cingulate cortex, dorsolateral prefrontal cortex, and nucleus accumbens from three groups of 24 patients each diagnosed with schizophrenia, bipolar disorder, or major depressive disorder, and from 24 control subjects. We identified differentially expressed genes and validated the results in an independent cohort. Anterior cingulate cortex samples were also subjected to metabolomic analysis. ChIP-seq data were used to characterize binding of the transcription factor EGR1. Results We compared molecular signatures across the three brain regions and disorders in the transcriptomes of post-mortem human brain samples. The most significant disease-related differences were in the anterior cingulate cortex of schizophrenia samples compared to controls. Transcriptional changes were assessed in an independent cohort, revealing the transcription factor EGR1 as significantly down-regulated in both cohorts and as a potential regulator of broader transcription changes observed in schizophrenia patients. Additionally, broad down-regulation of genes specific to neurons and concordant up-regulation of genes specific to astrocytes was observed in schizophrenia and bipolar disorder patients relative to controls. Metabolomic profiling identified disruption of GABA levels in schizophrenia patients. Conclusions We provide a comprehensive post-mortem transcriptome profile of three psychiatric disorders across three brain regions. We highlight a high-confidence set of independently validated genes differentially expressed between schizophrenia and control patients in the anterior cingulate cortex and integrate transcriptional changes with untargeted metabolite profiling. Electronic supplementary material The online version of this article (doi:10.1186/s13073-017-0458-5) contains supplementary material, which is available to authorized users.


Background
Schizophrenia (SZ), bipolar disorder (BPD), and major depressive disorder (MDD) are multigenic diseases with complex etiology and are large sources of morbidity and mortality in the population. All three disorders are associated with high rates of suicide, with~90% of the~41,000 people who commit suicide each year in the US having a diagnosable psychiatric disorder [1]. Notably, while clinically distinct, these disorders also share many symptoms, including psychosis, suicidal ideation, sleep disturbances, and cognitive deficits [2][3][4]. This phenotypic overlap suggests potential common genetic etiology, which is supported by recent large-scale genome-wide association studies (GWAS) [5][6][7][8]. However, this overlap has not been fully characterized with functional genomic approaches. Current therapies for these psychiatric disorders are ineffective in many patients and often only treat a subset of an individual patient's symptoms [9]. Approaches targeting the underlying molecular pathologies within and across these types of disorders are necessary to address the immense burden of psychiatric disease around the world and improve care for the millions of people diagnosed with these conditions. Previous studies [10][11][12][13][14] analyzed brain tissue with RNA sequencing (RNA-seq) in SZ and BPD, and identified altered expression of GABA-related genes in the superior temporal gyrus and hippocampus, as well as differentially expressed genes related to neuroplasticity and mammalian circadian rhythms. Our study focused on the anterior cingulate cortex (AnCg), dorsolateral prefrontal cortex (DLPFC), and nucleus accumbens (nAcc) regions, which are often associated with mood alterations, cognition, impulse control, motivation, reward, and pleasure-all behaviors known to be altered in psychiatric disorders [15,16]. To assess gene expression changes associated with psychiatric disease in these three brain regions, we performed RNA-seq on macro-dissected postmortem tissues in four well-documented cohorts of 24 patients each with SZ, BPD, and MDD and 24 controls (CTL) (96 individuals total). Additionally, we conducted metabolomic profiling of AnCg tissue from the same subjects. RNA-seq analysis revealed common expression profiles in SZ and BPD patients, supporting the notion that these disorders share a common molecular signature. Transcriptional changes were most pronounced in the AnCg, with SZ and BPD exhibiting strongly correlated differences from CTL samples. Differentially expressed genes were associated with cell-type composition, with BPD and SZ samples showing decreased expression of neuronspecific genes. We validated this result with RNA-seq data from an independent cohort of 35 cases each of SZ and BPD and CTL post-mortem cingulate cortex samples from the Stanley Neuropathology Consortium Integrative Database (SNCID; http://sncid.stanleyresearch.org) Array Collection. We present a set of validated genes differentially expressed between SZ and CTL patients, perform an integrated analysis of metabolic pathway disruptions, and highlight a role for the transcription factor EGR1, whose down-regulation in SZ patients may drive a large portion of observed transcription changes.

Methods
See Additional file 1: Supplemental methods for additional detail.

Patient sample collection and preparation
Sample collection, including human subject recruitment and characterization, tissue dissection, and RNA extraction, was described previously [17,18] as part of the Brain Donor Program at the University of California, Irvine, Department of Psychiatry and Human Behavior (Pritzker Neuropsychiatric Disorders Research Consortium) under institutional review board approval. In brief, coronal slices of the brain were rapidly frozen on aluminum plates that were previously frozen to −120°C and dissected as described previously [19]. All samples were diagnosed by psychological autopsy, which included collection and analyses of medical and psychiatric records, toxicology, medical examiners' reports, and 141-item family interviews. Agonal state scores were assigned based on a previously published scale [20]. Controls were selected based on absence of severe psychiatric disturbance and mental illness within first-degree relatives.
We obtained fastq files from RNA-seq experiments for our validation cohort from the SNCID (http://sncid.stan leyresearch.org) Array Collection comprising 35 cases each of SZ, BPD, and CTL of post-mortem cingulate cortex with permission on 30 June 2015. For our analysis, we included the 27 SZ, 26 CTL, and 25 BPD SNCID samples that were successfully downloaded and represented unique samples. SNCID RNA-seq methodology and data processing are described in detail in a previous publication that makes use of the data [10].

RNA-seq and data processing
To extract nucleic acid, 20 mg of post-mortem brain tissue was homogenized in Qiagen RLT buffer + 1% BME using an MP FastPrep-24 and Lysing Matrix D beads for three rounds of 45 s at 6.5 m/s (FastPrep homogenizer, lysing matrix D, MP Bio). Total RNA was isolated from 350 μL of tissue homogenate using the Norgen Animal Tissue RNA Purification Kit (Norgen Biotek Corporation). We made RNA-seq libraries from 250 ng total RNA using poly(A) selection (Dynabeads mRNA DIRECT kit, Life Technologies) and transposase-based nonstranded library construction (Tn-RNA-seq) as described previously [21]. To mitigate potentially confounding batch affects in sample preparation we randomly assigned samples from all brain regions and disorders into batches of 24 samples. We used KAPA to quantify the library concentrations and pooled four samples in order to achieve equal concentration of the four libraries in each lane. Pools were determined by random from the 291 samples. Samples were also randomly selected for pooling in an effort to limit potentially confounding sequencing batch effects. The pooled libraries were sequenced on an Illumina HiSeq 2000 sequencing machine using paired-end 50-bp reads and a 6-bp index read, resulting in an average of 48.2 million reads per library. To quantify the expression of each gene in both Pritzker and SNCID datasets, RNAseq reads were processed with aRNApipe v1.1 using default settings [22]. Briefly, reads were aligned and counted with STAR v2.4.2a to all genes annotated in GRCh37_E75 [23]. All alignment quality metrics were obtained from the picard tools module (http://broadinstitute.github.io/picard/) available in aRNApipe. Genes expressed from the X and Y chromosomes were omitted from the study.
Quantitative PCR (qPCR) was performed on ten SZ and ten CTL patients to validate EGR1 RNA-seq measurements. RNA was extracted as described above from tissue lysates a second time. Reverse transcription was performed on 250 ng of input RNA with the Applied Biosystems high capacity cDNA reverse transcription kit. Validated Taqman assays for EGR1 (Hs00152928_m1) and the housekeeper genes GAPDH (Hs02758991_g1) and ACTB (Hs01060665_g1) were used for qPCR. cDNA was diluted by a factor of 10 before use as input for the Taqman assay. The qPCR reaction was performed on an Applied Biosystems Quant Studio 6 Flex system using the recommended amplification protocol for Taqman assays.

Sequencing data analysis
All data analysis in R was performed with version 3.1.2.

Differential expression analysis and normalization
To examine gene expression changes, we employed the R package DESeq2 [24] (version 1.6.3), using default settings, but employing likelihood ratio test (LRT) hypothesis testing, and removing non-convergent genes from subsequent analysis. Genes differentially expressed between each disorder and CTL samples, by brain region, were identified with DESeq2 (adjusted p value <0.05), including age, brain pH, post-mortem interval (PMI), and percentage of reads uniquely aligned (PRUA) as covariates (full model,~Age + PMI + pH + PRUA + Disorder; reduced model,~Age + PMI + pH + PRUA). For downstream heatmap visualization, principal component analysis (PCA), and cell-type analysis, genes underwent a log-like normalization using DESeq2's varianceStabil-izingTransformation function and were corrected for PRUA by computing residuals to a linear model regressing PRUA on normalized gene expression level with the R lm function unless otherwise specified. DESeq2's default independent filtering method was used to remove genes with an insufficient expression level from further analysis.

PCA and hierarchical clustering
PCA analysis was performed in R on normalized data using the prcomp() command. Hierarchical clustering of normalized gene expression data was done in R with the hclust command (method = "ward", distance = "Euclidean")

Pathway enrichment analysis
Pathway analysis was conducted using the web-based tool LRPath [25] using all gene ontology (GO) term annotations, adjusting to gene read count with RNA-Enrich, including directionality and limiting maximum GO term size to 500 genes. GO term visualization was performed using the Cytoscape Enrichment Map plug-in [26]. The Genesetfile (.gmt) GO annotations from 1 February 2017 were downloaded from http://download.ba derlab.org/EM_Genesets/. The LRPath output was parsed and used as an enrichment file with all upregulated pathways colored red and all downregulated pathways colored blue, regardless of degree of upregulation. Mapping parameters were set as p value cutoff = 0.005, false discovery rate (FDR) cutoff = 0.1, and Jaccard coefficient >0.3. Resulting networks were exported as PDFs. Summary terms were added to the plot based on the GO terms in those clusters. In order to assess overlap between significant GO terms in our analysis and the GWAS described by the Psychiatric Genomics Consortium [5], we downloaded the p values reported for SZ hits from their Supplemental Table 4, which contained 424 significant GO terms. We used a chisquared test to assess significant overlap between the two groups. We report the p values measured in SZ based on this study along with those calculated in our analysis.

EGR1 ChIP-seq peak analysis
Narrow peak bed files filtered to optimal Irreproducible Discovery Rate (IDR) peaks were obtained from the ENCODE data portal (https://www.encodeproject.org/) for EGR1 ChIP-seq data in GM12878, H1-hESC, and K562 cell lines (ENCODE file IDs ENCFF002CIV, ENCFF002CGW, ENCFF002CLV). Consensus EGR1 peaks were identified by intersecting peaks from all three cell lines, which resulted in a final list of 4121 peaks common to all cell lines (minimum overlap of 1 bp). The distance from each annotated transcription start site (TSS) to the nearest consensus EGR1 peak was computed based on TSSs annotated in the ENSEMBL gene transfer format (GTF) file from the Ensembl data release 75 (GRCh37_E75).

Cell-specific enrichment analysis
Sets of genes uniquely expressed by several brain cell types were obtained from Fig. 1b in Darmanis et al. [27]. An index for each cell type was created by calculating the median normalized expression value for each set of cell type-associated genes. Index values were compared across patient clusters by non-parametric rank sum tests and Spearman correlation with top principal components. To validate our method, we calculated cell type-specific indices from an independent cohort of previously published purified brain cells [28,29]. FPKM-normalized gene expression data were obtained from Supplemental Table S4 of Zhang et al. (2015) [28] and cell type indices were calculated as described above. To examine index performance in mixed cell populations, we obtained fastq files for neuron and astrocyte-purified brain samples from Gene Expression Omnibus (GEO) accession GSE73721 and generated raw count files as described above. We next mixed expression profiles in silico by performing random down-sampling of neuron and astrocyte count levels and summing the results such that mixed populations containing specific proportions of counts from neuron-and astrocyte-purified tissue were generated. For example, to generate an 80:20 neuron to astrocyte mixture, neuron and astrocyte count columns (which started at an equivalent number of 5,759,178 aligned reads) were randomly down-sampled to 4,607,342 and 1,151,836 counts, respectively, and summed across each gene to result in a proportionately mixed population of aligned count data simulating heterogeneous tissue. Then we calculated a neuron:astrocyte index ratio capable of predicting the in silico mixing weights. Briefly, we assumed index values for mixed cell populations were directly proportional to mixing weights of their respective purified tissue; thus, the predicted cell proportion for a given cell type was simply calculated as Predicted cell proportion = Observed index value/Purified tissue index value.
To ensure cell type predictive power was unique to indices derived from the genes in Darmanis et al. [27], we generated indices from 10,000 randomly sampled gene sets of equivalent size and examined their performance in predicting in silico mixing weights. Mean squared prediction errors (MSE) were calculated for each of the 10,000 null indices and compared to the MSE of Darmanis et al.-derived indices.
Cell type deconvolution analysis was confirmed using a previously published algorithm implemented in the R package deconRNAseq [30]. The "datasets" input to the deconRNAseq function was a normalized count matrix of all AnCg brain samples and the "signatures" input consisted of a normalized count matrix of astrocyte, neuron, microglia, and oligodendrocyte dissected cells from the GEO accession GSE73721 previously described.
Enrichment analysis for extreme fold change was performed by randomly sampling the fold changes of 1000 null gene sets equivalent in size and expression level (allowing 5% error) to the neuron-and astrocyte-specific gene sets. The median fold change of each 1000 null gene set was compared to the observed median fold change for neuron and astrocyte gene sets, respectively.

Sample preparation
Sections of approximately 100 mg of frozen tissue were weighed and homogenized for 45 s at 6.5 M/s with ceramic beads in 1 mL of 50% methanol using the MP FastPrep-24 homogenizer (MP Biomedicals). A sample volume equivalent to 10 mg of initial tissue weight was dried down at 55°C for 60 minutes using a vacuum concentrator system (Labconco). Derivatization by methoximation and trimethylsilylation was done as previously described [31].
We analyzed technical replicates of each tissue sample, in randomized order.

Data analysis and metabolite identification
Peak calling, deconvolution, and library spectral matching were done using ChromaTOF 4.5 software. Peaks were identified by spectral match using the NIST, GOLM [32], and Fiehn libraries (Leco) and confirmed by running derivatized standards (Sigma). We used Guineu for multiple sample alignment [33].

Integrated pathway analysis
Altered metabolites and genes were analyzed for enrichment in KEGG pathways containing both metabolite and gene features. A non-parametric, threshold-free pathway analysis similar to that of a previously described method [34] was first performed on metabolite and gene expression data separately. Our method builds on the principle described by Subramanian that implements a one-tailed Wilcox test to identify pathways enriched for low p values. Instead of just accounting for enrichment at the gene level, we use metabolite or gene p value ranks within each pathway compared to remaining nonpathway metabolites or genes with a one-tailed Wilcox test to test the hypothesis that elements of a given pathway may be enriched for lower p value ranks than background elements. Metabolite and gene p values were subsequently combined to provide an integrated enrichment significance p value using Fisher's method. Pathways had to contain greater than five genes and one metabolite measured in our dataset to be included in the analysis. Additional file 2: Table S10 lists p values for enriched pathways based on genes, metabolites, or both combined.

Results
Region-specific gene expression in control and psychiatric brain tissue We collected post-mortem human brain tissue, associated clinical data including age, sex, brain pH, and post-mortem interval (PMI), and cytotoxicology results (Additional file 2: Tables S1 and S2) for matched cohorts of 24 patients each diagnosed with SZ, BPD, or MDD, as well as 24 control individuals with no personal history of, or first-degree relatives diagnosed with, psychiatric disorders. Importantly, to limit the effect of acute patient stress at the time of death as a potential confounder, we included only patients with an agonal factor score of zero and a minimum brain pH of 6.5 [18]. Using RNA-seq [21], we profiled gene expression in three macro-dissected brain regions (AnCg, DLPFC, nAcc). After quality control, we analyzed 57,905 ENSEMBL genes in a total of 281 brain samples (Additional file 2: Table S3).
To examine heterogeneity across brain regions and subjects, we performed a principal component analysis (PCA; Additional file 3: Figure S1a) of all genes. The first principal component (PC1, 21.8% of the variation) separates cortical AnCg and DLPFC samples from subcortical nAcc samples. Examination of the first and second principal components for disorder associations reveals a separation of some SZ and BPD samples from all other samples (Additional file 3: Figures S1b and S2a-c). However, in agreement with previously reported postmortem brain RNA sequencing studies [14], we found several principal components to be highly correlated with quality metrics, including the percentage of reads uniquely aligned and percentage of reads aligned to mitochondrial sequence (absolute Rho >0.5, FDR <1E-16; Additional file 2: Table S4). To reduce the potentially confounding effects of sample quality, we repeated the PCA on expression data normalized to the percentage of reads uniquely aligned for each sample and found that global disease-specific expression differences were significantly reduced and PC1 primarily separated nAcc samples from AnCg and DLPFC brain regions (Additional file 3: Figures S1c and S2d-i).

Disease-specific gene expression in control and psychiatric brains
We next applied DESeq2 [24], a method for analyzing differential sequence read count data, to identify genes differentially expressed across disorders within each brain region after correcting for biological and technical covariates. The largest number of significant expression changes occurred in AnCg between SZ and CTL individuals (87 genes, FDR <0.05; Fig. 1a). Pathway enrichment analysis of differentially expressed genes between SZ and CTL patients revealed 935 GO terms with a FDR <0.05 (Additional file 2: Table S5; 122 Gene Ontology Cell Compartment (GOCC) , 159 Gene Ontology Molecular Function (GOMF), and 654 Gene Ontology Biological Process (GOBP) ). Significant GO terms fall into the broad categories of synaptic function and signaling (e.g., neurotransmitter transport, ion transport, calcium signaling; Additional file 3: Figure S3). These terms overlap significantly with those identified by the Psychiatric Genomics Consortium in their analysis of GWAS implicated genes [35], with 68 GO terms meeting a p value cutoff of <0.05 in both datasets (p value <0.0001, Chisquare test). Additionally, nine genes were differentially expressed between SZ and CTL individuals in DLPFC. Three of these were also identified in AnCg: SST, PDPK2P, and KLHL14. No genes had a FDR <0.05 when comparing BPD or MDD samples to CTLs in any brain region, or comparing SZ and CTL tissues in nAcc (Additional file 2: Table S6). To examine potential common gene expression patterns between the psychiatric disorders, we performed pair-wise correlation calculations of all gene log 2 fold changes for each disorder versus controls in each brain region. Of the nine casecontrol comparisons (for three regions and three diseases), a particularly strong correlation is observed between BPD and SZ compared to either SZ or BPD and MDD in each brain region (Fig. 1b). In the AnCg, BPD and SZ share 1020 common genes differentially expressed at an uncorrected DESeq2 p value <0.05 compared to only 248 and 143 genes shared between MDD and SZ or BPD, respectively (Fig. 1c). This strong overlap between BPD and SZ (Fisher's exact p value <1E-16) indicates that although expression changes are weaker in BPD, they follow a trend similar to those identified in SZ.
Because previous post-mortem analyses have been limited by, and are particularly vulnerable to, biases inherent to examining a single patient cohort, we sought to generate a robust set of SZ-associated genes by validating our observed expression changes in an independent cohort. To accomplish this, we examined gene expression differences in the AnCg between SZ and CTL samples in the SNCID RNA-seq Array dataset [13], revealing 1003 genes altered (DESeq2 uncorrected p value <0.05) in both datasets (Fisher's p value <1E-16; Additional file 2: Table S7). The magnitude and direction of change in significant genes in the Pritzker dataset were highly correlated with the SNCID dataset (Rho = 0.202, p value <1E-16), particularly in 87 genes that met a cutoff FDR of <0.05 (Rho = 0.812, p value <1E-16; Fig. 1d). We performed hierarchical clustering of SZ and CTL samples in the SNCID validation cohort using the 1003 genes differentially expressed, at the less stringent threshold, p value <0.05, between SZ and CTL in the Pritzker dataset (Fig. 1e), and found these genes successfully distinguished the two disease groups with only 5 out of 27 SZ and 2 out of 26 CTL samples misclassified.
Of particular interest are the five genes significant at a FDR <0.05 in both cohorts, including a nearly twofold decrease in expression of the transcription factor EGR1 (Additional file 2: Table S7a; Fig. 2a). qPCR validation confirmed reduced EGR1 expression in SZ samples ( Fig. 2b; Wilcox p value = 4.33 × 10E-5). EGR1, a zinc finger transcription factor, has been recently implicated in SZ by a GWAS study [5]; thus, we sought to investigate whether loss of EGR1 expression might be associated with transcriptional changes observed in the AnCg of SZ patients using publicly available genome-wide occupancy data from the ENCODE consortium (https://www.encodeproject.org). To obtain high confidence EGR1 binding sites we intersected chromatin immunoprecipitation sequencing (ChIP-Seq) peaks derived from the H1-hESC, K562, and GM12878 cell lines. We found that genes with a transcription start site (TSS) within 1 kb of an EGR1 binding site had significantly lower DESeq2 p values (Wilcox p value = 9.68E-5) and reduced expression in SZ versus CTL (Wilcox p value = 7.69E-15) compared to genes whose TSSs were greater than 1 kb from an EGR1 binding site. A monotonic decrease in this effect was observed as the distance threshold used for this comparison was increased from 1 to 50 kb (Fig. 2c).

Cell type-specific changes
In addition to dysregulation of broadly acting transcription factors, another mechanism that can drive largescale transcriptional changes in bulk tissue is alterations in constituent cell type proportions. Previous studies have observed decreases in neuron density and increased glial scarring in psychiatric disorders [36,37]. To test for signs of changing cell populations in our dataset we applied a method to deconvolute RNA expression data and estimate cell type proportions. Darmanis et al. [27] identified genes capable of classifying cells into the major neuronal, glial, and vascular cell types in the brain based on single cell RNA sequencing. We used these gene sets to generate cell type indices using the median of normalized counts for each cell type-specific gene set. We tested these indices on purified brain cell populations and in silico mixed cell populations from Zhang et al. [28,29] to demonstrate their accuracy and specificity (Additional file 3: Figure S4). Application of these cell type indices to patient AnCg expression data revealed a significant decrease in neuron specific gene expression (Wilcox p value <0.05) and a significant increase in astrocyte-specific expression (Wilcox p value <0.05) in SZ and BPD patients compared to controls (Fig. 3a, b). Other brain cell type indices were not significantly different between psychiatric patients and controls (Additional file 3: Figure S5). An alternative algorithm for cell type deconvolution, DeconRNASeq, showed similar results (Additional file 3: Figure S6a, b).
Additionally, we showed that neuron-specific genes identified by Darmanis et al. [27] are enriched for decreased expression in SZ compared to controls and astrocyte-specific genes are enriched for increased expression (Additional file 3: Figure S6c). Again, these enrichments are specific to this a b c Fig. 3 Box plots indicating z-scored neuron-specific a and astrocyte-specific b expression indices in the AnCg for SZ (red), BPD (blue), MDD (green), and CTL (gray) individuals. c Correlation plot comparing the log 2 expression fold change between SZ and CTL patients in the AnCg (x-axis) and the log 2 fold change in gene expression from dissected neuron populations compared to all other dissected brain cell types (astrocytes, oligodendrocytes, endothelial cells, and microglia) for each transcript measured by Zhang et al. [28]. b a c Fig. 2 a Box plots indicating relative expression of EGR1 in the AnCg of SZ (red), BPD (blue), MDD (green), and CTL (gray). b Correlation plot comparing RNA-seq measured expression level of EGR1 to qPCR measured expression in ten SZ (red) and ten CTL (black) patients. c Wilcox p values resulting from comparing the degree of differential expression (based on DESeq2 p values) of genes whose TSS are within the indicated distance to an EGR1 binding sites compared to genes whose TSSs are further than the indicated threshold gene set and are not reproduced by 1000 expressionmatched, randomly sampled gene sets (Additional file 3: Figure S6d,e). Further supporting a decrease in neuronal gene expression, we found a significant negative correlation between gene expression changes in patient brains relative to control brains and the degree of neuron-specific transcription (fold enrichment of neuronal gene expression over other cell types; SZ Rho = −0.50 and BPD Rho = −0.41, p value <1E-16; SZ shown in Fig. 3c).

Transcriptomic changes reflected in altered metabolomic profiles
To assess the biochemical consequences of expression changes, we used 2D-GCMS to measure metabolite levels in 86 of the AnCg samples (sufficient tissue was unavailable for ten samples). We measured and identified 141 unique metabolites (Additional file 2: Table S8). We found no metabolites reached statistical significance (FDR <0.05); however, eight metabolites had a FDR <0.1 when comparing SZ to CTL. Similar to our gene expression analysis, metabolite levels (Additional file 2: Table S9) successfully differentiated SZ and BPD patients from CTLs (Fig. 4a), while MDD metabolite profiles were very similar to CTLs. Several of the most significant metabolites, including GABA, are known to be relevant to BPD and SZ (Fig. 4b) [38]. Furthermore, GABA:glutamate metabolite ratios correlate strongly with average GAD1 and GAD2 expression levels measured by RNA-seq (Rho = 0.413, p value = 0.007; Fig. 4c, d). This metabolite-gene relationship is consistent with previous multi-level phenomic analyses [39] and demonstrates realized biochemical consequences from altered gene expression. Notably, reductions in GABA could coincide with loss of neuron-specific gene expression as suggested by the RNA-seq data. Integrated pathway analyses of metabolite and gene expression data revealed disruption of synaptic and neurotransmitter signaling in SZ compared to CTL (Additional file 2: Table S10 and Additional file 3: Figure S7). a b c d Fig. 4 a Hierarchical clustering of SZ (red), BPD (blue), MDD (green), and CTL (black) individuals using the top ten most significant metabolites for each case-control comparison (for a total of 30 metabolites). b Box plots indicating z-scored GABA metabolite levels. c Box plots indicating relative expression of GAD1 and GAD2 enzymes in the AnCg of SZ (red) and CTL (gray) patients. d Correlation plot comparing average GAD1 and GAD2 expression and the GABA:glutamate metabolite level ratio in the AnCg of SZ (red) and CTL (black) individuals

Discussion
Here, we describe a large transcriptomic dataset across three brain regions (DLPFC, AnCg, and nAcc) in SZ, BPD, and MDD patients, as well as CTL samples matched for agonal state and brain pH. In MDD, we did not identify any genes that meet genome-wide significance for differential expression between cases and controls in any brain region. This finding agrees with previous post-mortem RNA-seq studies [40]; however, sample size and the choice of brain regions examined likely contributed to our inability to replicate results from previous non-transcriptome-wide sequencingbased approaches comparing MDD to CTL in postmortem brain [41]. One limitation of our study is that females are underrepresented at a rate of about 5:1. This reflects the increased chance of accidental death among males [42], but limits us in our ability to make more general conclusions about these disorders and to address known differences between the sexes as they relate to these disorders. We also do not have information on the smoking status for our cohort, which is an important covariate as smoking rates are higher among patients with psychiatric disorders and smoking has been demonstrated to affect gene expression [43,44]. Another potential limitation inherent to post-mortem cohort analyses is accounting for patient drug use. As detailed in Additional file 2: Table S2, patient toxicology reports were positive for several prescribed and illicit drugs that were not present in CTL samples. As this is a bias inherent to psychiatric patients, it is impossible to disentangle from non-treatment-related disease patterns in a post-mortem analysis. Another important limitation of post-mortem RNA-seq studies is RNA quality. We found a significant proportion of variation in our data to be associated with multiple alignment quality metrics. Significant effort went into controlling for potential sources of bias due to differences in RNA quality. We only included tissue from patients with an agonal score of 0 and who had a brain pH of 6.5 or greater. We also controlled for brain pH, post-mortem interval, and alignment quality in all differential expression analyses. Our study, as well as future post-mortem studies, could be improved by directly measuring RNA quality at the time of sample preparation (e.g., RNA integrity number (RIN)). Even with these caveats, we believe our data yield new insights contributing to a growing understanding of these disorders.
The most dramatic gene expression signals we observed were brain region-specific. The majority of diseaseassociated expression differences were seen in the AnCg of SZ compared to CTL individuals. The AnCg has been associated with multiple disease-relevant functions, including cognition, error detection, conflict resolution, motivation, and modulation of emotion [45][46][47]. We observed a striking overlap in SZ-and BPD-associated expression changes consistent with previous findings [38,48].
One of the more intriguing genes significantly downregulated (FDR <0.05) in both cohorts of SZ patients was the zinc finger transcription factor EGR1. We provide evidence that this factor binds upstream of genes with altered expression in SZ and is associated with decreased expression in SZ patients. Down-regulation of EGR1 has been previously described in the prefrontal cortex of post-mortem brain samples from SZ patients [49,50]. EGR1 has also previously been associated with several phenotypes relevant to psychiatric disorders, including neural differentiation [51], emotional memory formation [52], and response to antipsychotics [53], and has recently been described as part of a transcription factor-miRNA co-regulatory network capable of acting as a biomarker in peripheral blood cells for SZ [54]. In mice, loss of EGR1 has been linked to neuronal loss in a model of Alzheimer's disease [55]. EGR1 is also important for regulation of the NMDA receptor pathway, which is critical for synaptic plasticity and memory formation and has been implicated in SZ in humans [56]. We believe a more detailed examination of genome-wide EGR1 occupancy in post-mortem brain tissue or cultured neurons could yield additional information and assessment of the functional consequences of EGR1 loss is required to confirm this factor's role in SZ pathogenesis.
We also see evidence for depletion of neuron-specific genes and increased levels of astrocyte-specific genes in SZ and BPD patients. This observation is further supported by metabolomic analysis of the AnCg, which found a concordant decrease in GABA levels in BPD and SZ individuals. Neuronal depletion has been previously described in SZ [36,37]. Insufficient tissue remains from our patient cohort to validate computational cell type predictions immunohistochemically; however, our data strongly suggest that future post-mortem studies should be cognizant of cell type heterogeneity across patient samples. The method for cell type composition estimation is limited in its accuracy to estimating only the major classes of cells present. Genes represented in cell types present at only a small minority could be overor under-represented using this technique. Based on these results, future studies should consider using robust techniques for assessing tissue composition to examine potential cell type proportion differences between disease cohorts and to identify which transcriptional changes occur in conjunction with, and independent of, those differences.
We observed very few or no significant expression differences in the DLPFC and nAcc, which contradicts several previous studies [14,57]. We do not intend to claim that no transcriptional changes occur in these brain regions as our study was designed to broadly compare transcriptional alterations across multiple brain regions in multiple psychiatric disorders, thereby sacrificing exceptional sample sizes in any single disorder in any specific brain region. However, our data do suggest that, of the regions we tested, the strongest transcriptional changes occur in the AnCg of SZ patients. Moreover, these data provide a useful resource for future studies facilitating the testing of preliminary hypotheses or validation of significant findings.

Conclusions
Our study provides several meaningful and novel contributions to the understanding of psychiatric disease. We provide a well-annotated data set that has the potential to act as a broadly applicable resource for investigators interested in molecular changes in multiple psychiatric disorders across multiple brain regions. We have conducted an extensive characterization of the molecular overlap between SZ and BPD at the gene expression and metabolite levels across multiple brain regions. We provide a highconfidence set of genes differentially expressed between SZ and CTL individuals utilizing two independent cohorts and highlight down-regulation of EGR1 as a potential driver of broader scale transcription changes. We also establish that a significant proportion of transcriptome variation within SZ and BPD cohorts is correlated with expression changes in previously identified cell type-specific genes.