Circuits of cancer drivers revealed by convergent misregulation of transcription factor targets across tumor types

Background: Large tumor genome sequencing projects have now uncovered a few hundred genes involved in the onset of tumorigenesis, or drivers, in some two dozen malignancies. One of the main challenges emerging from this catalog of drivers is how to make sense of their heterogeneity in most cancer types. Addressing it is key not only to understanding how carcinogenesis starts and develops in these malignancies, and thus to diagnosing them early, but also to opening up therapeutic strategies that target one driver protein to counteract the alteration of another, connected driver. Methods: Here, I focus on driver transcription factors (TFs) and on how they contribute to tumorigenesis in several tumor types through the altered expression of their targets. First, I explore their involvement in tumorigenesis as mutational drivers in 28 different tumor types. Then, I collect a list of downstream targets of all driver TFs and identify which of them are differentially expressed upon alterations of the driver TFs. Results: I identify the subset of targets of each TF most likely to mediate the tumorigenic effect of its driver alterations in each tumor type, and explore their overlap. Furthermore, I identify other driver genes that cause tumorigenesis through the alteration of very similar sets of targets. Conclusions: I thus uncover circuits of connected drivers that cause tumorigenesis through the perturbation of overlapping cellular pathways in a pan-cancer manner across 15 malignancies. The systematic detection of these circuits may be key to proposing novel therapeutic strategies that indirectly target driver alterations in tumors. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0260-1) contains supplementary material, which is available to authorized users.


Supplementary Figure 1. Summary of all analyses carried out in this work
A) Detection and description of driver TFs.
A list of genes driving tumorigenesis in different cancer types (drivers specific to each tumor type), identified by combining three signals of positive selection in their pattern of mutations across each cohort of tumors, was obtained from reference 10. Intersecting these mutational drivers with an exhaustive list of human TFs produced a catalog of 64 driver TFs. (Note that only genes expressed in a given tumor type can be nominated as drivers; therefore, all driver TFs are expressed in the tumor type where they act as drivers.)
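A minimal Python sketch of the catalog-building step in panel A, assuming the per-tumor driver lists and the TF list are available as sets of gene symbols. The function name and the toy input are hypothetical; they only illustrate the set intersection and are not the study's data.

```python
def driver_tf_catalog(drivers_by_tumor, human_tfs):
    """Return, for each tumor type, the drivers that are also human TFs.

    drivers_by_tumor: dict mapping tumor type -> set of driver gene symbols
    human_tfs: set of gene symbols annotated as TFs
    """
    catalog = {}
    for tumor, drivers in drivers_by_tumor.items():
        tf_drivers = set(drivers) & set(human_tfs)
        if tf_drivers:  # skip tumor types without any driver TF
            catalog[tumor] = tf_drivers
    return catalog


# Toy input (hypothetical, for illustration only)
drivers = {"BRCA": {"GATA3", "TP53", "PIK3CA"}, "UCEC": {"CTCF", "PTEN"}, "LUAD": {"KRAS"}}
tfs = {"GATA3", "CTCF", "TP53"}
catalog = driver_tf_catalog(drivers, tfs)
all_driver_tfs = set().union(*catalog.values())
print(sorted(all_driver_tfs))  # ['CTCF', 'GATA3', 'TP53']
```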

B) Relative enrichment for mutations in domains.
Lists of somatic mutations in tumors and germline variants in the human population affecting the 64 driver TFs were obtained from reference 10 and the ExAC database, respectively (see Methods). The germline variants were filtered by allele frequency to keep only likely polymorphisms. Both sets were then mapped onto the protein coordinates of the driver TFs, and the number of mutations and variants falling within each domain of each driver TF was counted. Finally, the relative overrepresentation of somatic mutations in each domain was computed with Fisher's exact tests.
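The domain-level comparison in panel B can be sketched as a one-sided Fisher's exact test per domain, contrasting somatic mutations inside versus outside the domain with germline variants inside versus outside it. This is a sketch under assumptions: the one-sided alternative, inclusive domain boundaries, and the helper names are choices made here for illustration. The test is written with `math.comb` to stay dependency-free; `scipy.stats.fisher_exact(table, alternative="greater")` should give the same p-value.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric distribution fixed by the margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, col1)


def domain_enrichment(somatic_positions, germline_positions, domains):
    """Test each domain for excess somatic mutations relative to germline variants.

    somatic_positions / germline_positions: protein residue positions of tumor
    mutations and of (frequency-filtered) population variants.
    domains: dict mapping domain name -> (start, end), inclusive coordinates.
    """
    results = {}
    for name, (start, end) in domains.items():
        som_in = sum(start <= p <= end for p in somatic_positions)
        germ_in = sum(start <= p <= end for p in germline_positions)
        table = (som_in, len(somatic_positions) - som_in,
                 germ_in, len(germline_positions) - germ_in)
        results[name] = fisher_greater(*table)
    return results


# Toy positions for a hypothetical TF with one DNA-binding domain (DBD)
pvals = domain_enrichment([10, 12, 15, 200], [300, 310, 320, 15], {"DBD": (5, 20)})
print(pvals)  # DBD p-value is 17/70, about 0.243
```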

C) Targets of TFs involved in tumorigenesis.
Lists of known and predicted targets of 42 driver TFs were collected from several databases. The expression matrices of several TCGA cohorts (each representing one tumor type) were filtered with these lists to retain only the expression of potential targets of each TF. For each driver TF and cancer type, the expression of its targets across the tumor samples was then compared between the tumors in which the TF is altered and those in which it is not. Targets with a significant Mann-Whitney test (p < 0.05) and a log2 fold-change above 1 or below -1 were considered misregulated upon alteration of the TF (TF DE genes).
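The misregulation criterion in panel C (Mann-Whitney p < 0.05 and |log2 fold-change| > 1) might be sketched as follows. The Mann-Whitney test here uses the normal approximation with a tie correction and no continuity correction, so p-values can differ slightly from library implementations such as `scipy.stats.mannwhitneyu`, whose defaults differ, and may be inaccurate for very small groups. How the fold change was computed is not stated; the ratio of group means with a pseudocount of 1 is an assumption, as are the function names.

```python
from collections import Counter
from math import erfc, log2, sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - mu) / sqrt(var)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def is_misregulated(altered, unaltered, p_cut=0.05, lfc_cut=1.0, pseudocount=1.0):
    """Call a target misregulated if p < p_cut and |log2 fold-change| > lfc_cut.

    The fold change uses group means plus a pseudocount (an assumption).
    """
    lfc = log2((sum(altered) / len(altered) + pseudocount) /
               (sum(unaltered) / len(unaltered) + pseudocount))
    return mann_whitney_p(altered, unaltered) < p_cut and abs(lfc) > lfc_cut


# Toy expression of one target in TF-altered vs. unaltered tumors
print(is_misregulated([10, 12, 11, 13, 12], [2, 3, 2, 4, 3]))  # True
```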

D) Circuits of TFs and connected partners.
All (non-TF) drivers directly connected to each of the 42 driver TFs in a functional interaction network were retrieved as potential circuit partners. The expression of the targets of each driver TF across the tumor samples of a cancer type was then compared between the tumors in which the potential partner is altered and those in which it is not, exactly as described above for the TFs; this produced a set of partner DE genes. Finally, the TF DE genes and partner DE genes were tested for significant overlap.
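The legend does not state which test was used to probe the overlap between TF DE genes and partner DE genes. One common choice, assumed here, is a one-sided hypergeometric test whose universe is the set of TF targets tested in that tumor type (both DE sets are subsets of it). The function name and gene identifiers are hypothetical.

```python
from math import comb


def overlap_p(tf_de, partner_de, tested_targets):
    """Hypergeometric P(overlap >= observed) between two DE sets drawn from
    the same universe of tested TF targets."""
    universe = set(tested_targets)
    a, b = set(tf_de) & universe, set(partner_de) & universe
    n, k_tf, k_partner = len(universe), len(a), len(b)
    observed = len(a & b)
    numerator = sum(
        comb(k_tf, k) * comb(n - k_tf, k_partner - k)
        for k in range(observed, min(k_tf, k_partner) + 1)
    )
    return numerator / comb(n, k_partner)


# Toy universe of 10 tested targets of a hypothetical TF
targets = [f"G{i}" for i in range(10)]
print(overlap_p({"G0", "G1", "G2"}, {"G0", "G1", "G2"}, targets))  # 1/120, about 0.0083
```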

Supplementary Tables
Supplementary Table (…). To produce this table, I first randomly sampled groups of genes of the same size as the starting number of targets annotated for each TF. Then, I counted how many of these genes were differentially expressed between the samples with alterations of the TF and the samples in which the TF is unaltered. I iterated this process 10,000 times and computed an ad hoc p-value (p_DE_targets), measuring how representative the TF targets are, as the fraction of iterations in which the number of differentially expressed genes in the random group was larger than the number of differentially expressed targets of the TF.
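The resampling behind p_DE_targets can be sketched as below, assuming differential expression has already been called genome-wide between the samples with and without alterations of the TF, so that each iteration only needs to count DE genes in a random group. The strict "larger than" comparison follows the wording above; counting "at least as large" would give a slightly more conservative p-value. Function and argument names are hypothetical.

```python
import random


def p_de_targets(all_genes, de_genes, n_targets, n_de_targets, n_iter=10000, seed=0):
    """Empirical p-value: fraction of random gene groups (same size as the TF's
    target list) containing more DE genes than the TF's observed DE targets."""
    rng = random.Random(seed)
    genes = list(all_genes)
    de_genes = set(de_genes)
    exceed = 0
    for _ in range(n_iter):
        group = rng.sample(genes, n_targets)
        n_de_random = sum(g in de_genes for g in group)
        if n_de_random > n_de_targets:  # "larger than", as stated in the text
            exceed += 1
    return exceed / n_iter


# Toy genome of 100 genes, 10 of them DE; a TF with 10 targets, 6 of them DE
genome = [f"G{i}" for i in range(100)]
print(p_de_targets(genome, genome[:10], n_targets=10, n_de_targets=6, n_iter=1000))
```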
(I limit this analysis to TFs with fewer than 500 targets, to ensure sufficient variability among the sampled groups of random genes.) Low p-values thus denote TFs for which the differential expression analysis detects mostly genes within their lists of collected targets. On the other hand, TF-tumor type combinations with p-values close to 1 represent cases in which differentially expressed genes are distributed both within and outside the collected targets. This may be due to i) incompleteness of the collections of targets of these TFs (mainly indirect targets), ii) dramatic changes in gene regulation that take place during tumorigenesis, or iii) spurious results of the differential expression analysis. To distinguish the first of these possibilities from the other two, I carried out a second analysis to estimate the number of differentially expressed genes (as a fraction of the number of known targets of the TF) expected to be detected given the number of samples in which the TF bears driver alterations in the tumor type under analysis. Briefly, for each TF-tumor type combination, I randomly assigned the samples 100 times to two groups, one of them containing the same number of samples as those with driver alterations of the TF. I then tested the differential expression of a random set of genes of the same size as the set of known targets of the TF. Finally, by integrating the counts of differentially expressed genes across these 100 iterations and comparing them to the observed number of differentially expressed targets of the TF, I computed a Z-score (Z_expected_DE_genes). This Z-score thus measures the significance of the observed number of differentially expressed targets relative to the number of differentially expressed genes expected from factors in principle not associated with alterations of the TF (e.g., massive changes in the transcriptional program due to tumorigenesis).
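A sketch of Z_expected_DE_genes, together with the p < 0.05 and Z > 1.96 thresholds used for the Class column described next. Two points are assumptions: the Z-score is taken as (observed - mean) / standard deviation of the 100 null counts, and the DE criterion is passed in as a function (for instance, the Mann-Whitney plus fold-change rule of panel C). The text does not name the fourth combination (significant p_DE_targets with a non-significant Z-score), so `classify` returns None for it. All names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def z_expected_de_genes(expression, n_altered, n_targets, observed_de_targets,
                        is_de, n_iter=100, seed=0):
    """Z-score of the observed DE target count against a null built from random
    sample splits and random gene groups.

    expression: dict gene -> list of values (same sample order for every gene)
    is_de: callable(group_a_values, group_b_values) -> bool, the DE criterion
    """
    rng = random.Random(seed)
    genes = list(expression)
    n_samples = len(next(iter(expression.values())))
    null_counts = []
    for _ in range(n_iter):
        samples = list(range(n_samples))
        rng.shuffle(samples)  # random split; group_a mimics the altered samples
        group_a, group_b = samples[:n_altered], samples[n_altered:]
        n_de = 0
        for gene in rng.sample(genes, n_targets):
            values = expression[gene]
            if is_de([values[i] for i in group_a], [values[i] for i in group_b]):
                n_de += 1
        null_counts.append(n_de)
    mu, sd = mean(null_counts), pstdev(null_counts)
    if sd == 0:  # degenerate null: report only the direction of the difference
        if observed_de_targets > mu:
            return math.inf
        return -math.inf if observed_de_targets < mu else 0.0
    return (observed_de_targets - mu) / sd


def classify(p_de, z_expected, p_cut=0.05, z_cut=1.96):
    """Assign the Class label from p_DE_targets and Z_expected_DE_genes."""
    if z_expected > z_cut:
        return "known" if p_de < p_cut else "putative_unknown"
    if p_de >= p_cut:
        return "possibly_unspecific"
    return None  # significant p but non-significant Z: not named in the text


print(classify(0.01, 2.5), classify(0.5, 2.5), classify(0.5, 0.3))
```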
According to the combination of p_DE_targets and Z_expected_DE_genes, I classified TF-tumor type combinations into three groups (column Class). Those in the 'known' group have both a significant p_DE_targets (p < 0.05) and a significant Z_expected_DE_genes (Z > 1.96); they therefore correspond to cases where the fraction of differentially expressed targets is significantly higher than expected from factors not necessarily associated with TF alterations, and also significantly higher than the number of differentially expressed genes outside the list of TF targets. Combinations in the 'putative_unknown' group have a significant Z-score but a non-significant p_DE_targets, probably pointing to a substantial number of yet undiscovered targets that become misregulated upon alteration of the TF. Finally, combinations in the 'possibly_unspecific' group correspond to cases where the fraction of differentially expressed targets is neither significantly higher than expected from groups of random genes nor greater than expected from factors not associated with alterations of the TF. Differential expression detected within the targets of these TFs therefore cannot be linked exclusively to the alteration of the TF.

(… File 7). Assessment of the mutual exclusivity of alterations of driver TF circuits
Two methods that compute the mutual exclusivity of alterations (mutex and CoMEt; see Methods) were applied to all TF driver circuits explored in this study in which the TF and its partner share at least one target gene. The overlap between the circuits that exhibit a significant overlap of targets (signif_circ in the Table) and those detected as pairs with significantly mutually exclusive alterations (signif_mutex, signif_both, signif_comet) is rather small. This is because the overlap of significantly misregulated targets and the mutual exclusivity of alterations are orthogonal ways of assessing the relationships between driver genes.
While the former relies on information about targets and their expression in the same samples in which the mutational and CNA status of the driver TFs and partners is assessed, and therefore cannot be used when such data are unavailable, the latter only requires knowledge of the mutational and CNA status of the drivers. On the other hand, the overlap of target misregulation could, in principle, detect convergent alterations between driver TFs and their partners that fall below the significance threshold of mutual exclusivity tests (as suggested by the results in the Table). Thus, a bioinformatics method built on the rationale presented in this study may represent a good alternative to mutual exclusivity for detecting such relationships between driver genes.
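To make the notion of mutually exclusive alterations concrete, here is a simple pairwise test: a one-sided Fisher's exact (hypergeometric) test for fewer samples with alterations in both the TF and its partner than expected by chance. It is a stand-in for illustration only, not the mutex or CoMEt algorithms used in the study; the function name and the toy alteration vectors are hypothetical.

```python
from math import comb


def mutual_exclusivity_p(altered_a, altered_b):
    """One-sided hypergeometric p-value for observing this few (or fewer)
    co-altered samples. altered_a / altered_b: per-sample booleans."""
    if len(altered_a) != len(altered_b):
        raise ValueError("alteration vectors must cover the same samples")
    n = len(altered_a)
    n_a, n_b = sum(altered_a), sum(altered_b)
    both = sum(a and b for a, b in zip(altered_a, altered_b))
    numerator = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k)  # P(X = k) numerator
        for k in range(max(0, n_a + n_b - n), both + 1)
    )
    return numerator / comb(n, n_b)


# Toy cohort of 10 tumors: TF altered in the first 5, partner in the last 5
tf = [True] * 5 + [False] * 5
partner = [False] * 5 + [True] * 5
print(mutual_exclusivity_p(tf, partner))  # 1/252, about 0.004
```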