Profiling tissue-resident T cell repertoires by RNA sequencing
© Brown et al. 2015
Received: 15 August 2015
Accepted: 13 November 2015
Published: 30 November 2015
Deep sequencing of recombined T cell receptor (TCR) genes and transcripts has provided a view of T cell repertoire diversity at an unprecedented resolution. Beyond profiling peripheral blood, analysis of tissue-resident T cells provides further insight into immune-related diseases. We describe the extraction of TCR sequence information directly from RNA-sequencing data from 6738 tumor and 604 control tissues, with a typical yield of 1 TCR per 10 million reads. This method circumvents the need for PCR amplification of the TCR template and provides TCR information in the context of global gene expression, allowing integrated analysis of extensive RNA-sequencing data resources.
Primary sequence analysis of the highly variable complementarity determining region 3 (CDR3) of rearranged T cell receptor (TCR) genes provides insight into the adaptive immune response. T cells recognize peptide epitopes presented on the surface of cells on major histocompatibility complex (MHC) molecules. CDR3 is the TCR motif that directly binds MHC-presented peptide epitopes, and this binding interaction is the main factor conferring T cell antigen specificity. Typically, CDR3 sequence information is acquired by performing TCR-sequencing (TCR-seq) experiments on peripheral T cells isolated from blood [1, 2], amplifying the CDR3 region either with a conserved C gene primer followed by 5′ rapid amplification of cDNA ends, or with a multiplexed set of V and J gene primers. TCR-seq applied to tissue specimens can provide insight into tumor-infiltrating lymphocytes [4, 5], T cells associated with autoimmune pathology [6–8] and infection, and the properties of normal primary and secondary lymphatic tissues [10, 11].
The research described herein conformed to the Helsinki Declaration. All clinical specimens not part of The Cancer Genome Atlas (TCGA) were obtained previously with informed consent by the British Columbia Cancer Agency Tumor Tissue Repository, which operates as a dedicated biobank with approval from the University of British Columbia-British Columbia Cancer Agency Research Ethics Board (certificate #H09-01268).
Extraction of T cell receptor CDR3 sequences from RNA-seq data
We deployed MiTCR v1.0.3, which is well suited for the annotation of CDR3 sequences from sequencing reads. However, upon initial application of MiTCR to tumor RNA-seq data using the default parameters, we identified hundreds of non-specific and out-of-frame CDR3 sequences per sample, which prompted us to explore alternative parameters. Closer inspection of the bogus CDR3s identified low similarity between these sequences and the putative flanking TCR V and J gene segments, suggesting that the false positives were spurious, non-TCR hits to TCR-like sequences elsewhere in the transcriptome. Therefore, we optimized settings using positive and negative control RNA-seq data. The positive control dataset comprised TCR sequences generated in silico, as follows: V, (D), J, and C gene reference sequences for human TCR alpha and beta chains were downloaded from the ImMunoGeneTics information system. To generate each transcript, V, (D), J, and C genes were chosen randomly. Non-templated nucleotide addition and deletion frequencies at the V-(D)-J gene junctions were modeled from observed frequencies in normal TCR chain beta repertoires. Owing to the absence of D genes in the alpha chain, the number of bases added between the V and J genes was selected by averaging the number to add to both the V-D and D-J junctions in a beta chain. Out-of-frame transcripts and those that contained stop codons were removed. Full-length recombined TCR transcript sequences were run through MiTCR using stringent alignment parameters (minimum V and J alignment length both set to 20 in the XML parameter file; default value is 12) to annotate the CDR3 region in the transcript, and to ensure the in silico recombination created a CDR3 sequence able to be detected by MiTCR. We generated 10,000 transcripts each of alpha and beta, of which 8573 alpha and 8804 beta sequences were successfully identified by MiTCR and used as the source for the positive control dataset.
The distributions of CDR3 lengths for the in silico generated alpha and beta chains are displayed in Additional file 1: Figure S1.
For negative control RNA-seq data, we pooled paired-end 101-nucleotide RNA-seq data from seven TCR-negative cell lines, downloaded from ENCODE (Additional file 1: Table S1). To create negative datasets for shorter read lengths, reads from the 101-nucleotide datasets were truncated to 76-nucleotide and 50-nucleotide reads. For positive control datasets, error-free reads (101, 76, and 50 nucleotides) were created for each in silico generated CDR3, with the center of the CDR3 region positioned at the center of the read.
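The construction of positive control reads can be illustrated with a minimal sketch. The function name, the coordinate convention (0-based CDR3 start), and the clamping at transcript ends are assumptions made for illustration; the text does not say how reads near a transcript boundary were handled.

```python
def centered_read(transcript: str, cdr3_start: int, cdr3_len: int, read_len: int) -> str:
    """Return an error-free read whose center coincides with the CDR3 center.

    cdr3_start is a 0-based offset into the transcript (hypothetical convention).
    """
    cdr3_center = cdr3_start + cdr3_len // 2
    start = cdr3_center - read_len // 2
    # Assumption: keep the read inside the transcript by clamping the start.
    start = max(0, min(start, len(transcript) - read_len))
    return transcript[start:start + read_len]


# Toy transcript: 100 nt of V region, a 45-nt CDR3, 100 nt of J/C region.
toy = "A" * 100 + "C" * 45 + "G" * 100
read_50 = centered_read(toy, cdr3_start=100, cdr3_len=45, read_len=50)
```

With a 45-nucleotide CDR3 and a 50-nucleotide read, only a few flanking bases remain on either side, which is why the minimum V and J alignment lengths matter so much at short read lengths.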
An unbiased parameter space exploration was performed across all pairwise combinations of V gene minimum alignments and J gene minimum alignments (values 8 to 26 explored, all other parameters set as default) to determine optimal parameters. For each of the 361 parameter pairs, MiTCR was run on the negative and positive control datasets. For negative control datasets, the number of detected bogus CDR3s was tracked, and for positive control datasets, the number of correctly annotated CDR3s was tracked. Optimal parameters were assessed for each TCR chain–read length combination. Sensitivity was calculated for each parameter pair by dividing the number of recovered CDR3s by the maximum number of recovered CDR3s for all parameter pairs of that TCR chain–read length combination, giving a relative sensitivity value. We used a binary categorization to bin the false discovery rates as acceptable or not. For a set of false discovery rates, we selected the best parameter pair that had an acceptable false discovery rate and highest sensitivity. In the case of multiple parameter pairs being equally acceptable, the pair that minimized the V and J alignment parameters was selected. These optimal parameters are summarized in Additional file 1: Table S2.
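The selection rule described above can be sketched as follows. This is an illustrative reading, not the authors' code: the acceptability test is modeled as a cap on the number of bogus CDR3s (`max_false`, a hypothetical parameter), and "minimized the V and J alignment parameters" is interpreted as the smallest sum of the two values.

```python
from itertools import product

# All pairwise combinations of V and J minimum alignment lengths, 8 to 26.
GRID = list(product(range(8, 27), repeat=2))  # 19 x 19 = 361 pairs


def select_parameters(true_hits, false_hits, max_false=0):
    """Pick the (v_min, j_min) pair with acceptable false hits and top sensitivity.

    true_hits / false_hits map (v_min, j_min) -> number of CDR3s recovered from
    the positive / negative control dataset for one chain and read length.
    """
    best_recovered = max(true_hits.values())
    candidates = []
    for pair, recovered in true_hits.items():
        if false_hits[pair] > max_false:
            continue  # binary categorization: not acceptable
        relative_sensitivity = recovered / best_recovered
        # Sort key: highest sensitivity first, then smallest V + J (assumed tie-break).
        candidates.append((-relative_sensitivity, pair[0] + pair[1], pair))
    if not candidates:
        return None
    neg_sensitivity, _, pair = min(candidates)
    return pair, -neg_sensitivity


# Toy counts for four parameter pairs (invented numbers, for illustration only).
toy_true = {(8, 8): 100, (12, 12): 98, (12, 14): 98, (20, 20): 90}
toy_false = {(8, 8): 5, (12, 12): 0, (12, 14): 0, (20, 20): 0}
choice = select_parameters(toy_true, toy_false)
```

Because sensitivity is divided by the best recovery across the grid, the reported value is relative rather than absolute.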
Benchmarking CDR3 extraction efficiency using simulated data
The Flux Simulator v1.2.1 is a computational tool that generates RNA-seq datasets by simulating a transcriptome expression profile, library construction, and sequencing errors. We simulated a range of sequencing depths (10^4 to 10^8 reads) and read lengths (50, 76, and 101 nucleotides) to determine how different factors affect the ability to detect a given CDR3 sequence. Full-length in silico recombined TCR sequences were annotated as single exon genes in a reference synthetic chromosome sequence file, which we added to the human genome (GRCh38) to be used as the reference genome for Flux Simulator. Flux Simulator was run with the following command line flags: -t simulator, -x (to simulate expression), -l (to simulate library construction), -s (to simulate sequencing), and -p parameterFile.par. The parameter file contained the following parameters: REF_FILE_NAME: path to .gtf file; GEN_DIR: directory with genome reference files; FASTA: true; ERR_FILE: 76; READ_LENGTH: one of 50, 76, 101; PAIRED_END: true; UNIQUE_IDS: true; READ_NUMBER: one of 10000, 50000, 100000, 500000, 1000000, 5000000, 10000000, 50000000, 100000000; and TMP_DIR: path to temporary directory.
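The simulation grid can be enumerated with a short sketch that renders one parameter file per read length and depth. The parameter names and flags are those listed above; the tab-separated KEY/VALUE layout, the `flux-simulator` executable name, and the helper names are assumptions and should be checked against the Flux Simulator documentation before use.

```python
from itertools import product

READ_LENGTHS = [50, 76, 101]
READ_NUMBERS = [10_000, 50_000, 100_000, 500_000, 1_000_000,
                5_000_000, 10_000_000, 50_000_000, 100_000_000]


def render_par(gtf_path, genome_dir, tmp_dir, read_length, read_number):
    """Render one parameter file as text (assumed layout: KEY<tab>VALUE per line)."""
    params = [
        ("REF_FILE_NAME", gtf_path),
        ("GEN_DIR", genome_dir),
        ("FASTA", "true"),
        ("ERR_FILE", "76"),
        ("READ_LENGTH", str(read_length)),
        ("PAIRED_END", "true"),
        ("UNIQUE_IDS", "true"),
        ("READ_NUMBER", str(read_number)),
        ("TMP_DIR", tmp_dir),
    ]
    return "\n".join(f"{key}\t{value}" for key, value in params) + "\n"


def command_for(par_path):
    """Build the argument list; the executable name is an assumption."""
    return ["flux-simulator", "-t", "simulator", "-x", "-l", "-s", "-p", par_path]


# One simulation per (read length, read number) combination.
RUNS = list(product(READ_LENGTHS, READ_NUMBERS))
example_par = render_par("tcr_plus_GRCh38.gtf", "genome/", "tmp/", 50, 10_000)
```

The file names in the example call are placeholders; the sketch only builds text and argument lists and does not launch the simulator.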
Approximation of TCR transcript abundance from percent T cell infiltration
We queried Illumina BodyMap (http://www.ebi.ac.uk/gxa/experiments/E-MTAB-513) and GTEx (http://www.gtexportal.org/home/) to find the expression of TRAC and TRBC1/2 in healthy whole blood. An average expression of approximately 150 fragments per kilobase of transcript per million mapped reads was observed for these genes. Because these genes are roughly 1 kb in length, this level of expression translates to a transcript fraction on the order of 1.5 × 10^-4. Because lymphocytes are generally between 20 and 40 % of white blood cells, a pure lymphocyte population would have a TCR transcript fraction of approximately 5 × 10^-4. Assuming a similar cellular composition between peripheral blood lymphocytes and tumor infiltrating lymphocytes (TIL), a tumor with 2 % TIL (the median value for TCGA tumors, parsed from TCGA biospecimen slide data) would have a TCR transcript fraction of 1 × 10^-5. This value can be inserted into the logistic regression model in order to predict the minimum sequencing depth required to have a 50 % chance of detecting a CDR3 sequence with this abundance and other known properties.
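The chain of approximations above can be reproduced as a short worked calculation. The 30 % lymphocyte fraction is an assumption (the midpoint of the stated 20 to 40 % range), chosen because it reproduces the stated value of approximately 5 × 10^-4.

```python
# Worked arithmetic for the TCR transcript fraction estimate.
fpkm = 150.0            # approximate TRAC / TRBC expression in whole blood
transcript_kb = 1.0     # these genes are roughly 1 kb long

# Fragments per million for a 1-kb transcript, expressed as a fraction.
blood_fraction = fpkm * transcript_kb / 1e6           # about 1.5e-4

# Assumption: lymphocytes are 30 % of white blood cells (midpoint of 20-40 %).
lymphocyte_share = 0.30
pure_lymphocyte_fraction = blood_fraction / lymphocyte_share   # about 5e-4

# A tumor with 2 % tumor infiltrating lymphocytes.
til_share = 0.02
tumor_fraction = pure_lymphocyte_fraction * til_share           # about 1e-5

print(f"{blood_fraction:.1e} {pure_lymphocyte_fraction:.1e} {tumor_fraction:.1e}")
```

A fraction of 1 × 10^-5 corresponds to 0.001 % of total transcripts, or roughly ten TCR-derived transcripts per million.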
TCGA RNA-seq data analysis
All available RNA-seq fastq files for solid tumors and matched normal tissues were downloaded with permission from Cancer Genomics Hub (https://cghub.ucsc.edu/). This included 8655 samples from 24 tumor sites. The majority (79 %) of these TCGA RNA-seq data are 50-nucleotide reads, while 21 % are 76-nucleotide reads, with sequencing depths ranging from 5.27 × 10^6 to 4.51 × 10^8 reads. To be able to directly compare the extracted CDR3s across all samples, we performed a pre-normalization step by truncating all reads to 50 nucleotides, and randomly sub-sampling 1.00 × 10^8 reads from every sample. This resulted in removal of 41.5 % of all sequence data (5.20 × 10^11 reads) and 15.2 % of samples (1313) owing to insufficient depth, leaving 7342 (6738 tumor and 604 normal) samples and 7.34 × 10^11 total reads for analysis. Prior to running MiTCR, the fastq files were cleaned by retaining only reads longer than 40 nucleotides that contained only standard (ACTGN) bases. For every sample, MiTCR was run with the optimized V and J alignment parameters for a theoretical 0 % false discovery rate with 50-nucleotide reads for both alpha and beta chains, keeping all other parameters default. Extracted CDR3 sequences that contained stop codons or frame shifts were removed prior to all further analysis.
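The pre-normalization and cleaning steps can be sketched on a toy scale. This in-memory version is only illustrative: a real sample of this size would need a streaming approach, and the function name, record layout (name, sequence, quality), and fixed seed are assumptions.

```python
import random
import re

STANDARD_BASES = re.compile(r"[ACGTN]+")


def normalize_reads(reads, n_sample, trunc_len=50, min_len=40, seed=0):
    """Truncate, sub-sample, and clean reads given as (name, seq, qual) tuples.

    Returns None when the sample has fewer than n_sample reads, mirroring the
    removal of samples with insufficient depth.
    """
    truncated = [(name, seq[:trunc_len], qual[:trunc_len]) for name, seq, qual in reads]
    if len(truncated) < n_sample:
        return None
    sampled = random.Random(seed).sample(truncated, n_sample)
    # Keep reads longer than min_len that contain only A, C, G, T, or N.
    return [
        record for record in sampled
        if len(record[1]) > min_len and STANDARD_BASES.fullmatch(record[1])
    ]


toy_reads = [
    ("r1", "A" * 76, "I" * 76),              # kept after truncation to 50 nt
    ("r2", "ACGT" * 5, "I" * 20),            # too short
    ("r3", "ACGU" + "A" * 60, "I" * 64),     # non-standard base
]
kept = normalize_reads(toy_reads, n_sample=3)
```

In the analysis described above, the sub-sample size was 1.00 × 10^8 reads per sample; the toy call uses three reads only to keep the example small.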
TCGA gene expression datasets
In order to correlate extracted TCR diversity with the expression of immune-related genes, all available RNASeqV2 data from the TCGA Data Portal were downloaded. These data provide gene expression information generated using MapSplice for alignment and RSEM to quantify gene expression. The reported scaled_estimate value was multiplied by 10^6 to obtain transcripts per million (TPM). To obtain consensus gene expression values, we summed the TPM values within each of the following groups: HLA Class I (HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G), Class II (HLA-DMA, HLA-DMB, HLA-DOA, HLA-DOB, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DQB2, HLA-DRA, HLA-DRB1, HLA-DRB5), CD8 (CD8A, CD8B), or CD3 (CD3D, CD3E, CD3G). Pearson correlations were calculated between these summed expression values and the number of distinct CDR3 sequences in each individual (Additional file 1: Figures S4 and S5).
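The TPM conversion, group summation, and correlation can be sketched in plain Python. The gene groups are those listed above; the function names and the dictionary-based input are assumptions for illustration, and the toy values are invented.

```python
import math

GENE_GROUPS = {
    "HLA Class I": ["HLA-A", "HLA-B", "HLA-C", "HLA-E", "HLA-F", "HLA-G"],
    "HLA Class II": ["HLA-DMA", "HLA-DMB", "HLA-DOA", "HLA-DOB", "HLA-DPA1",
                     "HLA-DPB1", "HLA-DQA1", "HLA-DQA2", "HLA-DQB1", "HLA-DQB2",
                     "HLA-DRA", "HLA-DRB1", "HLA-DRB5"],
    "CD8": ["CD8A", "CD8B"],
    "CD3": ["CD3D", "CD3E", "CD3G"],
}


def group_tpm(scaled_estimates, group):
    """Sum TPM (scaled_estimate x 10^6) over the genes of one group."""
    return sum(scaled_estimates.get(gene, 0.0) * 1e6 for gene in GENE_GROUPS[group])


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy example: three individuals with invented scaled_estimate values.
samples = [{"CD8A": 1e-6, "CD8B": 1e-6}, {"CD8A": 2e-6, "CD8B": 2e-6},
           {"CD8A": 3e-6, "CD8B": 3e-6}]
cd8_tpm = [group_tpm(s, "CD8") for s in samples]
distinct_cdr3 = [4, 8, 12]
r = pearson(cd8_tpm, distinct_cdr3)
```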
Inferred pairing of TCR alpha and beta subunits
For each tumor sample, all possible pairwise combinations of TCR alpha and beta subunit CDR3 sequences derived from that sample were specified (n = 1,286,810). We then looked for recurrent alpha-beta pairs among all TCGA tumor samples, and identified 188 distinct alpha-beta pairs that were found in at least two individuals. To test if this was a stronger alpha-beta co-occurrence than would be expected by chance, we randomized the relationship between sample identifiers and their corresponding TCR alpha and beta sequences. We then regenerated all possible pairwise combinations within subjects and determined the frequency of recurrent alpha-beta pairs in this randomized dataset. Randomization was repeated for 100 iterations, and the proportion of trials that had a degree of sharing greater than or equal to the original, non-randomized data was taken as the P value.
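A minimal sketch of this randomization test follows. It reflects one possible reading of the procedure: sample labels are shuffled independently for alpha and beta sequences while each sample keeps its original number of sequences. That interpretation, and all function names, are assumptions rather than the authors' implementation.

```python
import random
from collections import Counter
from itertools import product


def recurrent_pairs(alpha, beta):
    """Count distinct alpha-beta pairs that occur in at least two samples."""
    counts = Counter()
    for sample, alphas in alpha.items():
        for pair in product(set(alphas), set(beta.get(sample, ()))):
            counts[pair] += 1
    return sum(1 for c in counts.values() if c >= 2)


def shuffle_labels(repertoire, rng):
    """Reassign sequences to samples at random, preserving per-sample counts."""
    labels = [s for s in sorted(repertoire) for _ in repertoire[s]]
    seqs = [q for s in sorted(repertoire) for q in sorted(repertoire[s])]
    rng.shuffle(seqs)
    shuffled = {s: [] for s in repertoire}
    for sample, seq in zip(labels, seqs):
        shuffled[sample].append(seq)
    return shuffled


def permutation_p(alpha, beta, n_iter=100, seed=0):
    """Proportion of randomized trials with sharing >= the observed sharing."""
    rng = random.Random(seed)
    observed = recurrent_pairs(alpha, beta)
    at_least = sum(
        recurrent_pairs(shuffle_labels(alpha, rng), shuffle_labels(beta, rng)) >= observed
        for _ in range(n_iter)
    )
    return observed, at_least / n_iter


# Toy input: two samples that share the same single alpha and beta CDR3.
toy_alpha = {"s1": ["CAVA"], "s2": ["CAVA"]}
toy_beta = {"s1": ["CASS"], "s2": ["CASS"]}
observed, p_value = permutation_p(toy_alpha, toy_beta)
```

With 100 iterations the smallest non-zero P value this procedure can report is 0.01.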
Shared peptide-MHC and CDR3 sequences
We predicted HLA Class I alleles and MHC-presented point mutations for 1361 TCGA individuals as previously described, with an IC50 value of 500 nM for MHC-predicted binding affinity taken as the maximum threshold for potential immunogenicity. We counted the frequency of each peptide-MHC (pMHC; n = 305,438), and found 393 to be recurrent. In order to determine if any of the individuals sharing a pMHC also shared a similar CDR3 sequence, we took all CDR3 sequences from all 1361 individuals, and clustered them to allow for inexact CDR3 sequence matches. We clustered at 95 % identity using CD-HIT [23, 24] v4.6 with the following parameters: -c 0.95; -n 5; -l 5. We then checked each cluster to see if it contained sequences from at least two individuals, and if those individuals shared a pMHC. We found one cluster of two individuals that met these criteria. To test how frequently this would be expected to happen by chance, we randomly selected two individuals from the set, and checked if they shared a pMHC. We measured the fraction of successes from 1,000,000 trials, then multiplied by the number of clusters that contained two individuals (n = 1457) to correct for multiple testing to give the adjusted P value.
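The resampling step can be sketched as follows. The input is assumed to be a mapping from individual to the set of predicted pMHCs; the function name, the fixed seed, and the capping of the adjusted value at 1 are assumptions added for illustration.

```python
import random


def adjusted_shared_pmhc_p(pmhc_by_individual, n_clusters, n_trials, seed=0):
    """Estimate the chance that two random individuals share a pMHC,
    then multiply by the number of two-individual clusters tested."""
    rng = random.Random(seed)
    individuals = sorted(pmhc_by_individual)
    successes = 0
    for _ in range(n_trials):
        first, second = rng.sample(individuals, 2)
        if pmhc_by_individual[first] & pmhc_by_individual[second]:
            successes += 1
    # Assumption: cap at 1 so the adjusted value remains a valid probability.
    return min(1.0, successes / n_trials * n_clusters)


# Toy cohorts (invented pMHC labels): nobody shares, or everybody shares.
no_sharing = {"p1": {"pep1/HLA-A"}, "p2": {"pep2/HLA-A"}, "p3": {"pep3/HLA-B"}}
all_sharing = {"p1": {"pep1/HLA-A"}, "p2": {"pep1/HLA-A"}, "p3": {"pep1/HLA-A"}}
p_none = adjusted_shared_pmhc_p(no_sharing, n_clusters=5, n_trials=1000)
p_all = adjusted_shared_pmhc_p(all_sharing, n_clusters=5, n_trials=1000)
```

In the analysis described above, the number of trials was 1,000,000 and the multiplier was 1457.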
Custom code is available upon request.
Results and discussion
Somatically rearranged T cell receptor sequences can be effectively recovered from RNA-seq data
We optimized the extraction of TCR alpha and beta chain sequences from RNA-seq datasets by evaluating negative and positive control datasets and adjusting search parameters. We identified optimal V and J alignment parameters that yielded an average of 94 % sensitivity for 100 % specificity using the shortest (50 nucleotide) reads (Additional file 1: Table S2, see “Methods”). Sensitivity was limited, ultimately, by the inability to detect the small proportion of CDR3s that are longer than a sequencing read (Additional file 1: Figure S3).
To estimate the yield of TCR transcripts that could be expected from typical RNA-seq experiments, we used Flux Simulator to generate simulated RNA-seq data spiked with in silico recombined TCR transcripts, and processed these data as described in “Methods”. As expected, the most abundant TCR transcripts were the most readily detected and the sensitivity of the method increased with increasing sequencing depths and read lengths (Additional file 1: Figure S2). We fit a multivariate logistic regression model to explain the odds of detecting a CDR3 sequence from the RNA-seq data using the explanatory variables log10(transcripts per million) (odds ratio [OR] 7.242; 95 % confidence interval [CI] 7.079, 7.411; P < 2 × 10^-16), sequencing depth in tens of millions of reads (OR 1.667; 95 % CI 1.658, 1.677; P < 2 × 10^-16), sequence read length (OR 1.0358; 95 % CI 1.035, 1.037; P < 2 × 10^-16), and CDR3 nucleotide sequence length (OR 0.958; 95 % CI 0.956, 0.960; P < 2 × 10^-16), and observed that initial TCR transcript abundance is the most important factor in predicting whether a CDR3 would be detected.
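Because odds ratios act multiplicatively, the reported values can be combined to compare how changes in each variable shift the odds of detection. The sketch below uses only the odds ratios stated above; the model intercept is not reported here, so absolute detection probabilities cannot be computed from these numbers alone. The dictionary keys and function name are hypothetical.

```python
# Reported odds ratios per unit increase of each explanatory variable.
ODDS_RATIOS = {
    "log10_tpm": 7.242,      # per 10-fold increase in transcript abundance
    "depth_10M": 1.667,      # per additional 10 million reads
    "read_len_nt": 1.0358,   # per additional nucleotide of read length
    "cdr3_len_nt": 0.958,    # per additional nucleotide of CDR3 length
}


def odds_multiplier(**deltas):
    """Factor by which the odds of detection change for the given deltas."""
    factor = 1.0
    for name, delta in deltas.items():
        factor *= ODDS_RATIOS[name] ** delta
    return factor


# Moving from 50-nt to 76-nt reads, all else equal.
longer_reads = odds_multiplier(read_len_nt=76 - 50)
# A 10-fold increase in abundance versus 10 million extra reads.
abundance_vs_depth = odds_multiplier(log10_tpm=1) / odds_multiplier(depth_10M=1)
```

A per-unit odds ratio near 1, as for read length, can still amount to a sizeable effect when the variable changes by many units.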
For tumor tissue, where the degree of T cell infiltration is typically about 2 % (see “Methods”), we would expect 0.001 % of the total transcripts to be recombined TCR transcripts. Assuming a monoclonal infiltrate, an RNA-seq depth of 70 million 50-nucleotide reads is required to have a greater than 50 % chance of detecting a 45-nucleotide CDR3 (the most frequent CDR3β length). Additional probabilities are presented in Additional file 1: Table S4. To further evaluate the yield of TCR sequences from RNA-seq data, we generated TCR-seq data for the TCR beta chain (1,411,056 reads; 2823 unique CDR3β sequences) and RNA-seq data for total cDNA (56,067,687 reads; 9 unique CDR3β sequences) from the same colorectal tumor tissue sample, using previously described methods [12, 26]. We observed that all high confidence CDR3βs identified by RNA-seq (n = 9) fell within the top 2.1 % (n = 60) of CDR3βs detected by TCR-seq, ranked by abundance (data not shown), confirming that at a modest depth of sequencing, RNA-seq can identify the most abundant CDR3-encoding transcripts. The most abundant T cells may or may not be the most biologically relevant T cells. In solid tumors, the relationship between clonal abundance of T cells and anti-tumor immunity is not yet clear; it is obscured by the presence of bystander T cells in the tumor environment, as further discussed below. In cases such as T cell leukemia, where the T cell is the cancerous cell and is highly expanded, the most abundant CDR3 sequences found by RNA-seq should generally be adequate to identify the tumor clone and monitor disease progression. Likewise, in cases such as acute infection, the most abundant T cells are likely those that are most biologically relevant. TCR transcripts from rare T cells will become more accessible in future because continually declining sequencing costs will allow deeper and deeper transcriptome sampling by RNA-seq.
TCR sequence diversity in tumor-associated T cell repertoires
Public T cells are common in the tumor environment
Given the observation of substantial sharing of CDR3 sequences among individuals, we asked if some of the shared alpha and beta CDR3 sequences may represent shared dimeric TCRs. To test this we generated all possible alpha-beta pairs within each individual’s alpha and beta repertoires, and looked for sharing of any pairs between two or more individuals. We observed 188 distinct alpha-beta pairs that were found in at least two individuals, which was not significantly more than would be expected by chance (P = 0.42, random resampling, see “Methods”). We also asked if individuals with shared mutations and shared HLA alleles may also share TCR sequences. Previously, for a subset of TCGA individuals, we identified tumor point mutations predicted to yield MHC class I binding peptides (pMHCs). We clustered the CDR3 amino acid sequences from all individuals with predicted mutant pMHCs at 95 % amino acid sequence identity. Of the 17,092 total CDR3 sequence clusters (including singletons), only one cluster contained sequences from two individuals with matching mutant pMHCs (Additional file 1: Table S6). Although this is not statistically significant (P = 0.35, random resampling, see “Methods”), with deeper sequencing data from large numbers of individuals, this approach may prove useful for matching TCR sequences to the neo-antigens they recognize.
In future, as sequencing costs continue to decline, there will be increasing opportunities to derive immune signatures from unbiased data types. Here, we have optimized an analytical strategy for extracting T cell repertoire information from RNA-seq datasets, and used it to characterize tumor-associated T cell repertoires. We have provided optimal parameters for mining RNA-seq datasets of varying read lengths, and provided a range of parameters for varying levels of acceptable false positive rates. This procedure was validated on simulated RNA-seq datasets with known recombined TCR transcripts, and was also compared to classical TCR-seq data, showing that the subset of TCRs we detected using RNA-seq came from the most abundant T cells in the sample. The expected yield of TCR reads from RNA-seq data is ultimately dependent on the level of T cell infiltration in the sample and the clonality of the infiltrate. Assuming a similar cellular composition to TCGA tumors, one can expect on the order of one TCR read from 10 million sequence reads. Our analysis has highlighted a strong and novel correlation between tumor TCR diversity and tumor MHC Class II expression, and a high prevalence of public T cells in the tumor environment. Further, within the limitations of the available data, we have explored the association between alpha-beta TCR pairs, and the linking of TCR sequences to specific pMHC complexes. Analyses of this nature may inform future cancer immunotherapy strategies, and we expect that this same approach will have value in exploring other immune-related pathologies, where large RNA-seq datasets already exist or can be obtained.
We thank NIH and the Cancer Genome Atlas Research Network for data access (study accession phs000178.v8.p7). The results published here are in part based upon data generated by The Cancer Genome Atlas managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov. We thank ENCODE for data access. This work was supported by the British Columbia Cancer Foundation and grants from the Canadian Institutes of Health Research (CIHR) (MOP-102679). SDB is supported by a CIHR Frederick Banting and Charles Best Canadian Graduate Scholarship.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Woodsworth DJ, Castellarin M, Holt RA. Sequence analysis of T-cell repertoires in health and disease. Genome Med. 2013;5:98.
- Freeman JD, Warren L, Webb JR, Nelson BH, Holt RA. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res. 2009;19:1817–24.
- Robins HS, Campregher PV, Srivastava SK, Wacher A, Turtle CJ, Kahsai O, et al. Comprehensive assessment of T-cell receptor beta-chain diversity in alphabeta T cells. Blood. 2009;114:4099–107.
- Clemente CG, Mihm MC, Bufalino R, Zurrida S, Collini P, Cascinelli N. Prognostic value of tumor infiltrating lymphocytes in the vertical growth phase of primary cutaneous melanoma. Cancer. 1996;77:1303–10.
- Sato E, Olson SH, Ahn J, Bundy B, Nishikawa H, Qian F, et al. Intraepithelial CD8+ tumor-infiltrating lymphocytes and a high CD8+/regulatory T cell ratio are associated with favorable prognosis in ovarian cancer. Proc Natl Acad Sci U S A. 2005;102:18538–43.
- Cantaert T, Brouard S, Thurlings RM, Pallier A, Salinas GF, Braud C, et al. Alterations of the synovial T cell repertoire in anti-citrullinated protein antibody-positive rheumatoid arthritis. Arthritis Rheum. 2009;60:1944–56.
- Jonasson L, Holm J, Skalli O, Bondjers G, Hansson GK. Regional accumulations of T cells, macrophages, and smooth muscle cells in the human atherosclerotic plaque. Arterioscler Thromb Vasc Biol. 1986;6:131–8.
- Traugott U, Reinherz E, Raine C. Multiple sclerosis: distribution of T cell subsets within active chronic lesions. Science. 1983;219:308–10.
- Matloubian M, Concepcion RJ, Ahmed R. CD4+ T cells are required to sustain CD8+ cytotoxic T-cell responses during chronic viral infection. J Virol. 1994;68:8056–63.
- Moon JJ, Chu HH, Pepper M, McSorley SJ, Jameson SC, Kedl RM, et al. Naive CD4(+) T cell frequency varies for different epitopes and predicts repertoire diversity and response magnitude. Immunity. 2007;27:203–13.
- Parrott DM V, Tait C, MacKenzie S, Mowat AM, Davies MDJ, Micklem HS. Analysis of the effector functions of different populations of mucosal lymphocytes. Ann N Y Acad Sci. 1983;409(1 The Secretory):307–20.
- Warren RL, Freeman JD, Zeng T, Choe G, Munro S, Moore R, et al. Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes. Genome Res. 2011;21:790–7.
- Warren RL, Choe G, Freeman DJ, Castellarin M, Munro S, Moore R, et al. Derivation of HLA types from shotgun sequence datasets. Genome Med. 2012;4:95.
- Boegel S, Löwer M, Schäfer M, Bukur T, De GJ, Boisguérin V, et al. HLA typing from RNA-Seq sequence reads. Genome Med. 2012;4:102.
- Warren RL, Freeman DJ, Pleasance S, Watson P, Moore RA, Cochrane K, et al. Co-occurrence of anaerobic bacteria in colorectal carcinomas. Microbiome. 2013;1:16.
- Bolotin DA, Shugay M, Mamedov IZ, Putintseva EV, Turchaninova MA, Zvyagin IV, et al. MiTCR: software for T-cell receptor sequencing data analysis. Nat Methods. 2013;10:813–4.
- Lefranc M-P. IMGT, the International ImMunoGeneTics Information System. Cold Spring Harb Protoc. 2011;2011:595–603.
- Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
- Griebel T, Zacher B, Ribeca P, Raineri E, Lacroix V, Guigó R, et al. Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res. 2012;40:10073–83.
- Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38:e178.
- Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
- Brown SD, Warren RL, Gibb EA, Martin SD, Spinelli JJ, Nelson BH, et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res. 2014;24:743–50.
- Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17:282–3.
- Li W, Jaroszewski L, Godzik A. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics. 2002;18:77–82.
- R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.
- Castellarin M, Warren RL, Freeman JD, Dreolini L, Krzywinski M, Strauss J, et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 2012;22:299–306.
- Wu D, Sherwood A, Fromm JR, Winter SS, Dunsmore KP, Loh ML, et al. High-throughput sequencing detects minimal residual disease in acute T lymphoblastic leukemia. Sci Transl Med. 2012;4:134ra63.
- Schumacher T, Bunse L, Pusch S, Sahm F, Wiestler B, Quandt J, et al. A vaccine targeting mutant IDH1 induces antitumour immunity. Nature. 2014;512:324–7.
- Kreiter S, Vormehr M, van de Roemer N, Diken M, Löwer M, Diekmann J, et al. Mutant MHC class II epitopes drive therapeutic immune responses to cancer. Nature. 2015;520:692–6.
- Tran E, Turcotte S, Gros A, Robbins PF, Lu Y-C, Dudley ME, et al. Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. Science. 2014;344:641–5.
- Lossius A, Johansen JN, Vartdal F, Robins H, Jūratė Šaltytė B, Holmøy T, et al. High-throughput sequencing of TCR repertoires in multiple sclerosis reveals intrathecal enrichment of EBV-reactive CD8+ T cells. Eur J Immunol. 2014;44:3439–52.