Maintenance of the human memory T cell repertoire by subset and tissue site

Immune-mediated protection is mediated by T cells expressing pathogen-specific T cell antigen receptors (TCR) that are maintained at diverse sites of infection as tissue-resident memory T cells (TRM) or that disseminate as circulating effector-memory (TEM), central memory (TCM), or terminal effector (TEMRA) subsets in blood and tissues. The relationship between circulating and tissue resident T cell subsets in humans remains elusive, and is important for promoting site-specific protective immunity. We analyzed the TCR repertoire of the major memory CD4+ and CD8+T cell subsets (TEM, TCM, TEMRA, and TRM) isolated from blood and/or lymphoid organs (spleen, lymph nodes, bone marrow) and lungs of nine organ donors, and blood of three living individuals spanning five decades of life. High-throughput sequencing of the variable (V) portion of individual TCR genes for each subset, tissue, and individual were analyzed for clonal diversity, expansion and overlap between lineage, T cell subsets, and anatomic sites. TCR repertoires were further analyzed for TRBV gene usage and CDR3 edit distance. Across blood, lymphoid organs, and lungs, human memory, and effector CD8+T cells exhibit greater clonal expansion and distinct TRBV usage compared to CD4+T cell subsets. Extensive sharing of clones between tissues was observed for CD8+T cells; large clones specific to TEMRA cells were present in all sites, while TEM cells contained clones shared between sites and with TRM. For CD4+T cells, TEM clones exhibited the most sharing between sites, followed by TRM, while TCM clones were diverse with minimal sharing between sites and subsets. Within sites, TRM clones exhibited tissue-specific expansions, and maintained clonal diversity with age, compared to age-associated clonal expansions in circulating memory subsets. Edit distance analysis revealed tissue-specific biases in clonal similarity. Our results show that the human memory T cell repertoire comprises clones which persist across sites and subsets, along with clones that are more restricted to certain subsets and/or tissue sites. We also provide evidence that the tissue plays a key role in maintaining memory T cells over age, bolstering the rationale for site-specific targeting of memory reservoirs in vaccines and immunotherapies.


Background
Adaptive immune responses mediated by T lymphocytes are critical for protection against diverse pathogens and depend on the rapid mobilization of T cells to infection sites [1]. In primary responses, antigen-specific naïve T cells in lymphoid tissue become activated, clonally expand, and differentiate into effector cells which migrate to tissue sites for immune-mediated pathogen clearance. A subset of these previously activated T cells persists as heterogeneous memory T cell subsets which either become retained in tissues as non-circulating tissueresident memory T cells (TRM) [1,2], or circulate through blood and tissue sites as central-memory (TCM), effector-memory (TEM), and terminally differentiated effector cells (TEMRA) subsets [3]. TRM are phenotypically and functionally distinct from circulating memory subsets, mediate optimal protective immunity to site-specific pathogens compared to circulating subsets, and are implicated in anti-tumor immunity [4][5][6][7]. The role of circulating memory subsets in protection and their relationship to TRM remains enigmatic. Understanding how memory T cells circulating in blood relate to memory T cell subsets in the tissues is of central importance for defining and monitoring protective T cell responses in humans.
Each T cell expresses a unique T cell antigen receptor (TCR) comprised of α and β chains encoded by TRA and TRB genes which derive from the rearrangement of TRAV and TRAJ, or TRBV, TRBD, TRBJ gene segments, respectively. These gene rearrangements can give rise to a theoretical diversity of 10 15 distinct β-chains [8,9]. High-throughput sequencing (HTS) of the variable regions of TCR genes has enabled quantitative assessments of clonal diversity and expansion within and between individuals, in health and in disease [10,11]. By HTS, the measured TCR diversity for human naïve T cells is 10 7 -10 8 different clones; memory T cells exhibit extensive clonal expansion and sigificantly lower diversity, particularly for CD8 + T cells [12][13][14]. TCR clonal analysis has also be used to identify tumor-associated T cell clonal expansions for tracking in peripheral blood [15] and specific clonal expansions within subsets that are associated with disease states [16,17]. However, a comprehensive baseline assessment of the distribution of clonally expanded memory T cells across subsets and tissues has not been accomplished and is necessary to interpret their significance in disease states.
Here, we investigated how the TCR repertoire of memory T cells is distributed by subset and location by TCR sequencing of the major CD4 + and CD8 + memory subsets isolated from blood, lymphoid tissues, and lungs of individual organ donors using a tissue resource we have extensively validated for human immune cell studies [13,[18][19][20]. Quantitative and qualitative aspects of the TCR repertoire were analyzed as a function of the memory T cell subset, tissue, lineage (CD4 vs. CD8), and individual. We found that clonal diversity and expansion were intrinsic features of the lineage and subset, with CD8 + TEMRA cells having the highest clonal expansion, CD4 + TCM cells the lowest, and TRM and TEM at intermediate diversity independent of the tissue of origin. Accordingly, the extent of overlap between sites was highest for TEMRA and lowest for TCM cells, while TRM and TEM exhibited significant clonal overlap suggesting a common origin. We also observed tissuespecific expansions for memory clones and that qualitatively similar clones segregated more by tissue, than by subset. Finally, we detected a loss of diversity of circulating but not TRM subsets with age. Together, these results indicate that while memory T cells are maintained as highly expanded clones across the body, tissues can serve as reservoirs for maintaining memory T cell diversity and tissue-adapted T cell specificities.

Acquisition of human tissue
Human tissues were obtained from deceased organ donors at the time of organ acquisition for clinical transplantation through an approved protocol with LiveOnNY, the organ procurement organization for the New York metropolitan area [21]. We obtained blood, multiple lymphoid sites (bone marrow (BM), lymph nodes (LN), spleen (Spl)), and lungs from human organ donors. Donors were free of cancer and negative for hepatitis B, C, and HIV. Isolation of tissues from organ donors does not qualify as "human subjects" research, as confirmed by the Columbia University IRB. For isolation of blood from living individuals, blood was drawn via venipuncture from consented volunteers, as approved by the Columbia University IRB. The individual donors in this study represented a diverse population spanning five decades of adult life (29-63 years) ( Table 1).

Isolation of mononuclear cells from human tissues
Tissue samples were maintained in cold saline and brought to the laboratory within 2-4 h of procurement. Spleen, lung, and LN draining the lung were processed using enzymatic and mechanical digestion as previously described resulting in high yields of live leukocyte s [13,18,21]. Mononuclear cells were isolated from blood and BM with Lymphocyte Separation Medium (Corning, USA).

DNA extraction
Sorted T cells were pelleted and resuspended in cell lysis solution (Qiagen) and DNA was isolated from cell lysate using the Gentra Puregene kit (Qiagen) for 9 donors (D383, D466, D324, D299, D255, D233, HD1, HD2, and HD3). For 3 donors (D287, D280, and D229), DNA and RNA were extracted using an RNA/DNA kit (AllPrep DNA/RNA mini kit, Qiagen). Upon extraction of DNA from purified T cells, DNA was divided into equal parts for replicate amplification and sequencing. The amount of DNA sequenced per sample was constant for the first two replicates of each individual donor, with the exception of blood from D383, in which fewer cells were obtained. The quantity of DNA isolated and sequenced per sample is indicated in Additional File 1: Table S1.

TRB gene amplification, library preparation, and sequencing
Targeted PCR was used for amplification of TRB sequences from genomic DNA, using a cocktail of forward primers specific for framework region 2 (FR2) sequences of 23 TRBV subgroups (gene families), and 13 TRBJ region reverse primers adapted from the BIOMED2 primer series [22]. Amplicons were purified using the Agencourt AMPure XP beads system (Beckman Coulter, Inc.). Second-round PCRs to generate the sequencing libraries were carried out using NexteraXT Index Primers S5XX and N7XX. Libraries were sequenced using an Illumina MiSeq in the Human Immunology Core Facility at the University of Pennsylvania. 2 × 300 bp paired end kits were used for all experiments (Illumina MiSeq Reagent Kit v3, 600 cycle, Illumina Inc., Cat. No. MS-102-3003).

TCR read counting and clone mapping
Raw reads were pre-processed using pRESTO [23] v0.5.10 and then annotated using IgBLAST's igblastn command v1.17.0 [24] as shown in Additional File 1: Supplementary Methods. For IgBLAST, the IMGT human TRBV and TRBJ reference databases from October 24, 2019, were used. Low-quality sequences were removed if their average shred quality score was less than 30, stretches of bases on each end of all reads that were of low average quality were removed, short sequences (100 bases or fewer) were discarded, and individual bases with a phred low quality score of less than 30 were replaced with an N. Finally, any sequences with more than 10 such Ns were removed. IgBLAST was then run on the resulting filtered sequences producing AIRRcompliant output files (Additional File 1: Supplementary Methods).
AIRR-compliant output files were then imported into ImmuneDB v0.29.9 [25,26] using the immunedb_import function [25,26] (Additional File 1: Supplementary Blood 1 55 F + + n/a n/a n/a n/a n/a n/a n/a n/a n/a Blood 2 32 F n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a Blood 3 29 M n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a Methods). We defined clonally related sequences as those with identical TRBV and TRBJ gene segments and CDR3 amino acid sequences. We required that a unique sequence be detected at least twice (within an individual) in order to be designated a clone to reduce over estimation of clones due to sequencing errors.
TCR diversity, clonality, and TRBV usage TCR diversity and clonality analyses were performed using replicate one (see Additional File 1: Table S2) per subset per donor to normalize the cell input. The clonal summary plots were generated using the clonal.proportion function in the tcR package in R. To calculate clonality, given a clone, denoted x, frequency denoted p(x), and total set size of unique clones denoted L, Here, clonality was calculated as normalized entropy as described [13,27]. For Vβ analysis, we applied a copy number cut-off for clones below 50% of the mean copy number frequency of each sample and used replicate 1 for each sample. The 50% mean copy number cut-off was calculated as follows: the total number of copies in a sample was summed and then divided by the number of unique clones, to derive the mean copy number for that sample. The mean copy number was halved, and any clones with a copy number below the 50% mean copy number frequency were removed from the sample. We calculated the 50% mean copy number cut-off for each sample. We counted the number of unique clones with each TRBV gene call. For TRBV frequencies, we calculated the percentage of total clones that utilized each TRVB gene. For heatmap visualization, percentages were centered and scaled by column using the pheatmap R package so that red is a high percentage and blue is a low percentage. The heatmap columns were clustered using the complete linkage option in pheatmap. Data from all 12 donors were aggregated in the bar graph comparing CD4 + and CD8 + T cells. Principal component analysis was visualized using the factoextra package in R.

Cosine similarity
Cosine similarity analysis was performed on replicates 1 and 2 per sample to normalize cell input (Additional File 1: Table S2). We applied a copy number cut-off, removing clones that were below 50% of the mean copy number frequency in each sample, and calculated the cosine similarity using the cosine function in R. We show the mean of four pairwise comparisons between the two replicates per sample, reported in the heatmap. For bar plots of cosine values, cosine similarity was normalized to account for the between-subject variability and the standard error was calculated from the normalized data.

Edit distance calculation and visualization
Overlappping clones and non-productive rearrangements were removed from the dataset, and, based on a set of 500 random clones per sample, we calculated the edit distance (Levenshtein distance represents the number of insertions, deletions, and substitutions required to change one sequence to the other [28]) between every pair of CDR3 amino acid sequences and the Kullback-Leibler (KL) divergence [29] between the TRBV gene composition of the samples. Edit distances were computed, aggregated, and graphed or visualized as tSNE plots (for 250 random clones) with jellyfish.levenshtein_ distance, using the default settings. For the KL divergence (for 500 clones), we computed the frequency of each Vβ in each sample (p i (v) and p j (v) for samples i and j), and computed the symmetrized KL divergence: A pseudo count of 1 was added to each v gene for the estimate of the frequencies.
For tSNE analysis, we combined all CDR3 sequences and reduced their dimensions to 2 using tSNE [30] with the default settings. Replicates 1 and 2 were used for this analysis.

Statistical analysis
Descriptive statistics (means, SEM, median) were calculated for each cell subset and clonality and overlap (Cosine) were calculated in R. Significant differences in subset clonality were assessed using a Student's t test.
As the goal of the study was to assess the clonal relatedness of previously primed (non-naïve) T cells between different tissues and blood by TCR gene sequencing, we sorted the major subsets of non-naïve CD4 + and CD8 + T cells from the five sites. Thus, CD8 + TEM, TRM, and TEMRA, and CD4 + TEM, TRM, and TCM were sorted from blood and/or different tissue sites of 9 organ donors, and blood of 3 living individuals, resulting in 148 different biological samples (Fig. 1A). (CD8 + TCM and CD4 + TEMRA represented low frequency populations [13] and were not included in the analysis.) Total DNA was isolated from each purified cell subset and divided into two replicate samples from which the variable portion of the TRB gene containing V, D, and J segments and encoding the CDR3 region [11,33] was PCR-amplified, used to generate libraries and sequenced (Additional File 1: Fig. S1, Table S1).

T cell repertoire diversity and clonality is determined by lineage and subset
Individual TRBV and TRBJ gene segments, CDR3 sequences, and clone counts were identified from the resultant 84 million valid sequences in this dataset using ImmuneDB [25,26] (see methods). Based on sequencing an equivalent number of sorted cells, we detected an average of 3750 clones per CD4 + T cell sample and 1382 clones per CD8 + T cell sample across all individuals (Additional File 1: Table S2), consistent with previous observations that human CD8 + T cells are more clonally expanded than CD4 + T cells [12,13]. We then asked whether the TCR repertoire within specific subsets or tissues varied by the extent of diversity and/or clonal expansion. Calculating the percent of the TCR repertoire that was represented by the top most abundant clones (top 10, 100, 1000, > 1001) revealed that memory CD8 + T cells contained highly abundant clones, such that the top 10 clones comprised up to 80% of the T cell repertoire, whereas for CD4 + T cells, the top 10 clones comprised < 10% of the total T cell repertoire for all subsets and tissues (Fig. 1B, C).
The highly abundant clones among CD8 + T cells for each donor were largely within the CD8 + TEMRA subset in all sites, followed by CD8 + TEM and TRM (Fig. 1B, C). For CD4 + subsets, TEM exhibited the highest clonal expansion that was greater than or comparable to TRM, while TCM exhibited the lowest clonal expansion (Fig. 1B,  C). All individual donors exhibited a similar hierarchy of clonal expansion from highest to lowest: CD8 + TEMRA > CD8 + TEM and CD4 + TRM > CD4 + TCM (Fig. 1B), and subsets in the blood maintained the same hierarchy (TEMRA>TEM>TCM) (Fig. 1C). These clonal hierarchies and large clonal expansions were found for CMVseropositive and CMV-seronegative donors (Fig. 1B, D, E). Clonal expansion was therefore intrinsic to lineage and subset.
As another quantitative measure of repertoire diversity that incorporates clonal expansion, we calculated TCR clonality, ranging from 0 (least clonally expanded; maximally diverse) to 1 (monoclonal, no diversity; see methods). Compiling T cell subset results from all sites (blood and tissues) revealed that clonality is a feature of the subset, ranked from highest to lowest CD8 + TEM-RA>CD8 + TEM~TRM>>CD4 +-TEM~CD4 + TRM>>CD4 + TCM (Fig. 1D). Moreover, clonality did not differ significantly by tissue; rather, high clonality was a feature of the CD8 + lineage, consistent with previous studies [12,13]. One notable exception was that both CD8 + and CD4 + T cells in LN exhibited a higher diversity and lower clonality than other sites (blood, BM, lung) (Fig. 1E). These results indicate that quantitative aspects of the memory T cell repertoire are determined by subset, independent of site of origin; however, LN maintained higher TCR diversity compared to other sites.
We further investigated whether there was differential usage of certain TRBV genes between subsets and tissues, in the entire sequenced repertoire. TRBV gene usage for the three donors where subsets from > 4 sites were obtained reveals certain biases in TRBV expression across all donors, with the overall frequencies differing between individuals ( Fig. 2A). TRBV gene usage for these donors based on lineage, tissue, and subset was assessed by hierarchical clustering and prinicipal component analysis (PCA), revealing distinct clustering by lineage, but not by tissue, and clustering for certain subsets ( Fig. 2A,B, Additional File 1: Fig. S2A). Lineage-specific clustering was most apparent by PCA: there was tight clustering of TRBV usage for CD4 + T cells, which was distinct from TRBV usage for CD8 + T cells, which was also more heterogeneous than for CD4 + T cells (Fig. 2B, Additional File 1: Fig.  S2A). Subset-specific clustering patterns for TRBV usage were observed for CD8 + TEMRA which clustered separately from the other subsets (Fig. 2B, Additional File 1: Fig. S2A), consistent with TEMRA cells consisting of a few highly expanded clones.
Similar clustering and heterogeneity patterns for TRBV usage for CD4 + and CD8 + T cells and subsets was observed for all 12 donors grouped together (Additional File 1: Fig. S2B), suggesting that TRBV usage and clonality are intrinsic features of CD4 + and CD8 + T cell subsets. To address whether specific TRBV genes exhibit biased expression in CD4 + or CD8 + T cells across individuals, we examined TRBV usage for all CD4 + and CD8 + T cells in all donors, identifying significant differences in frequency of 25 different TRBV genes between the lineages (greater differences were observed for specific genes TRBV12.3, 18, 27, 30, 5.1, and 6.4 (Fig. 2C), consistent with previous findings [34,35]. Therefore, the memory T cell repertoire across multiple sites within an individual is selected for specific TRBV genes based on CD4 + or CD8 + lineage, with some skewing by highly expanded subsets.

Clonal overlap reveals patterns of sharing and relatedness between subsets and sites
The extent of T cell clonal overlap between tissue sites and subsets could reveal insights into their migration and lineage relationships, respectively. Calculating the cosine similarity [36] between samples within individuals resulted in values ranging from 0 (minimal overlap) to 1 (complete overlap). Heat maps for CD4 + and CD8 + subsets for each of three donors with > 4 tissues show similarity between samples (Fig. 3A), while compiled cosine similarity values for all donors reveals the overall relatedness of subsets and their migration between sites (Fig. 3B). Overall, CD8 + T cells had higher cosine similarity (and greater clonal overlap) between sites and subsets compared to CD4 + T cells (Fig. 3A, B), consistent with their larger clonal expansion (Fig. 1). Between sites, the highest cosine similarity (mean value 0.71, ± 0.07 se) was observed for CD8 + TEMRA cells in blood, BM, spleen, and lungs of all donors analyzed, indicating that the highly expanded TEMRA clones circulate between tissue sites. CD8 + TEM cells also exhibited high cosine similarity between sites (mean value 0.53, ± 0.05 se) that was equivalent or greater than for CD8 + TRM (mean value 0.46 ± 0.04 se), depending on the specific site pairings, though the extent of overlap varied between donors (Fig. 3A, 3B (left)). Similarly, CD4 + TEM exhibited the highest overlap between sites for CD4 + subsets (mean value 0.27 ± 0.04 se), consistent with TEM being more circulating, while TCM exhibited the lowest overlap between sites (mean value 0.12 ± 0.01 se) (Fig. 3A, B), consistent with the higher diversity of CD4 + TCM cells relative to all memory subsets. The extent of overlap for TRM subsets varied between sites; there was higher overlap for CD4 + TRM between spleen and BM (mean value 0.31 ± 0.03 se), and for CD8 + TRM between lung and LLN (mean value 0.52 ± 0.06 se) (Fig. 3), suggesting that TRM formation or homeostasis is localized to certain related sites.
We also compared the clonal overlap between different subsets (in the same or different sites) to assess potential lineage relationships. For both CD4 + and CD8 + T cells, TRM and TEM exhibited the greatest overlap within a given site compared to other subsets (mean value ± se: CD4 + TRM -TEM 0.23 ± 0.03, CD8 + TRM -TEM 0.59 ± 0.06) (Fig. 3B, middle), suggesting that these subsets either derive from a common lineage or interconvert during maintenance. By contrast, CD8 + TEMRA exhibited minimal overlap with CD8 + TRM cells within or between sites LG, or LN) (left), between different subsets within a single site (middle) or between different subsets in different sites (right) for CD8 + (upper) and CD4 + (lower) T cells (mean value ± se: CD8 + within sites TEMRA -TRM 0.10 ± 0.03, CD8 + between sites TEMRA -TRM 0.07 ± 0.02) (Fig. 3B), and CD4 + TCM also exhibited minimal overlap with CD4 + TRM or CD4 + TEM (mean value ± se: CD4 + TCM -TEM 0.12 ± 0.01, CD4 + TCM -TRM 0.10 ± 0.02) (Fig. 3B). These findings suggest that CD8 + TEMRA and CD4 + TCM are generated along a pathway distinct from the corresponding TEM and TRM cells for CD8 + and CD4 + T cells, respectively. Together, these analyses reveal that circulating and highly expanded subsets (e.g., TEMRA, TEM) are broadly shared across tissue sites, while TRM are less broadly shared between sites, and a potential common clonal origin for certain TEM and TRM subsets.

Individual clone tracking reveals tissue-specific TRM expansion
To further investigate how individual clones may be shared or segregated within tissues, we tracked the abundance and sharing of large individual clones between subsets and sites for the three donors with 4-5 sites examined, based on similar analysis with B cell clones in tissue sites [37]. Shown in Fig. 4A are the resultant "line-circle plots," where each line represents one clone spanning 4-5 sites, each circle shows the presence of that clone in a particular site; circles are colored by subset and the size of the circle represents the proportion of the TCR repertoire occupied within each site. We use this analysis to highlight five major patterns of TCR clonal distribution across sites for each lineage that are shared by all three of the most highly sequenced tissue donors. For CD4 + T cells, certain TEM or TCM clones were distributed similarly across sites, while there were also clones with variable distribution across subsets and sites. Notably, a number of TRM clones were enriched in the lung, BM, or LN, in all donors (Fig. 4A), suggesting site-specific clonal maintenance. For CD8 + T cells, there were TEMRA or TEM clones distributed across sites, while certain TRM clones were also enriched in single sites such as lung, BM, spleen, or LN (Fig. 4A). This qualitative analysis of clonal distribution showing TRM clones confined to or enriched in certain sites provides strong evidence for tissue-specific maintenance of TRM cells.

TRM cells maintain clonality over age compared to circulating TEM cells
To further assess TRM maintenance in tissues, we examined potential changes in clonality with age. The clonality of memory T cells in blood is known to increase with age [12], due to the outgrowth of specific clones from persistent stimulation and/or homeostatic maintenance by cytokines and other factors [38]. Consistent with results in blood, we found a significant Fig. 4 TRM clones exhibit tissue-specific expansions and clonal stability with age. A Clone tracking across sites and subsets. Each line-circle plot depicts and individual CD4 + or CD8 + T cell clone, their subset identity and relative frequency in each site. Each line represents a single clone, circles indicate presence within a site (bone marrow (BM), blood (BL), lung-draining lymph node (LN), lung (LG), or spleen (SP)); the size of the circle is proportional to the total number of copies of the clone in the sample, and colored wedges in the circle indicate what fraction of copies derive from individual subsets (TCM (blue), TEM (orange), TEMRA (green), TRM (red)) from CD4 + (left) and CD8 + (right) T cells for D324 (top), D383 (middle) and D466 (bottom). Representative clones shown from the top 500 clones for each donor. B Clonality associations with age for each subset. Scatter plots with fitted lines (black) of clonality measurements from Fig. 1 by age from linear regression analysis with associated p values (P) indicated below each plot for CD4 + (left) and CD8 + (right) subsets. Tissue site indicated by shape increase in clonality across all sites with age for both CD4 + and CD8 + TEM cells (Fig. 4B). (TEMRA clonality also exhibited an upward trend with age, which did not achieve significance, possibly due to the already high clonality at younger ages). By contrast, the clonality of CD4 + TRM and CD4 + TCM remained strikingly constant in all sites with age and the clonality of CD8 + TRM also did not exhibit a significant increase with age (Fig. 4B). These results suggest that memory T cells retained in tissues are stably maintained without significant homeostatic expansion or age-associated changes, and may therefore constitute cellular reservoirs for longterm immunological memory.

Tissue-specific segregation of qualitatively similar clones
The preceding analyses have focused on frequency (Figs. 1 and 3), V and J gene usage (Fig. 2), and overlap (Figs. 3 and 4) of individual clones. Next, we took a different approach, by analyzing the amino acid sequence of clones to determine if features that spanned different clones tracked by tissue or subset. In order to do this, we first took each sample and counted each unique clone once and removed clones that overlapped between the different samples from the analysis. We then analyzed CDR3β amino acid sequence similarity using edit distances (Levenshtein [28]) and TRBV sequence similarity using the Kullback-Leibler divergence [27]. These measures are independent of clone size and focused on equivalent numbers of non-overlapping clones for each subset in individual tissue donors (see Methods). CDR3 edit distance matrices visualized in t-SNE plots for all three donors reveal distinct patterns of clustering based on tissue site; clones derived from blood cluster together, next to clusters of clones from BM, adjacent to clusters for LN, lung, and spleen (Fig. 5A, top row). By contrast, tSNE plots of edit distance based on subset revealed intermingling of subsets and no clear pattern of clustering (Fig. 5A, bottom row). The calculated edit distances were not due to differences in CDR3 lengths which were similar between donors, lineages, tissues, and subsets (Additional File 1: Fig. S3A,B), but rather are due to sequence features of the clones.
Compiling the average edit distances for all 9 tissue donors shows the highest edit distances (i.e., most differences) between TCR sequences between individuals, followed by lineages (CD4 + or CD8 + ), while the lowest edit distance (i.e., highest similarity) is between sample replicates (Fig. 5B, Additional File 1: Fig. S3C). Between tissues and subsets, the edit distance based on tissue was significantly higher than the magnitude of the edit distance based on TEM, TRM, TEMRA, or TCM subset, consistent with the clustering of similar clones based on tissue, but not subset (Fig. 5B, Additional File 1: Fig.  S3C). The same hierachy of donor, lineage, tissue, subset, and replicate was also observed with the Kulback-Leibler (KL) divergence in TRBV gene sequences in 500 randomly selected clones (minus overlapping clones) (Fig. 5B) and in 200 randomly selected clones which also includes analysis of Donor 466 which had fewer clones than the other donors (Additional File 1: Fig. S3C). Moreover, similar hierarchies were observed based on calculating Hamming distances [39] (Additional File 1: Figure S4). These results provide evidence for segregation of qualitatively similar clones based on CDR3β amino acid and TRBV gene sequences in individual tissue sites, further supporting a role for the tissues in long-term clonal maintenance that is suggested by the stability of TRM clones with age.

Discussion
Memory T cells are maintained across the body as a record of previous antigen encounters. How specific clones of memory T cells are distributed and maintained in blood and tissues has been challenging to address due to constraints on tissue sampling for humans. Here, we used our validated organ donor tissue resource to investigate the role of subset and tissue in the maintenance of the memory T cell repertoire. We applied HTS to identify the TCR β chain variable region in sorted circulating and tissue resident memory T cell subsets isolated from blood, lymphoid organs, and lung and applied quantitative and qualitative analysis of individual clones across subsets and tissues. Our analysis reveals that quantitative aspects of T cell clonal expansion and sharing between sites are lineage and subset-specific, while the tissue site plays an important role in clonal maintenance. Our results provide a novel assessment of how the memory T cell repertoire is maintained across lineages, subsets, tissues, and age.
Our results identify lineage-specific features of TCR clonal expansion and Vβ usage that are conserved across subsets and sites. T cell lineages differ significantly in the extent of clonal expansion across sites: CD8 + T cells are more clonally expanded and contain more clones with high copy number relative to CD4 + T cells, consistent with earlier findings in blood and tissues [8,12,13]. Increased clonal expansion by CD8 + T cells may be due to their increased responses to viruses during an acute response and the continuous surveillance of CD8 + T cells to persistent viruses. Memory CD4 + and CD8 + T cells also exhibit consistent qualitative differences in TRVB usage for all sites, subsets, and individuals, as reported previously for circulating T cells [35], which likely derive from the differences in MHC class II and class I binding motifs, respectively.
In addition to lineage, we also show that the extent of clonal expansion and dissemination across multiple tissue sites are features intrinsic to specific memory T cell subsets. Remarkably, the CD8 + TEMRA subset in multiple sites and in different individuals consistently showed the highest degree of clonal expansion. A limited number of unique clones (and skewed TRBV usage) was represented within this subset-in some individuals, only 10 clones comprised > 80% of the sequenced TEMRA repertoire. As none of the 12 individuals from whom TEMRA clones were sequenced had overt disease, large TEMRA clonal expansions occur within normal T cell homeostasis, and may not be reliable indicators of disease as previously suggested [17]. Extensive clonal expansion of CD8 + T cells is associated with accumulated responses to acute or persistent viruses, resulting in a narrowing of the TCR repertoire [40], and TEMRA in different tissue sites can be specific for persistence viruses such as CMV [41]. As we observed highly abundant TEMRA clones across sites for both CMVseropositive and -seronegative individuals (see Table 1), such patterns may reflect exposure to diverse pathogens. The fact that large TEMRA clones disseminated across multiple sites, yet exhibited minimal overlap with TEM, TRM, and TCM subsets, suggests TEMRA cells are generated distinct from memory subsets.
TEM cells also contained expanded clones, a proportion which were shared across sites, albeit to a lesser extent than TEMRA clones. However, a significant proportion of TEM clones were shared with TRM clones both within and across tissues. We propose that this A Comparing edit distance by tissue and subset. tSNE projections of TCR clones based exclusively on the edit distances between CDR3 amino acid (protein) sequences. Shown are 250 non-overlapping clones that were randomly chosen from each of the three most extensively sampled donors, D324, D383, and D466 (see Methods). Each clone (dot) is colored according to its tissue origin (upper) or subset identity (lower). TCM, central memory; TEM, effector memory; TEMRA, terminal effector; TRM, resident memory; Bld, blood; BM, bone marrow; LN, lung lymph node; Spl, spleen. B Hierarchy of TCR repertoire similarity. Average CDR3 edit distances in amino acid (left) and V gene Kullback-Leibler (KL) divergence (right) are computed for samples by donor, lineage (CD4 or CD8), tissue, subset, and biological replicates from separate DNA aliquots from the same sample (see Methods). A total of 500 clones were randomly chosen from all tissue donor samples for each analysis. Average values and standard errors for each measure were computed. Differences between neighboring samples were computed using a two-sided unpaired t test. * p < 0.05; ** p < 0.01; *** p < 0.001 sharing of TEM and TRM clones likely derives from an initial priming event leading to effector expansion and differentiation to circulating and tissue memory T cells. Previous studies in mice showed that a single naïve T cell clone activated in vivo by antigen/adjuvant or virus infection can generate diverse memory subsets (TCM, TEM, TRM, etc.) in multiple sites [42,43]. Moreover, TCR clonal analysis following skin immunization revealed similar TCR clones in skin TRM and lymphoid memory T cells [42], further indicating that responses originating in tissues can generate widely distributed memory subsets from a common clonal origin. The high overlap for CD8 + TRM across sites suggests TRM generation from a broadly expanded CD8 + effector population that acquires TRM features at local sites.
Our quantitative analysis of TCR clones between and within tissues and within tissue-resident compared to circulating subsets provide several lines of evidence that the tissue plays a significant role in clonal maintenance. Analysis of clonal overlap between sites revealed that TRM clones were less widely shared across sites compared to circulating TEM clones and also exhibited tissue-specific clonal expansions in lung, LN, or BM. Tissue-specific effects on the TCR repertoire were further observed for LN memory CD4 + and CD8 + T cells which had higher TCR diversity, beneficial for antigenrecognition [44], compared to the other sites. These results along with our previous findings that LN memory T cells are more quiescent [33], suggest tissue-specific memory maintenance. Moreover, with age, both CD4 + and CD8 + TRM maintained their clone size while the clonality of circulating memory subsets increased. Together, these quantitative assessments of clonality over sites and age provide evidence for tissues as reservoirs of long-term clonal maintenance.
The edit distance anlaysis reveals a more general role for tissue in the clonal organization of T cell memory. When the amino acid sequences of each unique clone were analyzed for similarity by edit distance measurements, we found clustering of similar TCR sequences by tissue site, but not by subset. These results are further consistent with our single cell transcriptomic profiling of T cells in the same sites analyzed here (lung, BM, LN, blood), showing that all tissue T cells (including TEM and TRM within a site) exhibit tissue-specific gene expression signatures that are distinct from blood [45]. T cells in tissues may be responding to cells and/or factors that promote memory T cell survival, including homeostatic cytokines and MHC molecules with cognate or non-cognate antigen [46,47], resulting in preferential segregation of TCR clones with a given site.
An understanding of how long-term immunological memories are organized in humans is important for promoting durable protective immunity in vaccines. The question of how T cell immunity in blood can predict protection in sites of infection is particularly relevant in the current pandemic for assessing immunity to infection and vaccines [48,49]. Our results show that for tissue memory T cells, including TRM and certain TEM clones, blood may be minimally representative of the tissue-enriched populations. The edit distance results further show that blood-derived TCR clones segreagate distinct from clones in tissue sites. A recent study also showed that memory CD8 + T cell clones in blood were distinct from those circulating through the lymphatics which could transit through tissues [50]. Promoting TRM-mediated protection in mouse models requires site specific, rather than systemic priming [51,52]. Our findings on the role of the tissue in TCR clonal maintenance in humans indicates that promoting tissue-specific memory responses in humans may likewise require sitespecific strategies.

Conclusions
The human memory T cell repertoire is maintained across multiple sites and different subsets. The quantitative nature of clonal maintenance is a feature of the lineage and subset, while the tissues play a key role in maintaining tissue-adapted clones and serve as reservoirs for stable maintenance of long-term memory responses. Our findings provide a systematic analysis of T cell clones in multiple sites throughout the body.
These data can can serve as a new reference for defining tissue-based T cell immune responses to infection and vaccination and as a comparator for immune repertoires in diseases such as autoimmunity and cancer.
Additional File 1: Supplementary methods, supplementary tables (S1-S2), and supplementary figures (S1-S3). Table S1. Number of replicates, copies, unique sequences and clones identified for each individual donor used in this study. Table S2. DNA concentration and number of unique clones identified for each T cell sample. Figure S1. Gating strategy for T cell subset isolation and workflow for TCR sequencing. Figure S2. Principal component analysis (PCA) of TRBV gene usage of individual donors and the compiled dataset. Figure S3. Tissue segregation of TCR clones across sites. coordinated the sequencing and data acquisition, analyzed the data, and helped write and edit the paper; S.D. and Y.L. performed the edit distance and KL calculations and helped write the paper; D.L.F planned the experiments, coordinated the tissue and data acquisition, analyzed data, and wrote and edited the paper. All authors read and approved the final manuscript.

Funding
This work was supported by NIH AI106697 and AI128949 awarded to D.L.F. and AI106697 awarded to E.L.P. M.M. was supported by NIH T32AI06711. B.K. was supported by TL1 TR001875. Research reported here was performed in the CCTI Flow Cytometry Core, supported by award S10RR027050 & S10OD020056 and the Perelman School of Medicine Human Immunology Core, supported by P30-CA016520 and P30-AI0450080.

Availability of data and materials
The data generated for this study have been uploaded in an Adaptive Immune Receptor Repertoire (AIRR)-compliant manner to SRA/GenBank, BioProject PRJNA638966 [53], available through the following link: https:// www.ncbi.nlm.nih.gov/bioproject/PRJNA638966/

Declarations
Ethics approval and consent to participate Isolation of tissues from organ donors does not qualify as "human subjects" research, as confirmed by the Columbia University IRB. For isolation of blood from living individuals, blood was drawn via venipuncture by written informed consented volunteers, as approved by the Columbia University IRB. This research conformed to the principles of the Helsinki declaration.