Evolving neoantigen profiles in colorectal cancers with DNA repair defects

Background Neoantigens that arise as a consequence of tumor-specific mutations can be recognized by T lymphocytes leading to effective immune surveillance. In colorectal cancer (CRC) and other tumor types, a high number of neoantigens is associated with patient response to immune therapies. The molecular processes governing the generation of neoantigens and their turnover in cancer cells are poorly understood. We exploited CRC as a model system to understand how alterations in DNA repair pathways modulate neoantigen profiles over time. Methods We performed whole exome sequencing (WES) and RNA sequencing (RNAseq) in CRC cell lines, in vitro and in vivo, and in CRC patient-derived xenografts (PDXs) to track longitudinally genomic profiles, clonal evolution, mutational signatures, and predicted neoantigens. Results The majority of CRC models showed remarkably stable mutational and neoantigen profiles; however, those carrying defects in DNA repair genes continuously diversified. Rapidly evolving and evolutionary stable CRCs displayed characteristic genomic signatures and transcriptional profiles. Downregulation of molecules implicated in antigen presentation occurred selectively in highly mutated and rapidly evolving CRC. Conclusions These results indicate that CRCs carrying alterations in DNA repair pathways display dynamic neoantigen patterns that fluctuate over time. We define CRC subsets characterized by slow and fast evolvability and link this phenotype to downregulation of antigen-presenting cellular mechanisms. Longitudinal monitoring of the neoantigen landscape could be relevant in the context of precision medicine. Electronic supplementary material The online version of this article (10.1186/s13073-019-0654-6) contains supplementary material, which is available to authorized users.


Background
Anticancer therapies based on immune-checkpoint blockade are often remarkably effective but benefit only a minor fraction of cancer patients [1]. Several biomarkers of response and resistance to immune modulators have been proposed [2,3]. Among these, the overall mutational burden (number of somatic variants per megabase (Mb)) and the number of predicted neoantigens were highlighted in multiple studies [4][5][6]. The predictive values of mutational and antigen burdens are still being evaluated in clinical settings. Both parameters are presently assessed on DNA extracted from individual tissue samples and are typically measured only once in the clinical history of each patient. Alterations in DNA repair pathways, including mutations or promoter hypermethylation of mismatch repair (MMR) effectors (MLH1, MSH2, etc.) or DNA polymerases (polymerase ε and δ) [7], are known to increase the mutational burden and the neoantigen profiles of cancers [8]. Whether, and to what extent, neoantigen profiles evolve over time as a result of the inherent genomic instability of individual tumors is largely unknown. We recently reported that in mouse models, inactivation of DNA mismatch repair increases the mutational burden and leads to dynamic mutational profiles resulting in effective cancer immune response [9]. Here we exploit CRCs as a model system to understand whether mutational burden and neoantigen profile of human tumors evolve over time as a result of their distinctive genomic landscapes.

CRC cell lines
The source of each cell line is reported in Table 1. All cell lines were maintained in their original culturing conditions according to supplier guidelines. Cells were ordinarily supplemented with FBS 10%, 2 mM L-glutamine, and antibiotics (100 U/mL penicillin and 100 mg/mL streptomycin) and grown in a 37°C and 5% CO 2 air incubator. To study the evolution of cell populations, cell lines were not cloned prior to the experiment or at any subsequent time point. Cell lines were thawed in a 10-cm dish. After thaw recovery, each cell line was screened for the absence of mycoplasma contamination and checked for its identity, referred below as quality control (QC). To preserve heterogeneity, upon thawing, individual lines were expanded to at least 10 8 cells. At this point for each model, cells were counted, and the percentage of alive/dead cells was calculated. At the beginning of the experiment (T0), 4 × 10 7 live cells were distributed as follows: (A) 2 × 10 6 cells were re-plated in a 10-cm dish for in vitro propagation, (B) 3 × 10 7 cells were used for in vivo experiments, (C) 2 × 10 6 cells were frozen, and (D) 3 pellets (2 × 10 6 cells each) were frozen for DNA, RNA, and protein extraction. Cells plated as in (A) were kept in culture changing medium twice a week and dividing them at a constant splitting rate, determined before initiating the experiment. In details, splitting was performed before full confluency was achieved. The number of cells that were split and the number of passages and days of culture were recorded for each cell model to calculate the doubling time. During in vitro culture, cell populations were collected at the following pre-determined time points: 30 days (T30), 60 days (T60) and 90 days (T90) from T0. At each time point, a fraction of the cells were put aside (note that this did not affect the rate of passaging described below) and pellets (2 × 10 6 each) were collected for DNA, RNA, and protein extraction. QC was repeated at each time point.

Microsatellite instability (MSI) status
The MSI status was assessed with the MSI Analysis System kit (Promega). The analysis requires a multiplex amplification of seven markers including five mononucleotide repeat markers (BAT-25, BAT-26, NR-21, NR-24, and MONO-27) and two pentanucleotide repeat markers (Penta C and Penta D). The products were analyzed by capillary electrophoresis in a single injection (3730 DNA Analyzer, ABI capillary electrophoresis system (Applied Biosystems). Then, the results were analyzed using GeneMapper V5.0 software.

DNA extraction and exome sequencing
Genomic DNA (gDNA) was extracted from CRC cell lines, xenografts, and PDXs using Maxwell® RSC Blood DNA kit (AS1400, Promega). DNA was sent to IntegraGen SA (Evry, France) that performed library preparation, exome capture, sequencing, and data demultiplexing. Final DNA libraries were pair-end sequenced on Illumina HiSeq4000 as pairedend 100 bp reads.

Mutational analysis in cell lines
When cell lines were passaged in mice or when analyzing patient-derived xenografts, Fastq files were first processed with Xenome [10] to remove reads of mouse origin. Reads files were aligned to the human reference hg38 using BWA-mem algorithm [11], and then the "rmdup" samtools command was used to remove PCR duplicates [12]. On the resulting aligned files, we observed a median depth of 138x with 98% of the targeted region covered by at least one read. Bioinformatic modules previously developed [9,13] by our laboratory were used to identify single nucleotide variants (SNVs) and indels. The mutational characterization of the 64 cell lines at time point 0 was assessed by calling the alterations against the hg38 reference annotation. Then, a series of filters were used to remove germline variants and artifacts: alleles supported by only reads with the same strand, excluding start and end read positions from the count, were discarded; variants called with allelic frequency lower than 10% as well a p value greater than 0.05 (binomial test calculated on allele count and depth of each sample) were excluded; common dbSNP version 147 and a panel of normal (40 samples) from previous sequencing were used to annotate and filter germline variants and sequencing artifacts. The variant calls of 45 cell lines at time point 90 and the 18 cell lines explanted from mice were performed using the allele comparison strategy between the same cell line at time 0 and time  [9,14]. Briefly, RNAseq data were used as input of "OptitypePipeline" [15] to assess the HLA status of each sample at time point 0, then NetMHC 4.0 software [16] was employed to analyze mutated peptides derived from variant calls using kmer of 8-11 length. Next, for each SNV, we modified the corresponding cDNA in the selected position and we examined the 5′ and 3′ context. The latter was set taking into account the length (in terms of amino acids) with which the putative antigen could bind HLA. We translated the cDNA and feed mutant peptide to NetMHC with the proper HLA(s). For frameshifts, we applied the same approach considering every possible peptide generated by the new frame. Finally, RNAseq data were used to annotate and then filter according to expression values (fragments per kilobase million (FPKM) > 10). Only predicted neoantigens with a strong binding affinity (Rank < 0.5) were considered for further analysis.

Mutational analysis of patient-derived xenograft
WES of patient-derived xenografts was performed at IntegraGen SA (Evry, France). Sequenced samples included a microsatellite stable (MSS), a microsatellite unstable (MSI), and a POLE mutant case (5, 7, and 6 respectively). Samples were analyzed with the same bioinformatic pipeline applied to cell lines, and murine reads were first removed using Xenome [10]. A median depth of 130x and with 98% of the targeted region covered by at least one read was observed. All 18 PDX samples were characterized by calling alterations against the hg38 reference annotation. For each generation, with the exception of the first one, the mutational evolution was inferred by subtracting the mutations of the previous generation. Second-generation samples were compared to the first-generation samples, samples from the third generation were compared to the 2nd generation samples, and so on.

Ploidy estimation
Gene copy-number (GCN) was calculated in a two-step approach: initially, we treated cell lines as diploid and considered the median read depth of all coding regions as the level for 2N ploidy. We also calculated the median read depth for every gene. The ratio between the two median values was then considered as the relative GCN.
In the second step, to estimate the overall ploidy, we segmented all chromosomes using a custom script that implements circular binary segmentation. Finally, we exploited the distribution of allelic frequencies for individual segments to assess the absolute GCN. This was necessary since distinct ploidy levels have different expected distributions. For example, a 2N ploidy status has a bell-shaped curve with a peak of 50% and a 3N ploidy is expected to have two peaks on 33% and 66%.

Mutational signature
Mutational signatures were calculated using the web application "Mutational Signatures in Cancer" (MuSiCa) [17]. The profile of each signature is calculated using the six substitution subtypes: C>A, C>G, C>T, T>A, T>C, and T>G (all substitutions are referred to by the pyrimidine of the mutated Watson-Crick base pair). Information on nucleotides 5′ and 3′ to each mutated base are incorporated to generate 96 possible mutation types. For each sample, a tab-separated value file was created with chromosome, position, reference, and alternate alleles. Only samples with at least 10 mutations were included. The output file of MuSiCa that includes the contribution values of 30 signatures [18] was used to create a clustermap with seaborn, a Python data visualization library, setting Euclidean metric and the average linkage method.

Doubling time
Cell lines were passaged in vitro for a minimum of 85 to a maximum of 103 days. Each passage was performed before full confluency was achieved, and the total number of doublings was annotated for each cell model. Two parameters, number of passages (n) and days of culture (t), were used to estimate the growth rate (GR) and the doubling time (DT) assuming that every division is an independent random event; probability distribution of division is equal for all cells and it is an exponential distribution; and the number of cells in each plate before confluence is fixed (K). The growth rate is defined as GR = log n (2) ÷ DT [19]. The estimated number of cells at time t is defined as is the number of cells at time 0. Therefore, GR = log n (N(t) ÷ N(0)) ÷ t where N(t) ÷ N(0) = (K × 2 n ) ÷ (K × 2 0 ) = 2 n and so GR = log n (2 n ) ÷ t. Finally, DT = t × log n (2) ÷ log n (2 n ).

RNA extraction and RNAseq analysis
Total RNA was extracted from a pellet of CRC cells (2 × 10 6 cells) using Maxwell® RSC miRNA Tissue Kit (AS1460, Promega), according to the manufacturer's protocol. The quantification of RNA was performed by Thermo Scientific Nanodrop 1000 (Agilent) and Qubit 3.0 Fluorometer (Life Technologies). RNA integrity was evaluated with the Agilent 2100 Bioanalyzer using the Agilent RNA 6000 Nano Kit. Total RNA (800 ng) with RNA integrity number (RIN) score between 9 and 10 was used as input to the Illumina TruSeq RNA Sample Prep Kit v2-Set B (48Rxn), according to the manufacturer's protocol. The standard RNA fragmentation profile was used (94°C for 8 min for the TruSeq RNA Sample Prep Kit). PCR-amplified RNAseq library quality was assessed using the Agilent DNA 1000 kit on the Agilent 2100 BioAnalyzer and quantified using Qubit 3.0 Fluorometer (Life Technologies). Libraries were diluted to 10 nM using Tris-HCl (10 mM pH 8.5) and then pooled together. Diluted pools were denatured according to the standard Illumina protocol, and 1.8 pM were run on NextSeq500 using high output Reagent cartridge V2 for 150 cycles. A single-read 150-cycle run was performed. FastQ files produced by Illumina NextSeq500 were aligned using MapSplice2 [20] transcriptome-aware aligner using hg38 assembly as reference genome. The resulting BAM files were post-processed to translate genomic coordinates to transcriptomic ones and to filter out alignments carrying insertions or deletions (which RSEM does not support) or falling outside the transcriptome regions. The postprocessed BAM alignment was given as input to RSEM [21] for gene expression quantification using GENCODE v22 as gene annotation.

Differential expression analysis
The abundance quantification generated with RSEM provides the FPKM and the expected counts for each gene. The latter was used to perform genes differential expression analysis with DESeq2 R package (library Bioconductor) [22] given two distinct groups of interest, one of which considered as the reference. Genes were considered as differentially expressed if the adjusted p value was less than 0.05, and the log2 fold change was less or equal to −1 (if median FPKM value of the reference group was greater or equal to 10), or the log2 fold change was greater or equal to 1 (if median FPKM of the target group was greater or equal to 10). The analyses were performed between the following groups: MSI vs MSS (reference), hypermutated vs non-hypermutated (reference), and "EVOLVING-CRC" vs "STABLE-CRC" (reference). The hypermutated group included MSI and MSS POLEmutated cell lines (18 samples). EVOLVING-CRC group included all samples with at least 10 alterations acquired per day. A multi-factor configuration of the expression analysis was designed including extra variables of interest such as growth rates or the number of mutations normalized to doubling time.

Pathway analysis
Genes differentially expressed were then analyzed with g:Profiler [23], an online pathway analysis tool that takes a list of genes and assigns them to different families of biological functions. We set the query options to select significant biological processes only, and we retained (for further analysis) only the topmost families of the hierarchy (depth 1).

Xenograft mouse model
Each CRC cell line (5 × 10 6 cells) was injected subcutaneously into both flanks of two 6-week-old female NOD (nonobese diabetic)/SCID (severe combined immunodeficient) mice (Charles River Laboratory). Tumor size was measured twice a week and calculated using the formula: V = ((d) 2 × (D)) ÷ 2 (d = minor tumor axis; D = major tumor axis). Tumors were explanted when they reached a volume of 1000 mm 3 . The investigators were not blinded, and measurements were acquired before the identification of the cages.
Patient-derived mouse model mice as described previously [24]. When reaching a volume of 1500-2000 mm 3 , the tumors were explanted, fragmented, and serially passaged in new mice. At each passage, part of the material was frozen for molecular analyses. Samples' genetic identity was determined by Sequenom-based analysis of 24 highly variable SNPs of germline DNA (

Results
We selected from our database 64 CRC cell lines designed to recapitulate clinically relevant characteristics of CRC patients (Table 1 and Additional file 1: Figure S1a). Whole exome sequencing and RNAseq were performed on all models. Using previously developed computational tools and bioinformatic algorithms [13,14,25,26], we measured mutational burden (alterations per Mb) assessing both SNVs and frameshifts (Fig. 1a, b, Additional file 2). Scrutiny of genomic alterations highlighted that MSI cell lines and those carrying known POLE hotspot mutations had higher number of mutations per Mb as compared to MSS cell lines (Fig. 1a). The type of DNA repair alterations occurring in each model affected the nature of mutations: MSI cells displayed a higher number of frameshifts and indels than POLE mutant cell lines; the opposite was true for SNVs (Fig. 1c, d). Alterations in MMR and POLE genes are listed in Table 3 and Additional file 1: Figure S1b. The cell line with the highest number of variants (SNU1040) carried inactivating alterations in both MLH1 and POLE (Additional file 1: Figure S1b). Altogether, these results are consistent with what has been reported in CRC patients carrying alterations in the MMR DNA repair pathway, indicating that the cell models included in this study broadly recapitulate what is observed in clinical specimens [27].
To assess whether, and to what extent the basal mutational profiles (Time 0: T0) evolved over time, we passaged 45 cell lines for 90 days and collected a second set of samples (Time 90: T90) (Additional file 1: Figure S2). These were subjected to WES and analyzed using the computational pipeline described above. Across all cell lines globally, the total mutational burden was similar between T0 and T90 (Additional file 1: Figure S3). However, when the T0 and T90 mutational profiles were compared, prominent differences were detected among models sharing specific DNA repair defects (Fig. 2a). Specifically, the mutational landscapes of most MSI and POLE mutant cells evolved very rapidly through the generation of novel SNVs and frameshifts (Fig. 2a). On the contrary, the majority of MSS models showed more stable profiles (Fig. 2a). We sought to minimize confounding effects due to differences in cell-intrinsic doubling times (Table 1); we therefore calculated the doubling time of all cell models (Table 1, Additional file 1: Figure S4). Notably, evolvability trends remained apparent after normalization for doubling time (Additional file 1: Figure S5). We designated rapidly evolving CRC cells as EVOLVING-CRC and evolutionary stable CRC cell as STABLE-CRC (Table 1).
We empirically define EVOLVING-CRCs as those cells that acquire 10 alterations (or more) per day after normalizing mutation data to the doubling time of cell lines (Table 1). Moreover, EVOLVING-CRCs often carried alterations in multiple genes involved in distinct DNA repair functions, suggesting that defects in several DNA damage response pathways might be co-selected (Additional file 1: Figure S1b). The expression of MMR genes was assessed by western blot at T0 and T90, and no differences were observed (Additional file 1: Figure S6).
The genome of four CRC lines classified as MSS (SNU1235, COCM1, HDC142, and SNU1411) exhibited dynamic mutational profiles (Fig. 2). In an attempt to decipher the molecular basis of these findings, whole exome data of the outliers were carefully examined, focusing on genes previously implicated in DNA repair pathways that are not routinely subjected to scrutiny in CRC patients. We found that SNU1235 and HDC142 models carried biallelic alterations in the EXO1 (S510*) and MUTYH (S179C) genes, respectively. The exonuclease EXO1 is implicated in both MMR (it binds MLH1) and base excision repair [28], while MUTYH encodes a DNA glycosylase that is involved in oxidative DNA damage repair and is part of the base excision repair pathway [29]. Germline mutations in MUTYH cause MUTYH-associated polyposis (MAP) [30]. Scrutiny of the COCM1 exome revealed a POLE variant (A629D). A629 is localized in a region of POLE highly conserved during evolution (Additional file 1: Figure S7). The A629D change is potentially damaging according to the SIFT [31] and Polyphen [32] algorithms, which predict the putative impact of amino acid substitutions on human proteins using structural and comparative evolutionary considerations.
We next addressed how longitudinal evolution of CRC cell genomes affected their predicted neoantigen profile. To this end, WES, RNAseq, and HLA prediction data were combined as previously described [9]. In detail, we identified genomic variants that satisfied three criteria: (i) emerged over time, (ii) occurred in transcribed genes, and (iii) scored positively when HLA I matching algorithms were applied. The variants that emerged after deploying the above computational pipeline were classified as putative neoantigens (Fig. 2b). Hypermutated and EVOLVING-CRC cells displayed higher levels of putative neoantigens compared to slowly evolving CRC cells (Fig. 2b). Moreover, and consistent with their predicted effects on antigenicity, a high prevalence of indels and associated frameshifts, Mutations were called against the reference genome (hg38) with allelic frequency > 1. The y-axis reports all the mutations found in each cell line, whereas the time points data are reported on x-axis which occur in MSI CRCs, translated into higher numbers of predicted neoantigens in this subset (Fig. 2b).
Next, we studied whether in parallel to mutation gains we could also detect loss of variants over time. For this reason, we tracked lost and gained alteration in "evolving" cell lines over time. As expected, variants that did not change over time showed high allelic frequency, likely reflecting their clonal (trunk) status. Mutations that emerged or were lost showed lower allelic frequency (Fig. 3).
Mutational signatures are characteristic combinations of mutation types arising from mutagenesis processes such as alterations in DNA replication, exposure to DNA damaging agents, tissue culture conditions, and DNA enzymatic editing [18]. In human tumors, over 30 mutational signatures have been identified, a subset of which are linked to defective DNA repair pathways. For example, signatures 6, 15, 20, and 26 are associated with MMR defects and signature 10 is linked to inactivating mutation in the proofreading domain of DNA polymerases, while signature 18 appears to explain the rise of 8-oxoG:A mismatches due to MUTYH biallelic alteration [33].
We reasoned that the remarkable evolvability observed in a subset of CRC cells might be reflected in their mutational signatures. To test this, we first identified mutational signatures at T0. As expected, MSI cells displayed signatures 6, 15, 20, and 26, while POLE mutant cells showed primarily mutational signature 10 (Additional file 1: Figure S8).
We next assessed which signatures were acquired (remained active) during replication of the cells in vitro by comparing samples collected at T0 and T90. We found that in most instances, DNA alterations linked to MMR and POLE defects continued to occur over time, indicating that the corresponding DNA repair capabilities were permanently disabled (Fig. 4a).
Replication of cancer cell populations in 2D is thought to encounter little or no selective pressure as the cells are cultured in the same conditions for many generations before the experiment is started. To monitor mutational and neoantigen evolution under more stressful (selective) conditions, CRC cells including MSS, MSI, and POLE models were transplanted in immunodeficient (NOD SCID) mice and allowed to grow until they reached approximately 1000 mm 3 in size, after which tumors were excised. Although NOD SCID mice have no adaptive immunity, the mouse stromal microenvironment and elements of cellular innate immunity are known to affect the growth of human cancer cells in vivo [34]. DNA samples were obtained before implantation and at the end of the experiment. WES was performed, and the data were analyzed with the same bioinformatic pipeline applied to cells grown in vitro. The mutational profiles revealed higher evolutionary rates in vivo than in vitro (Additional file 1: Figure S9a, b). This translated into increased levels of predicted neoantigens in vivo (Additional file 1: Figure S9c). Notably, mutational signatures linked to MSI status and POLE mutations were more marked in vivo than in vitro (Fig. 4b, Additional file 1: Figure S10). We then assessed whether the mouse microenvironment exerts selection on the cells expanded in vivo and compared the results to cells passaged in vitro. To this end, we characterized the ratio between non-synonymous and synonymous mutations in vitro and in vivo. We detected very limited or no selection in cells passaged in vitro (ratio 3:1). Instead, in vivo the ratios for lost and gained mutations were 1:1 and 2:1, respectively, indicating a purifying selection (Additional file 1: Figure S11). These findings suggest that when cells are transplanted in mice they are subjected to environmental selection. Next, we asked whether the evolutionary trajectories observed in CRC cells with alterations in DNA repair pathways also occurred in human CRC with analogous molecular profiles. To this end, we selected MMRproficient, MMR-deficient, and POLE mutant cases (Table 4) from our extensive patient-derived CRC xenograft biobank [35]. Each model was serially transplanted for at least four generations in immunodeficient mice as described in the phylogenetic tree (Fig. 5a). Samples collected at each transplantation were subjected to WES. In some instances, simultaneous transplantation of the Table 3 POLE mutations in CRC cells same tumor in two animals allowed acquisition of independent measurements for each generation. NGS data were analyzed with the bioinformatic pipeline applied to cells grown in vitro. These experiments revealed remarkable differences in the evolvability of MSS, MSI, and POLE CRC models in vivo and indicated that these characteristics also occurred in patient-derived CRC samples (Fig. 5b, c). As expected, high-frequency (clonal-trunk) variants were conserved across generations. Interestingly, the in vivo results differ from those obtained in cell models in vitro. We find that in PDX models, not only sub-clonal but also clonal populations can emerge in the subsequent generation of colorectal cancers with DNA repair defects (Fig. 6).
Furthermore, in MSI and POLE patient-derived xenografts, the mutational signatures were continuously (re) generated and could be clearly recognized (Additional file 1: Figures S12 and S13). In non-mutator (slow evolving) cell lines, very few mutations emerged over time, and thus, the possibility to assess mutational signatures was limited. Because of this, in the slowly evolving models, we were unable to reliably generate mutational signatures.
Distinct subsets of CRCs can be recognized based on histological characteristics, as well as their genomic, epigenetic, and transcriptional profiles. As a result, CRC can be classified into specific subsets, which are often correlated with divergent clinical outcomes [36,37]. The rate of genomic evolution and the dynamics of neoantigen profile have not yet been systematically explored as a method to classify CRC. We therefore asked whether any molecular traits (beyond alterations in DNA repair genes) could distinguish EVOLVING-CRC and STABLE-CRC. To address this question, we performed unbiased gene copy number and transcriptional comparative analyses of CRC cell lines. As previously reported, MSI CRC cells a b c Fig. 5 Genomic evolution in patient-derived xenografts. Phylogeny of the indicated patient-derived xenograft and their molecular characterization. a MSS, MSI, and POLE mutant samples were serially transplanted for at least four generations (F1-F4) in NOD/SCID mice as shown. Samples collected at each passage were subjected to WES. b WES data of each generation were compared with those obtained from the previous generation. Bar graphs show de novo acquired SNVs and frameshifts at each generation. c The number of predicted neoantigens in each PDX is shown. Each bar represents putative neoepitopes derived from SNVs and frameshifts (see the "Methods" section for detailed information) typically carried a close to diploid chromosomal status, while MSS showed elevated aneuploidy (Fig. 7) [38]. Interestingly, the most rapidly evolving POLE mutant lines, SNU81 and HDC114, also displayed a diploid prevalent phenotype. Nonetheless, copy number and ploidy status could not distinguish "EVOLVING" and "STABLE" CRC models.
Next, we performed RNAseq on the entire dataset to explore whether transcriptional profiles could classify rapidly evolving CRC lines. Differential analysis of RNAseq data was initially performed comparing the MSS and MSI sample groups. The list of differentially expressed genes was consistent to results previously reported in this setting, and 168 genes were differentially expressed between these two groups (Table 5) [39]. Next, we evaluated genes differentially expressed in hypermutated versus non-hypermutated cells, grouping together MSI-and POLE-mutated cell lines and comparing them to the MSS lines (Fig. 8a). Notably, proteins associated with immune response and predominantly with antigen-presenting and antigen recognition functions were consistently downregulated in cell lines with high mutational burden (Fig. 8b). Next, we compared EVOLVING and STABLE CRC models. The number of genes differentially expressed with significant p value was smaller due to the reduced number of available samples (Fig. 9a). Beta-2 microglobulin (B2M) was downregulated in most EVOLVING as compared to STABLE CRCs (Fig. 9b, c). Downregulation of B2M was confirmed at the protein level (Fig. 9c) and was frequently associated with premature stop codons in the B2M gene (Fig. 9d). Interestingly, the four MSS models (COCM1, SNU1235, SNU1411, and HDC142) with low mutational burden but dynamic mutational profile also displayed low levels of B2M (Fig.9b, c). Comparison of EVOLVING and STABLE CRC models pinpointed other genes differentially expressed including CPNE1, IRF1, and PMSB10. These genes are also involved in immune-related processes and their downregulation might similarly reduce immune surveillance of EVOLVING CRCs (Fig. 9a and Additional file 1: Figure S14). We next performed the analysis showed in Fig. 9a in a multivariate fashion Fig. 6 Lost and gained mutations across the indicated PDX generations. The color code defines allelic frequencies of acquired SNVs at each generation (with allelic frequency > 1). The y-axis lists all SNVs identified in each branch; the mouse generation (genealogy) is reported on the x-axis taking into account the growth rates of the cells or the number of mutations normalized to the doubling time. The number of statistically significant genes in the multivariate analyses (Additional file 1: Figure  S15) was lower but consistent with the findings of Fig. 9a. In the future, it would be interesting to assess whether the differential expression of genes in fast evolving CRC models has a functional impact. This aspect cannot be causally predicted at this stage.

Discussion
In the past decade, it has become clear that most human tumors are highly molecularly heterogeneous, and this affects prognosis and the emergence of therapeutic resistance [40]. How tumor-specific somatic variations can lead to distinct neoantigen profiles and ultimately to immune surveillance has also been partially elucidated. The number of neoantigens depends on several factors. For example, lung cancers associated with smoking habits have high levels of mutations [41,42], whereas the development of skin melanomas is correlated with UV lightmediated mutagenicity [43]. Both smoking and UV exposure occur during defined periods and their mutagenicity is transient, leading to high-but relatively stable-mutational profiles [44,45]. Another class of tumors with high mutational burden is characterized not by exposure to external carcinogens, but rather by the intrinsic inability of tumor cells to efficiently repair DNA. The latter is due to epigenetic or genetic alterations in key effectors of  The longitudinal analysis of cell and PDX models highlighted several aspects. For example, MSI-and POLE-mutated tumors tended to acquire SNV or short insertions/deletions over time. These alterations can lead to novel putative neoantigens which potentially trigger the host immune system. In addition to well-known DDR genes (MLH1, MSH2, MSH6, PMS2, POLE), our study indicate that other genes involved in DNA repair pathway may lead to accumulations of mutations possibly translating in novel epitopes. EXO1 and MUTYH are two of such examples. Profiling of these genes in the clinical setting may help to intercept tumors not classified as unstable or with hypermutator phenotype but nevertheless continuously evolving and accumulating mutations. Our analysis suggests that in parallel to mutation gains, loss of variants also occurs during cell propagation. Our data indicate that in hypermutated CRCs, including MSIand POLE-mutated models expanded in vitro, these events are mainly confined to subclones. A limitation of this study is that longitudinal characterization of lost and gained mutations in vitro could be influenced by sampling of cell populations during cell passaging. We also report that in the propagation of PDXs, possibly due to selection imposed by the microenvironment, not only subclonal but also clonal variants emerge de novo over time. Based on these results, we speculate that in CRC patients with DNA repair defects metastatic seeding or therapeutic debulking can lead to the emergence of new subsets of clonal neoantigens. This could have implications for the development of therapies relying on the presence of clonal neoantigens, such as ICP, CAR-T, and vaccines.
Both cell lines and PDXs have been widely employed to test anticancer compounds [46][47][48]; however, experimental reproducibility has occasionally been questioned [49,50]. The molecular evolvability that we find to occur during serial passaging of cells and PDXs may partly account for the discrepant results obtained with these models [51][52][53].
A limitation of the present study is that it examined the evolution of cell lines and xenografts but cannot address the impact of the immune system in the evolutionary dynamics due to intrinsic limitations of the models we used.
Our data indicate that alterations in DNA repair genes facilitate the acquisition of neoantigens. These novel putative epitopes can be recognized by the immune system. Accordingly, we confirm that CRCs with high number of mutations (hypermutated CRCs) selectively downregulate components of the neoantigen presentation process, such as B2M, thus restricting the ability of the host immune system to detect them. Our results further suggest that non-hypermutated CRCs, that display fast evolving mutational and antigen profiles, also show downregulation of components implicated in neoantigen presentation. The differences in expression of molecules involved in immune functions we observed in the CRC models could have originated a b c d