Clonal evolution in primary breast cancers under sequential epirubicin and docetaxel monotherapy

Subclonal evolution during primary breast cancer treatment is largely unexplored. We aimed to assess the dynamic changes in subclonal composition of treatment-naïve breast cancers during neoadjuvant chemotherapy. We performed whole exome sequencing of tumor biopsies collected before, at therapy switch, and after treatment with sequential epirubicin and docetaxel monotherapy in 51 out of 109 patients with primary breast cancer, who were included in a prospectively registered, neoadjuvant single-arm phase II trial. There was a profound and differential redistribution of subclones during epirubicin and docetaxel treatment, regardless of therapy response. While truncal mutations and main subclones persisted, smaller subclones frequently appeared or disappeared. Reassessment of raw data, beyond formal mutation calling, indicated that the majority of subclones seemingly appearing during treatment were in fact present in pretreatment breast cancers, below conventional detection limits. Likewise, subclones which seemingly disappeared were still present, below detection limits, in most cases where tumor tissue remained. Tumor mutational burden (TMB) dropped during neoadjuvant therapy, and copy number analysis demonstrated specific genomic regions to be systematically lost or gained for each of the two chemotherapeutics. Sequential epirubicin and docetaxel monotherapy caused profound redistribution of smaller subclones in primary breast cancer, while early truncal mutations and major subclones generally persisted through treatment. ClinicalTrials.gov, NCT00496795, registered on July 4, 2007.

A better understanding of subclonal redistribution and identification of emerging or disappearing genomic aberrations during treatment would pave the way for identification of molecular mechanisms underlying resistance to the treatment applied and pinpoint potential targets to improve therapeutic outcome. Neoadjuvant trials present an ideal setting to explore molecular aberrations dictating response to chemotherapy, as well as subclonal redistribution, in early breast cancer. Here, therapy response may be recorded in detail clinically, or radiologically using magnetic resonance imaging (MRI). Moreover, for patients with larger tumors, repeated biopsies can be collected before and after treatment with each particular regimen [12][13][14].
Anthracyclines and taxanes are the two types of chemotherapy most frequently applied in breast cancer. The lack of cross-resistance between these compounds provides a rationale for sequential monotherapy instead of combination regimens, thereby allowing the application of higher doses, including dose-dense treatment, for each compound [15][16][17]. Importantly, a recent meta-analysis demonstrated sequential therapy to be superior to or at least as good as combined treatment [15]. Moreover, a monotherapy design allows identification of factors predicting outcome to individual compounds [18], contrasting combination regimens for which factors predicting sensitivity to individual compounds may not be identified.
In order to explore the efficacy of sequential dosedense epirubicin and docetaxel monotherapy and to identify potential predictive biomarkers to guide treatment selection, we conducted a phase II clinical trial in 109 patients with large, treatment-naïve primary breast cancers. We performed whole exome sequencing (WES) of biopsies collected before and after epirubicin and docetaxel treatment in a subset of patients (n = 51) selected to balance the number of responders and non-responders to each regimen (see the "Methods" section). Furthermore, out of the remaining tumors (n = 58), 45 pretreatment biopsies underwent targeted sequencing of six key breast cancer-related genes. While we detected no specific genomic alteration predicting response to therapy, we recorded a profound redistribution of cancer subclones during treatment with both drugs. Notably, for most of the investigated tumors, subclones appearing or disappearing could be identified as small subclones below conventional detection limits before and after treatment, respectively.

Patients and study protocol
The Dose-Dense Protocol (DDP) study was a prospectively registered, single-arm, phase II clinical trial which enrolled patients ≤ 65 years, with non-inflammatory, primary breast cancer and tumor size > 4 cm and/or N2-3 lymph node metastases. Patients were enrolled on an all-comers basis, provided eligibility according to the inclusion criteria and upon each patient's informed consent. Enrollment was performed at a single institution in Bergen, Norway (for details, see Additional file 1). The trial was approved by the Regional Ethics Committee of Western Norway and registered at ClinicalTrials.org (NCT00496795) prior to trial initiation. The first patient was included on November 19, 2007, and the last patient entered the trial on February 11, 2016. Patient inclusion has been finalized, but survival follow-up continues for 10 years postoperatively (i.e., until February 11, 2026).
The study protocol (Additional file 2) dictated four courses of i.v. epirubicin 60 mg/m 2 q2w followed by four courses of docetaxel 100 mg/m 2 q2w (Fig. 1a) prior to surgery, including pegfilgrastim after each chemotherapy course. Patients who did not tolerate docetaxel Fig. 1 a Dose-Dense trial design. All patients (n = 109) received sequential epirubicin and docetaxel monotherapy*. Biopsies were collected before treatment, after epirubicin, and after docetaxel. Whole exome sequencing (WES) was performed for 51 patients in pretreatment tumor biopsies and matched blood samples. Among these 51 patients, repeated tumor biopsies after epirubicin (n = 48) and after docetaxel (n = 43) were subjected to WES (all 43 patients with biopsies after docetaxel were among the 48 with biopsies after epirubicin). Out of the remaining 58 patients in the trial, pretreatment tumor biopsies from 45 underwent amplicon-based targeted sequencing of a six-gene panel. b Diagram depicting the number of tumor biopsies used for DNA sequencing pretreatment (green), after epirubicin (red), and after docetaxel (blue). Out of 96 pretreatment biopsies, WES was performed on 51 (dark gray) and amplicon-based sequencing on the remaining 45 samples (light gray), whereas subsequent analyses after epirubicin (n = 48) and docetaxel treatment (n = 43) were performed by WES only. The distribution of breast cancer subgroups (hormone receptor positive, HER2 normal (HR+HER2−), HER2 positive (HER2+), and triple-negative breast cancers (TNBC)) are indicated by light green, purple, and pink bars, respectively, in the lower panel. c Mutation status pretreatment. Oncoplot showing mutations of six genes in pretreatment samples from all patients in the study with available tumor DNA (n = 96). The mutation list is sorted by the subgroups; HR+/HER2−, HER2+, and TNBC. Mutations are colored according to mutation type. Percentages and bars on the right indicate the prevalence of mutations in each of the genes, among the 96 tumor samples analyzed. Each column represents one tumor/patient. Colors in the lower panel show the individual clinical responses to sequential epirubicin and docetaxel* neoadjuvant chemotherapy, and the molecular analysis used for each tumor sample (whole exome sequencing (WES) or amplicon-based sequencing). Responses listed: CR (complete response), PR (partial response), SD (stable disease), according to RECIST, and PD (progressive disease), according to UICC criteria. *HER2-positive breast cancers received concomitant docetaxel and trastuzumab (See figure on next page.) treatment were shifted to paclitaxel (n = 3), 80 mg/m 2 qW for the remaining part of the 8-week taxane period. Patients with HER2-positive disease received weekly i.v. trastuzumab (4 mg/kg loading dose, then maintenance 2 mg/kg) during neoadjuvant taxane treatment, as well as postoperatively, to a total treatment duration of 52 weeks. All patients received postoperative radiotherapy to the chest wall and regional lymph nodes, and adjuvant endocrine therapy (premenopausal women: tamoxifen, postmenopausal women: aromatase inhibitor (AI) or sequential AI-tamoxifen) was given for 5 years to patients with hormone receptor (HR)-positive disease, in accordance with national guidelines.
All tissue samples in the present study were collected as dictated by the DDP trial protocol. Prior to commencing neoadjuvant treatment, all patients had an incisional breast tumor biopsy taken. Additionally, a Tru-cut biopsy was obtained, from the same site, after completing epirubicin treatment, and finally, tumor tissue, again from the same site, was sampled at surgery (Fig. 1a). All samples were snap-frozen in liquid nitrogen immediately upon removal. Leucocytes from pretreatment blood samples were used as a source for normal DNA.
Clinical and MRI responses to epirubicin and subsequent docetaxel treatment were classified according to the Response Evaluation Criteria in Solid Tumors (RECIST) [19], where patients were dichotomized into responders (complete response (CR) and partial response (PR), i.e., objective response) and non-responders (stable disease (SD) and progressive disease (PD)). Additionally, to assure early detection of progressive disease, the common UICC criteria were used, where PD was defined as an increase of ≥25% in the product of the largest and the perpendicular tumor diameter [20], as opposed to ≥20% increase in the largest tumor diameter by the RECIST criteria (see Additional file 1 for details). Follow-up was defined as the time period from trial inclusion until data cut-off date or death for each patient. No patient was lost to follow-up.
The trial's primary endpoint was to correlate molecular parameters to objective response to each of the two chemotherapy regimens applied. The secondary endpoints were (a) to correlate molecular parameters to relapse-free and overall survival and (b) to identify and explore characteristics of epithelial and mesenchymal stem cells isolated in tumor tissue and bone marrow. Apart from a previous case report of a patient with lethal pneumonitis [21], this is the first report of the primary and secondary endpoints of the trial. The last survival assessment was conducted in March 2021, when each patient had a minimum of 5 years of follow-up from inclusion in the trial or until the time of death. The secondary endpoint (b) regarding stem cell assessment is not addressed in the current report.

Massive parallel sequencing
The present analyses were performed on preoperative tumor biopsies from 109 patients included in the DDP trial (Fig. 1a, b; for details, see Additional file 1 and Additional file 3: Fig.  S1). Among these patients, response outliers (n = 51/109) were selected for whole exome sequencing (WES), based on objective response (OR) or lack of OR to epirubicin and/or docetaxel and availability of tumor tissue for DNA extraction, with a minimum of eight patients in each group (Table 1). Tumor DNA samples from eight patients with OR to both epirubicin and docetaxel, 18 patients without OR to either drug, and 25 patients with OR to one drug only were analyzed. Within this selection, 11 patients had a pathological complete response (pCR) ( Table 1).
For the purpose of longitudinal comparisons (assessment of subclonal evolution), WES was performed as described previously [22], using DNA from tumor biopsies collected pretreatment, after epirubicin, and after docetaxel, together with matched blood DNA. Among the 51 response outliers, WES was performed on DNA from 51 pretreatment biopsies, 48 biopsies collected after epirubicin (but before docetaxel), and 43 biopsies collected after sequential epirubicin and docetaxel, as well as on matched pretreatment blood samples (n = 51). Eleven post-treatment biopsies (three after epirubicin, eight after docetaxel) were not sequenced due to the lack  of tumor DNA (Fig. 1a, b). In brief, library preparation was performed using the Agilent SureSelectXT Human All Exon V5 kit, and the resulting library was sequenced on a HiSeq2500 (Illumina Inc, San Diego, CA, USA). For the remaining patients (n = 58/109), tissue for genomic analysis was available from 45 pretreatment tumor biopsies (Fig. 1a, b). To assess the potential predictive value of mutations in key breast cancer genes, pretreatment samples from these tumors were analyzed for mutations in BRCA1, PIK3CA, TP53, CDH1, GATA3, and TBX3 by amplicon-based targeted sequencing and the results were combined with pretreatment status of the same genes in the WES-analyzed tumors. The five former genes were selected on the basis of being key breast cancer genes, while TBX3 was selected due its potential role as a CDH1 regulator and due to its relatively high mutation frequency in breast cancer. Library preparation was performed with a custom-designed Accel-Amplicon panel (Swift Biosciences, Ann Arbor, MI, USA) and sequencing was performed on an illumina MiSeq (for details, see Additional file 1).

Data analyses Mutation calling
Sequence reads were aligned to the human genome (Build-UCSC hg19) using the BWA-MEM alignment algorithm [23]. Sample-wise sorting and duplicate marking were performed using Picard tools (http:// broad insti tute. github. io/ picard). Indel realignment and base quality recalibration were performed by GATK tools [24]. Somatic small variant identification on the matched tumor-normal sample pairs was performed using MuTect [25] (single nucleotide variant [SNV] detection) and Strelka [26] (SNV and small indel detection), using default parameters, and applying the intersect as the true positive for SNVs [22]. Variant allelic fraction (VAF) of at least 5% was required for further analyses. Functional annotation of SNVs and InDels was performed with ANNOVAR [27], and only amino acid changing mutations were included for further analyses.

Copy number analysis
Copy number profiling and assessment of tumor cell fraction were performed using the ASCAT algorithm [28].

Annotation of driver events
To identify likely driver events, we followed a similar approach as we have previously described for breast cancer data sets [3]. We used a two-step approach where we first selected the genes most likely to contribute to breast cancer oncogenesis and then, for each individual mutation within these genes, assessed any evidence indicating a role as a driver mutation (for details, see Additional file 1).

Estimation of subclonality
For a measure of the cellular prevalence of mutations, we calculated rVAF of each mutation as the ratio between observed and expected VAF, given local copy number state, tumor cell fraction, and estimated number of mutated alleles.
where nmut refers to the number of mutated alleles, ntot refers to the total copy number at the mutated locus, and ρ refers to the tumor cell fraction. Sample mutation clustering across samples collected at different time points for each patient (pretreatment, post-epirubicin, and postdocetaxel/after neoadjuvant treatment) was performed by the use of PyClone [31] and displayed in parallel coordinate and fish plots (Additional file 3: Fig. S9).

Graphics
All graphics were generated using R version 3.6.1 (http:// www.R-proje ct. org/). The "ggplot2()" function was used to generate coxcomb plots [32]. Other packages used were dplyr, data.table, and tidyverse. Time scape and copy number packages from bioconductor were used for the visualization of clonal evolution and copy number alterations, respectively [33,34].
Full details about the data analyses are given in Additional file 1.

Statistical analyses
Correlations between mutations and response to therapy were assessed by the Fisher exact test and trend test across the response groups (prop.trend.test function in R). Comparison of continuous variables between groups was performed by the Mann-Whitney rank test, while the Wilcoxon rank test was used for paired samples. Survival data were assessed by a Cox regression analysis, calculating hazard ratios for each parameter. For Kaplan-Meier plots, patient subgroups were compared by the log rank test (for further details, see Additional file 1). Genomic Identifications of Significant Targets in Cancer (GISTIC) 2.0 [35] was used to identify frequent focal-and armlevel amplifications and deletions.
Statistical analyses were performed using R, version 3.6.3, or the SPSS 26/PASW 17.0 software package. All p-values reported are two-tailed, and p < 0.05 was considered statistically significant.

Patient characteristics and clinical outcome
Out of 109 patients included in the trial, 97% (n = 106/109) completed epirubicin, whereas 88% (n = 95/108) completed docetaxel treatment per protocol (CONSORT diagram; Additional file 3: Fig. S1). Patient and tumor baseline characteristics are given in Additional file 4: Table S1 and Additional file 1. The median largest tumor diameter at inclusion was 57 mm (range 12-270 mm). Tumor responses to sequential epirubicin and docetaxel are summarized in Additional file 4: Table S3. Briefly, the clinical objective response rate (ORR) after completing sequential epirubicin and docetaxel was 71.6%, where initial epirubicin treatment yielded an ORR of 41.3%, followed by ORR 29.5% for docetaxel after epirubicin. In comparison, for breast MRI, the ORR was 91.7%, with ORR 36.2% for epirubicin and 76.8% for docetaxel, excluding patients where MRI exams were not performed as part of the evaluation of treatment response. Overall, the pCR rate was 15% (n = 16/107), with a pCR of 3% for HR positive, HER2−; 33% for HER2+ breast cancers; and 30% for triple-negative breast cancers (TNBC) (Additional file 4: Table S3). For further details regarding clinical results of the trial, see Additional file 1.
Each patient had a minimum of 5 years of followup from inclusion in the trial (median 111 months; range 61-160 months), or until the time of death. The survival outcome is depicted in Additional file 3 (Fig.  S2). For patients with M0 disease at inclusion and who underwent surgery, recurrences have been established in 25 out of 106 patients (24%). The median diseasefree survival was 95 months (range 1-156). Out of 20 patients (19%) who died during follow-up, 16 patients died due to breast cancer, one died due to the breast cancer treatment (see below), and three died from other causes, unrelated to breast cancer or breast cancer treatment. The median overall survival was 103 months (range 5-160).
Adverse events were retrospectively collected by review of the patients' hospital records and scored using Common Terminology Criteria for Adverse Events (CTCAE), version 4. Apart from one patient dying from a taxaneinduced pneumonitis [21], the side effects from the neoadjuvant chemotherapy was as expected, with hand-foot syndrome, peripheral neuropathy, and infectious complications being the most common grade 2-4 adverse events (for details, see Additional file 1 and Additional file 4: Table S2).

Massive parallel sequencing
The current molecular analyses are based on biopsies from 109 patients included in the DDP trial ( Fig. 1b; for details, see Table 1 and in Additional file 5: Table S4). For longitudinal assessment of subclonal composition, WES was performed on tumor biopsies from 51 response outliers: pretreatment biopsies (n = 51), post-epirubicin but pre-docetaxel biopsies (n = 48), and post-docetaxel biopsies (n = 43), as well as on matched blood samples (n = 51). Eleven post-treatment biopsies (three after epirubicin, eight after docetaxel) were not sequenced due to the lack of tumor DNA (Fig. 1a, b). The mean of mean target coverage for WES was 159× for tumor samples and 69× for blood samples (Additional file 3: Fig. S3).
Out of the remaining 58 patients, 45 pretreatment biopsies had sufficient DNA for analyses. To assess the predictive impact of mutations in key breast cancer genes, these samples were subject to amplicon-based targeted sequencing of a six-gene panel, resulting in a mean target coverage of 5094× for tumor samples (matched blood samples not analyzed; Additional file 3: Fig. S4).

Mutational spectrum in pretreatment samples
Six predefined, key breast cancer genes were assessed in all 96 pretreatment tumor samples, either by WES or by amplicon sequencing, based on their high mutational prevalence and/or predominant role in breast cancer progression [2,36,37]. This analysis yielded mutation frequencies of TP53 38%, PIK3CA 30%, CDH1 21%, GATA3 17%, BRCA1 7%, and TBX3 3% (Fig. 1c). Thus, the frequencies of CDH1 and BRCA1 mutations were slightly higher than in larger breast cancer cohorts [38], probably due to random variation within our sample set. Mutations in none of these selected key breast cancer genes were predictive of response to either epirubicin or subsequent docetaxel treatment (Additional file 4: Table S5).
In the 51 tumors subjected to WES, in addition to mutations in the genes described above, six tumors harbored PTEN mutations (12%), and five tumors harbored FLG and MUC16 mutations (each 10%). The mutational distribution across individual genes is depicted in Additional file 3: Fig. S5. When testing for predictive value across genes mutated in >2 patients, JAK2 mutations (n = 3/51) were associated with response to epirubicin treatment (p = 0.02); however, this did not remain significant after correction for multiple testing (Additional file 4: Table S6). No other mutations or sets of mutations in key breast cancer pathways examined by WES in pretreatment biopsies predicted response to epirubicin.
The tumor mutational burden (TMB) in pretreatment biopsies analyzed by WES varied from 0.02 to 8.38 mutations per megabase (MB) (median n = 0.64, mean n = 1.14). While the mean TMB in pretreatment biopsies was numerically lower in responders than non-responders (TMB n = 0.94 versus n = 1.32), this was not statistically significant. We found no correlation between CNAs and TMB (Additional file 3: Fig. S7).
Analyzing for somatic mutation signatures previously defined in breast cancer [39], no signature was predictive of response to epirubicin (Additional file 3: Fig. S8a).

Mutational spectrum after epirubicin treatment
Among the 51 patients selected for WES, 48 tumors were re-biopsied after epirubicin treatment and analyzed by WES. Within this subset, the mean TMB dropped significantly for epirubicin responders (pretreatment: 0.9 mutations per MB, after epirubicin: 0.62 mutations per MB; p = 0.043; Fig. 2a, b), whereas no significant change was observed among non-responders (1.12 and 0.94 mutations per MB, respectively; Fig. 2c, d).
TP53 mutations emerged in four, PIK3CA in one, and GATA3 in one tumor during epirubicin treatment (Additional file 3: Fig. S5). Notably, for all emerging mutations (n = 323 mutations in 19 tumors), we re-examined the raw data to assess whether any mutations were present in small subclones pretreatment. For 67 out of 323 mutations (18 out of 19 tumors), 1-5 sequencing reads were detected indicating that even though escaping formal mutation calling, these variants were present at very low variant allele frequencies (VAFs). Among mutations emerging during epirubicin treatment in the above-mentioned predefined breast cancer genes (mutations seen in TP53, PIK3CA, and GATA3), only one of the GATA3 variants was observed in the pretreatment data.
Furthermore, we detected a numerical decrease of C>T substitutions during epirubicin treatment, but this decrease did not reach statistical significance (mean number of mutations before and after epirubicin, 22.4 versus 16.5; p > 0.05; Additional file 3: Fig. S8c).
Importantly, we observed major changes in CNA in samples collected after epirubicin as compared to before treatment. The most prominent changes were an increase in copy number losses in chromosomes 1q, 2q, 3q (PIK3CA, SOX2) and 6 and 8q (MYC) and an increase in copy number gains in chromosomes 16q and 19 (STK11, AKT2; Fig. 3a). These copy number changes were similar among epirubicin responders and non-responders.
Finally, no single mutation or set of mutations in key breast cancer pathways detected in post-epirubicin biopsies predicted response to subsequent docetaxel treatment (Additional file 4: Table S7, Additional file 3: Fig.  S5). Also, no somatic mutation signatures in post-epirubicin biopsies were associated with response to docetaxel (Additional file 3: Fig. S8b). For HER2-positive tumors, no genomic aberrations characterized tumors responding or not responding to docetaxel + trastuzumab.
TP53 mutations emerged in two, ATM in one, and CDH1 in one tumor during docetaxel treatment (Additional file 3: Fig. S5). In the raw data, 56 out of the 155 variants which emerged during docetaxel treatment had 1-5 sequencing reads in biopsies collected after epirubicin treatment, indicating their presence before commencing docetaxel. Notably, this was not the case for any of the mutations in the key breast cancer genes TP53, ATM, and CDH1 emerging during docetaxel treatment (see above), indicating that these variants may have been acquired during docetaxel.
Also, we observed a general increase in copy number losses in chromosome regions 1q and 8q (MYC) and an increase in copy number gains in 8p (FGFR1, WHSC1L1) after docetaxel as compared to after epirubicin treatment (Fig. 3b). While all of these three types of CNA changes were observed during epirubicin treatment, they were also observed during docetaxel treatment. Assessing the MYC gene specifically, we found a significant increase in copy number losses from the pretreatment setting to the end of neoadjuvant treatment (i.e., after epirubicin and docetaxel treatment; p = 0.0018; Fig. 3c). For HER2-positive tumors, a significant decrease in copy numbers of ERBB2 on chromosome 17 was observed after docetaxel + trastuzumab as compared to post-epirubicin tumors (p = 0.008; Fig. 3d). This contrasted with HER2-negative tumors where no corresponding changes were observed after docetaxel monotherapy. Apart from this, there were no other mutational changes which characterized HER2-positive breast cancers, subsequent to docetaxel and trastuzumab treatment.

Subclonal redistribution during epirubicin treatment
Based on WES data with sufficient tumor cellularity (≥ 20%) in both pre-and post-epirubicin biopsies, subclonal evolution was assessed in 19 out of 51 patients (Additional file 3: Figs. S9-10). Overall, the analysis demonstrated that certain subclones were eradicated by epirubicin in responding tumors, while other subclones remained or expanded. While the majority of truncal subclones remained during epirubicin treatment, the largest shifts in subclone sizes were related to subclones that were relatively small in size, as exemplified in tumors DDP013 and DDP076 (Fig. 4). Interestingly, in 18 out of 19 breast cancers, we detected at least one subclone which disappeared during epirubicin treatment. The only exception (DDP103) was a patient where all preepirubicin mutations were clonal (no subclones present; Fig. 4). Notably, assessing the raw data beyond conventional mutation calling, we found traces (sequencing reads) of the disappearing subclones in post-epirubicin biopsies from 17 out of 18 investigated cases, indicating that these subclones were not entirely eradicated.
Furthermore, we observed novel mutations after epirubicin treatment in all 19 patients with eligible paired tumor samples, indicating the emergence of new subclones during epirubicin treatment. However, while 35 subclones appeared (n = 19 patients), traces of 28 of these were present in pretreatment biopsies when re-examining the raw data. This indicates that the majority of subclones emerging during epirubicin treatment were present at low level pretreatment, rather than being new, acquired subclones.
Importantly, epirubicin substantially changed the composition of subclones in the tumors, even in patients revealing clinical stable disease during treatment. Thus, even in tumors with stable disease, some subclones seemed to be eradicated, while other subclones expanded (Additional file 3: Fig. S9; DDP085 and DDP089).
The number of subclones in pretreatment tumors was not predictive of response to epirubicin. Despite the observation that subclones emerged or disappeared in individual tumors, the numerical differences in subclones in pre-versus post-epirubicin biopsies were small. Typically, the changes in the absolute number of subclones were between zero and two, thus precluding formal statistical assessment.
Furthermore, mutations in established breast cancer genes were examined in subclones that expanded or regressed within each tumor. In line with our previous finding that truncal mutations generally persist through treatment in the metastatic setting [6], mutations in established breast cancer driver genes remained through epirubicin treatment in the current analysis. This was the case for PIK3CA, PTEN, and MAP3K1 (Fig. 5a). Notably, for these genes, relative variant allele frequencies (rVAFs) in individual tumor pretreatment were typically 0.5 or higher, indicating an early truncal occurrence. Although the number of observations was low, rVAFs of GATA3 mutations seemed to increase during epirubicin treatment, indicating growth of GATA3mutated subclones.
Similarly, when mutations were grouped according to signaling pathways/biological processes, the majority of mutations in key breast cancer-related processes persisted through epirubicin treatment (Fig. 5b).

Subclonal redistribution in response to docetaxel treatment
Subclonal evolution was assessed in 17 out of 51 patients with sufficient tumor cellularity (≥20%) in biopsies before and after docetaxel treatment (Additional file 3: Fig. S9-10). Paired biopsies were available in 11 out of 17 patients immediately before and after docetaxel treatment, whereas six patients had insufficient tumor cellularity after epirubicin treatment (before docetaxel). For these six patients, post-docetaxel samples were compared to pre-epirubicin samples. Docetaxel mediated profound subclonal redistribution, which was clearly different from changes induced by epirubicin treatment (exemplified by patient DDP014 ; Fig. 4).
In 16 out of 17 patients, at least one tumor subclone disappeared during docetaxel treatment. This was observed in 10 out of 11 patients with biopsies extracted before and after docetaxel and in all of the six patients where biopsies were taken before epirubicin and after subsequent docetaxel. The only exception was a patient where all pre-docetaxel mutations were clonal (DDP063; Additional file 3: Fig. S9).
Notably, assessing the raw data beyond conventional mutation calling, we found traces (sequencing reads) of the subclones which disappeared in all 16 patients. This indicates that subclones which were no longer present after docetaxel, according to the preset detection limits, were not completely eradicated by the treatment. At the same time, for all patients (n=17), new subclones were observed, after docetaxel treatment, indicating (as for epirubicin) that even in tumors with profound regression, small subclones may emerge during chemotherapy. Notably, among the 24 subclones seemingly appearing under docetaxel treatment, we found raw sequencing reads supporting the presence of 11 of these in the pre-docetaxel sequencing data. This indicates that nearly half of the "new" subclones in fact expanded from smaller subclones, rather than being acquired during docetaxel treatment.
Only two patients with an objective response had adequate quality (high tumor cell fraction) in both pre-and post-docetaxel biopsies, thus precluding comparison of responders versus non-responders with respect to numbers of subclones appearing/disappearing under docetaxel treatment. Notably, similar to the observations under epirubicin treatment, we also found docetaxel to have a profound effect on subclonal composition, even in tumors without an objective clinical response to chemotherapy (Fig. 4; DDP014 and DDP076).
Interestingly, among pretreatment subclones which shrunk below detection limits during epirubicin treatment, none of these re-emerged during docetaxel, indicating that they were probably eradicated by epirubicin.
We further characterized the mutations in subclones that expanded or regressed within each tumor under docetaxel treatment. Similar to what was observed during epirubicin, most truncal mutations affecting key breast cancer genes remained through docetaxel treatment (Fig. 5a). Also, the majority of mutations in key breast cancer-related processes persisted through docetaxel treatment (Fig. 5b). Of notice, the number of samples examined to make these comparisons is low, and the data should be interpreted with caution.
Notably, we sequenced tissue samples after docetaxel treatment for four of the patients that were classified as pCR. In three out of these four, we detected somatic mutations. All four had remaining ductal carcinoma in situ (DCIS) after treatment and most likely the mutations detected after treatment were from the DCIS tissue component.
Finally, we assessed the subclonal evolution during epirubicin or docetaxel monotherapy, or as a results of both sequential treatments, split by HR and HER2 status. This was performed in all 27 cases where biopsies from at least two out of three time points were analyzed. Among these cases, 18 were HR positive and HER2 negative and five were HER2 positive (out of which four were HR positive), while four were TNBC. Although the numbers of HER2positive and TNBC cases were limited and results should be interpreted with caution, we did not observe any specific differences in the subclonal dynamics with respect to breast cancer subtypes (Additional file 3: Fig. S9).

Discussion
In the present study, we aimed at assessing clonal evolution in treatment-naïve breast cancers during neoadjuvant chemotherapy. Thus, we applied WES to map the (See figure on next page.) Fig. 4 Clonal evolution during treatment. Graphical representation of clonal evolution during sequential epirubicin and docetaxel*, for selected patients: a DDP013, b DDP076, c DDP014, and d DDP103. The top panel within each subfigure (a-d) show allelic prevalence of mutation clusters, after clustering using the PyClone algorithm (see Additional file 1), based on variant allele frequencies (VAFs) for all mutations in the samples extracted for each patient. "Clusters" with one mutation have been merged to nearest cluster based on z-score (range of −1, +1) probability. Subsequent clusters with <3 mutations have been removed from the panel, for clarity. The middle panel within each subfigure is a visualization of a likely pattern of tumor evolution using the "timescape algorithm" (see Additional file 1). These models are based on the clusters in the top panels. Diagrams do not distinguish between new subclones appearing, harboring all truncal mutations and those appearing that have lost some truncal mutations. Each color represents an estimated subclone from the mutation clusters. The horizontal axis denotes three different time points during tumor evolution: pretreatment, post-epirubicin, and post-docetaxel. The bottom panel within each subfigure (coxcomb plots) represents somatic aberrations (mutations and copy number alterations; CNAs) for each time point (pretreatment, post-epirubicin, and post-docetaxel). Gray wedges represent merged "passenger mutations" while colored wedges represent driver somatic mutations and driver CNAs. genetic alterations in biopsies collected before, at therapy switch, and after treatment with sequential epirubicin and docetaxel monotherapy.
Comparing pre-and post-treatment cancer genomes is challenging due to a decreasing content of cancer cells during therapy. For this reason, we included patients with large primary breast cancers to compare pretreatment to post-epirubicin and post-docetaxel sequencing results and to obtain longitudinal data across the two main breast cancer chemotherapeutics. With respect to resistance mechanisms, the most interesting pre-and posttreatment genomic comparisons may be in those tumors where a substantial amount of invasive cancer remains after treatment. In our clinical trial cohort, 27 out of 51 post-treatment samples had more than 20% tumor cellularity, allowing the comparison of tumor genomic landscapes before and after therapy. Doing so, we identified several evolutionary patterns of breast cancer during each of the two treatments. Similarly to a previously reported evolutionary pattern from primary to metastatic breast cancer [6], we found that pretreatment truncal clones generally persisted through neoadjuvant chemotherapy, seemingly regardless of which driver mutations that were present. This opens the intriguing question of whether, in a majority of breast cancers, mechanisms involved in tumorigenesis in early treatment-naïve disease are also key factors for subsequent treatment resistance [40].
Despite the persistence of truncal mutations in most patients, we observed a profound subclonal redistribution during neoadjuvant chemotherapy: some subclones shrunk and/or disappeared during epirubicin and subsequent docetaxel treatment, while other tumors seemingly acquired mutations as part of new subclones emerging. Strikingly, the majority of these changes were related to subclones of lower VAFs rather than the larger truncal clones, suggesting a more recent origin during tumorigenesis. Moreover, there were cases where a small, pre-existing subclone grew into a major subclone, without gaining new mutations during the treatment period. Importantly, profound redistribution of the subclonal composition was observed also in tumors with no objective treatment response. Thus, even in breast cancers with stable or progressive disease upon treatment, epirubicin and docetaxel seemed to eradicate a substantial number of subclones, whereas other subclones of resistant cells expanded in the same timespan.
While some of the mutations emerging during treatment were in key driver genes, these may have been present in a low fraction of cells, below detection limits, in the pretreatment setting [41]. Re-analyzing our original raw data, we observed that the majority of subclones that seemingly appeared during epirubicin treatment were in fact present at very low levels pretreatment. Furthermore, almost half of the subclones which emerged during docetaxel treatment pre-existed at low levels prior to commencing docetaxel. Of notice, we did not perform ultra-deep sequencing to fully evaluate the fraction of subclones with pre-existing mutations. Thus, our findings may have underestimated the number of mutations that were actually present in minor subclones pretreatment. At the same time, our findings are largely in line with a previous study where a single-cell approach was used to identify pre-existing mutations emerging during chemotherapy in TNBC [42]. In those cases where emerging mutations could not be detected in pretreatment samples, one may speculate whether or not some of these are in fact induced by the cytotoxic compounds, although it seems unlikely that a large number of mutations would be induced by chemotherapy in such a short-lasting neoadjuvant treatment window.
Apart from a smaller investigation of subclonal evolution in 20 patients with TNBC [42], the mechanisms by which chemotherapy influences the breast cancer genome and subclonal distribution has only been assessed to a limited extent. In a study of 47 patients, genetic diversity was examined in biopsies before and after completed neoadjuvant combination chemotherapy, using in situ hybridization for a small number of genomic loci [43]. The authors established that genetic diversity is an intrinsic trait which remains relatively stable during treatment for each individual tumor. We confirm this finding in our samples, but we reveal a much higher complexity and dynamic changes of the smaller subclones during neoadjuvant treatment. A study of 29 patients selected due to lack of response to neoadjuvant chemotherapy (NAC) [44] found that therapy did not induce any chances to allelic profiles in  the majority of tumors. Our data indicate the opposite, with large CNA changes both during epirubicin and docetaxel treatment.
Recently, Denkert et al. [45] found mutational signatures linked to HRD and APOBEC activity to be predictive of pCR after neoadjuvant chemotherapy in estrogen receptor (ER)-positive, HER2-negative, but not in ERnegative, breast cancers. Furthermore, a signature associated with BRCA-mediated DNA repair has previously been suggested as predictive of pCR among 29 patients receiving neoadjuvant dose-dense anthracycline and taxane chemotherapy [46]. While no mutational signature was predictive of chemotherapy response in the current analysis of pretreatment biopsies, our results should be interpreted with caution due to a low number of patient samples. Regarding mutations emerging or disappearing during treatment, these were too few for formal assessments of mutational signature changes in our material.
Of notice, we observed no profound changes in the mutational processes from the pre-to post-treatment setting. This is in line with previous findings comparing primary to metastatic breast cancer, where the same mutational processes seemed to persist within each patient over time, with the only exception being a slight increase in the contribution from APOBEC-related signatures [6,47]. While our present study focused on subclonal genomic evolution, Echeverria et al. found differences at the transcriptomic and proteomic levels when comparing patient-derived TNBC xenografts collected before and after doxorubicin and cyclophosphamide [48]. Furthermore, in patients with TNBC and non-pCR after neoadjuvant chemotherapy, specific expression profiles have been identified in the remaining (resistant) tumor tissue [49]. Also, changes in hormone receptors and HER2 expression as part of disease progression are important alterations potentially modulating treatment response [50]. However, in the present study, there was insufficient tumor material for repeated analysis of these biomarkers after treatment. In essence, for future trials, multi-omics approaches are clearly needed to elucidate all molecular mechanisms involved in sensitivity or resistance to neoadjuvant treatment.
While our study focused on chemotherapy, previous studies have assessed the selection of subclones under treatment with more targeted drugs. In ER-positive breast cancers undergoing endocrine treatment, emerging ESR1 mutations are well established as a likely resistance mechanism [51]. In a comprehensive analysis, Razavi and colleagues found that mutations in ESR1, MAPK, and ER transcriptional regulators were selected for during endocrine therapy. However, the majority of cases had other, unknown causes of resistance [9], indicating potentially complex mechanisms at play. Apart from parameters selecting patients for primary endocrine therapy, like high ER expression and intrinsic luminal subtypes, HER2 overexpression predicting benefit from adding trastuzumab/pertuzumab and, more recently, a potential benefit from adding platinumcompounds or immune checkpoint inhibitors in TNBC [52][53][54][55], there are currently no validated predictive biomarkers to guide the selection of particular NAC regimens for individual patients. Of notice, PD-L1 expression was not predictive of response to immune checkpoint inhibitors on top of standard NAC regimens [54,55], whereas the predictive value of homologous recombination deficiency (HRD) for PARP inhibitor efficacy still needs further validation in primary TNBC [56]. While inactivating TP53 mutations are predictive of resistance to low-dose anthracycline and mitomycin-based NAC, results with today's anthracycline regimens, at higher doses or combined with cyclophosphamide, are at variance [57][58][59][60]. Thus, clinical trials aiming to identify additional predictive biomarkers are warranted. We have previously demonstrated that inactivation of p53 signaling, caused by TP53 or CHEK2 mutations or low ATM gene expression, is associated with resistance to low or conventional doses of anthracyclines, but not to taxanes [58,[61][62][63]. In the present analysis, TP53 mutations were not predictive of response to epirubicin, but there was a trend towards worse disease-specific survival and earlier relapses among patients with hotspot TP53 mutations, as compared to wildtype TP53 status. Compared to our previous studies examining the predictive impact of TP53 mutations, the current trial used a dose-dense epirubicin regimen, which may have circumvented anthracycline resistance, similar to previous findings with cyclophosphamide dose intensification [60].
Notably, from a clinical perspective, neoadjuvant therapy aims at obtaining pCR at surgery, based on the correlation between pCR and improved survival outcome [64]. Yet, pCR is rarely obtained for the majority of HRpositive, HER2-negative breast cancers, and pCR is even less likely for large T3, as compared to smaller T1-2 primary tumors [64][65][66]. In the current trial, we included patients with large primary tumors and obtained pCR rates of 3% for HR+/HER2−, 33% for HER2+ breast cancers, and 30% for TNBC. While pCR rates are generally higher in recent, international trials with combination chemotherapy regimens for TNBC and HER2+ breast cancers, these trials have a large percentage of T1-2 tumors [64,66] where pCR rates are expectedly higher. In a previous neoadjuvant trial, with similar tumor characteristics to ours, pCR rates were similar to what we obtained [67]. Still, recent improvements in neoadjuvant regimens, such as the implementation of endocrine therapy in strongly ER-positive, luminal breast cancers, pertuzumab for HER2-positive tumors, and platinum and immune checkpoint inhibitors for TNBC, would be considered more preferable if a similar trial was designed today.

Conclusions
We found both epirubicin and docetaxel monotherapy to cause profound redistribution of smaller subclones in primary breast cancer, while early truncal mutations and the main subclones generally persisted through treatment.
Additional file 1. The Dose Dense Trial (Methods and Results) and Sequencing approaches.

Additional file 2. Study Protocol.
Additional file 3: Figure S1. CONSORT diagram for the Dose-Dense trial. Figure S2. Survival in the Dose-Dense trial after a minimum followup of five years or until death. Figure S3. Average sequencing coverage for each sample subject to whole exome sequencing (WES). Figure S4. Average sequencing coverage for each pretreatment tumor sample subject to amplicon-based sequencing. Figure S5. Oncoplot depicting the frequency of mutations in tumor samples undergoing whole exome sequencing (WES). Figure S6. Merged overview of copy number alterations (CNAs) in the sample set. Figure S7. Scatter plot showing Pearson correlation between copy number alterations (CNAs) and tumor mutation burden. Figure S8. Estimated contribution of mutational processes for each of the patients according to the classification by COSMIC. Figure  S9. Graphical representation of clonal evolution from the pretreatment setting to post-epirubicin and post-docetaxel. Figure S10. Circos plots, coxcomb plots and heatmaps illustrating copy number logRs and mutation rVAFs pretreatment, post-epirubicin and post-docetaxel. Table S1. Baseline characteristics. Table S2a. 6 most common serious adverse event (grade 2 or more). Table S2b. 8 most common adverse event (any grade). Table S3. Response to neoadjuvant epirubicin and docetaxel in stage II-III breast cancers, and outcome at surgery. Table S4. Treatment response. Table S5a. Pretreatment mutations and prediction to epirubicin treatment. Table S5b. Pretreatment mutations and prediction to docetaxel treatment. Table S6. Mutations in pretreatment biopsies and prediction to epirubicin treatment. Table S7. Mutations in post-epirubicin biopsies and prediction to docetaxel treatment. Table S4. Dose-Dense trial; tumors selected for DNA sequencing, response to epirubicin (EPI) and docetaxel (TAX), surgical outcome and breast cancer characteristics.