Whole-genome epidemiology, characterisation, and phylogenetic reconstruction of Staphylococcus aureus strains in a paediatric hospital

Background Staphylococcus aureus is an opportunistic pathogen and a leading cause of nosocomial infections. It can acquire resistance to all the antibiotics that entered the clinics to date, and the World Health Organization defined it as a high-priority pathogen for research and development of new antibiotics. A deeper understanding of the genetic variability of S. aureus in clinical settings would lead to a better comprehension of its pathogenic potential and improved strategies to contrast its virulence and resistance. However, the number of comprehensive studies addressing clinical cohorts of S. aureus infections by simultaneously looking at the epidemiology, phylogenetic reconstruction, genomic characterisation, and transmission pathways of infective clones is currently low, thus limiting global surveillance and epidemiological monitoring. Methods We applied whole-genome shotgun sequencing (WGS) to 184 S. aureus isolates from 135 patients treated in different operative units of an Italian paediatric hospital over a timespan of 3 years, including both methicillin-resistant S. aureus (MRSA) and methicillin-sensitive S. aureus (MSSA) from different infection types. We typed known and unknown clones from their genomes by multilocus sequence typing (MLST), Staphylococcal Cassette Chromosome mec (SCCmec), Staphylococcal protein A gene (spa), and Panton-Valentine Leukocidin (PVL), and we inferred their whole-genome phylogeny. We explored the prevalence of virulence and antibiotic resistance genes in our cohort, and the conservation of genes encoding vaccine candidates. We also performed a timed phylogenetic investigation for a potential outbreak of a newly emerging nosocomial clone. Results The phylogeny of the 135 single-patient S. aureus isolates showed a high level of diversity, including 80 different lineages, and co-presence of local, global, livestock-associated, and hypervirulent clones. Five of these clones do not have representative genomes in public databases. Variability in the epidemiology is mirrored by variability in the SCCmec cassettes, with some novel variants of the type IV cassette carrying extra antibiotic resistances. Virulence and resistance genes were unevenly distributed across different clones and infection types, with highly resistant and lowly virulent clones showing strong association with chronic diseases, and highly virulent strains only reported in acute infections. Antigens included in vaccine formulations undergoing clinical trials were conserved at different levels in our cohort, with only a few highly prevalent genes fully conserved, potentially explaining the difficulty of developing a vaccine against S. aureus. We also found a recently diverged ST1-SCCmecIV-t127 PVL− clone suspected to be hospital-specific, but time-resolved integrative phylogenetic analysis refuted this hypothesis and suggested that this quickly emerging lineage was acquired independently by patients. Conclusions Whole genome sequencing allowed us to study the epidemiology and genomic repertoire of S. aureus in a clinical setting and provided evidence of its often underestimated complexity. Some virulence factors and clones are specific of disease types, but the variability and dispensability of many antigens considered for vaccine development together with the quickly changing epidemiology of S. aureus makes it very challenging to develop full-coverage therapies and vaccines. Expanding WGS-based surveillance of S. aureus to many more hospitals would allow the identification of specific strains representing the main burden of infection and therefore reassessing the efforts for the discovery of new treatments and clinical practices. Electronic supplementary material The online version of this article (10.1186/s13073-018-0593-7) contains supplementary material, which is available to authorized users.


(Continued from previous page)
Results: The phylogeny of the 135 single-patient S. aureus isolates showed a high level of diversity, including 80 different lineages, and co-presence of local, global, livestock-associated, and hypervirulent clones. Five of these clones do not have representative genomes in public databases. Variability in the epidemiology is mirrored by variability in the SCCmec cassettes, with some novel variants of the type IV cassette carrying extra antibiotic resistances. Virulence and resistance genes were unevenly distributed across different clones and infection types, with highly resistant and lowly virulent clones showing strong association with chronic diseases, and highly virulent strains only reported in acute infections. Antigens included in vaccine formulations undergoing clinical trials were conserved at different levels in our cohort, with only a few highly prevalent genes fully conserved, potentially explaining the difficulty of developing a vaccine against S. aureus. We also found a recently diverged ST1-SCCmecIV-t127 PVL− clone suspected to be hospital-specific, but time-resolved integrative phylogenetic analysis refuted this hypothesis and suggested that this quickly emerging lineage was acquired independently by patients. Conclusions: Whole genome sequencing allowed us to study the epidemiology and genomic repertoire of S. aureus in a clinical setting and provided evidence of its often underestimated complexity. Some virulence factors and clones are specific of disease types, but the variability and dispensability of many antigens considered for vaccine development together with the quickly changing epidemiology of S. aureus makes it very challenging to develop full-coverage therapies and vaccines. Expanding WGS-based surveillance of S. aureus to many more hospitals would allow the identification of specific strains representing the main burden of infection and therefore reassessing the efforts for the discovery of new treatments and clinical practices.

Keywords: Staphylococcus aureus, Microbial genomics, Microbial epidemiology, Bacterial pathogens
Background Staphylococcus aureus is a bacterium commonly found on the skin (15%), in the nostrils (27%), and in the pharynx (10-20%) of healthy adults [1][2][3], but it is also the cause of a number of diseases, whose severity ranges from common community-associated skin infections to fatal bacteraemia [3][4][5]. S. aureus is a leading cause of surgical, device-related, and pleuropulmonary infections, which can result into life-threatening infective endocarditis or even sepsis [6]. The mortality of S. aureus invasive infections was extremely high (> 80%) in the pre-antibiotic era [6,7], and only the introduction of penicillin at the beginning of the 1940s was able to contain it. However, resistant strains carrying a penicillinase/beta-lactamase quickly emerged [8][9][10], and more than 90% of current human-associated isolates are resistant to penicillin [6]. Similarly, the introduction of the penicillinase-resistant antibiotic methicillin was quickly followed by the emergence of methicillin-resistant S. aureus (MRSA) clones [11][12][13]. S. aureus is capable of acquiring resistance to virtually every antibiotic that has entered clinical use [14,15], including recently developed agents like daptomycin and linezolid [16,17] and the last resort antibiotic vancomycin [18,19]. In 2017, the World Health Organization has listed vancomycin-intermediate and vancomycin-resistant MRSA among the high priority pathogens for research and development of new antibiotics [20].
S. aureus's ability to spread worldwide and to cause outbreaks in both hospitals and the community [21,22] has fostered the study of its global epidemiology [3,15,[23][24][25]. Some lineages are very prevalent worldwide (e.g. CC5 and CC8) [24], whereas others have a more localised spreading range, like the CC5 ST612 clone, which has been found only in South Africa and Australia [24,26]. MRSA prevalence is also highly geographically variable, ranging from < 1% in some Northern European countries to > 50% in some American and Asian countries, with livestock-associated MRSA disseminating in the last two decades [24]. Newly emerging highly pathogenic and pandemic clones have also been globally characterised [27,28] and are often the results of recombination events as in the case of the ST239-SCCmecIII clone [25,27,29]. S. aureus investigations have however often underestimated the importance of non-MRSA clones, usually considering only hypervirulent or specifically relevant methicillin-sensitive S. aureus (MSSA) lineages [15], even though MSSA is the most common cause of surgical site infection [30,31] and one of the major nosocomial pathogens [15].
Untargeted profiling of the entire S. aureus population in a given site or area is as important as its global epidemiology, and it is crucial for surveillance and prevention of local outbreaks. Some studies have for instance unbiasedly assessed the local epidemiology of nosocomial S. aureus, suggesting that this pathogen is only rarely transmitted from nurses to hospitalised patients in presence of adequate infection prevention measures [32] and that the community acts as major source of nosocomial MRSA [33]. Studies surveying the whole S. aureus population in hospitals have however focused on single aspects, like the diversity of the population, its virulence and resistance traits, and its transmission in presence of an outbreak [34][35][36][37][38][39] or in non-emergency conditions [40][41][42]. Despite the large body of researches on S. aureus, studies addressing a whole S. aureus infective population at a given site through whole genome sequencing to simultaneously look at the epidemiology, phylogenetic reconstruction, genomic characterisation, and transmission pathways of infective clones are currently limited [43]. Expanding these types of studies will be crucial for an in-depth global monitoring of S. aureus.
Here we report an in-depth epidemiological and genomic investigation of S. aureus infections in a paediatric hospital in Italy. With a whole-genome sequencing approach, we reconstructed the phylogenies of the clones in the cohort, characterised known clones and variants, screened for resistance and virulence genes, and tested for the presence of an outbreak. This allowed us to appreciate the high diversity of the S. aureus community, with 80 different lineages, variability of the resistance cassettes, and uneven conservation of various antigens previously clinically tested for vaccine development. We further report an increased prevalence of highly resistant and lowly virulent clones in chronic infections, and the rise of a newly emerging clone already reported in other hospitals. Overall, our results highlight the complexity of S. aureus epidemiology and advocate the need for wider genome-based analysis.

Materials and methods
Sample collection and S. aureus isolation Samples were collected at Anna Meyer Children's University Hospital (Florence, Italy) from 160 patients from January 2013 to December 2015. Metadata were also collected (Additional file 1: Table S1). We analysed samples obtained from the most common sites of infection for S. aureus, namely airways (bronchial aspirates, sputum or oropharyngeal and nasal swabs) or from soft-tissue and skin lesions. All samples were processed for the detection of bacteria using selective (Mannitol Salt Agar 2, bioMérieux) and chromogenic culture media for MRSA (BBL™ CHROMagar™ MRSA II, Becton Dickinson). In order to confirm species-level identification, mass spectrometry analysis was performed using matrix-assisted laser desorption/ionisation time of flight (MALDI-TOF) (VITEK® MS, bioMérieux). Antibiotic susceptibility was evaluated using the automated system VITEK®2 (bioMerieux) with the card AST-P632 (see Additional file 1: Table S1 for antibiograms). All identified strains were stored at − 80°C for the following molecular analysis. In order to determine the potential virulence of SA/ MRSA strains, a specific PCR assay for the presence of the gene (lukS-lukF) encoding for the Panton-Valentine Leukocidin (PVL) was set up following a previously published protocol [44]. The mecA gene and other loci of the SCCmec cassette were analysed using different multiplex PCR. The protocol suggested by Milheirico et al. [45] has been used as a screening test for most frequent SCCmec cassettes types (types I, II, III, IV, V, and VI) and then confirmed with other methods in equivocal cases [45][46][47][48].
PCR-based multilocus sequence typing (MLST) was carried out with 25 μl reaction volumes containing 2 μl of chromosomal DNA, 20 μM of each primer, 1 U of Taq DNA polymerase (Super AB Taq, AB analitica), 2.5 μl of 10× PCR buffer (supplied with the Taq polymerase), 1.5 μM MgCl 2 , and 250 μM each deoxynucleoside triphosphates. The PCR was performed with an initial 5-min denaturation at 95°C, followed by 30 cycles of annealing at 55°C for 1 min, extension at 72°C for 1 min, and denaturation at 95°C for 1 min, followed by a final extension step of 72°C for 5 min. The amplified products were purified and then amplified with the BigDye® Terminator v3.1 Cycle Sequencing Kits (Applied Biosystem) with the primers used in the initial PCR amplification. The sequences of both strands were determined with an ABI Prism 310 DNA sequencer. Isolates with the same ST have identical sequences at all seven MLST loci.
Isolates sequencing and data pre-processing DNA libraries were prepared with Nextera XT DNA Library Preparation Kit (Illumina, California, USA). Quality control was performed with Caliper LabChip GX (Perkin Elmer) prior to shotgun sequencing with MiSeq (Illumina, California, USA), with an expected sequencing depth of 260 Mb/library (expected coverage > 80×). One hundred twenty-nine million reads were generated (704 thousand reads/sample s.d. 349 thousand).
Sequences were pre-processed by removing low-quality (mean quality lower than 25) or low-complexity reads, reads mapping to human genome or to large and small ribosomal units of bacteria, fungi and human, and known contaminants (e.g. phiX174, Illumina spike-in). All genomes are available at the NCBI Sequence Read Archive (BioProject accession number PRJNA400143).

Genome assembly and annotation
Pre-processed reads were de novo assembled using SPAdes version 3.6.1 [49] and discarding contigs shorter than 1000 nt. We selected for our analysis only reconstructed genomes with an N50 > 50,000. We obtained high-quality genomes (N50 > 50,000 and less than 150 contigs) for 135 of the 160 patients enrolled. Genomes belonging to the remaining 25 patients were excluded from further analyses. Genomes were annotated with Prokka version 1.11 [50] using default parameters and adding --addgenes and --usegenus options.

Genome alignment/phylogenetic analysis
The sets of 1464 concatenated genes used as input for constructing whole cohort (Fig. 1) and strain (Fig. 2) phylogenetic trees were generated using Roary version 3.4.2 [51]. Maximum likelihood trees were inferred with RAxML version 8.0.26 [52] using a GTR replacement model with four discrete categories of Gamma. Support at nodes was estimated using 100 bootstrap pseudo-replicates (option "-f a"). The phylogenetic tree in Additional file 2: Figure S1 was inferred using the presence-absence binary matrix of the core and accessory genes computed with Roary version 3.4.2 [51] in RAxML version 8.0.26 [52] with option "-m BINGAMMA". Phylogenetic analyses were conducted using only one single isolate per patient; when multiple isolates from different timepoints of the same patient were available, the reconstructed genome with the highest N50 and the lowest number of contigs was selected. In most cases (n = 30), patients maintained the same ST over time; in discrepant cases (n = 2), we selected the most prevalent clone.
In silico sequence type (ST), SCCmec, and spa-type identification In order to assign SCCmec type also to equivocal cases and to confirm PCR-based SCCmec typing, the same set of primers [45] and other primer sets [53,54] were mapped to reconstructed genomes by BLAST [55]. In most cases, the two methods were consistent. In discordant cases, PCR was repeated. Sequence typing and spa-typing were conducted using MetaMLST [56] and the DNAGear software [57] respectively. Many isolates were not assigned a spa-type because of the limitations of short-read shotgun sequencing in repeated regions, which cause problems in genome assembly.

Virulence factors and resistance gene analysis
Selected virulence factors and resistance genes (as in [58]) were searched for by mapping reference genes (Additional file 3: Table S2) to all reconstructed genomes with BLAST [55] with the following parameters [−evalue 1e−10 -perc_identity 90 -gapopen 5 -gapextend 5] with a match > 75%. Virulence genes to be searched for were selected on the basis of a careful literature review for their clinical relevance .

Analysis of genes with available vaccine targets
Genes of interest were identified as those S. aureus vaccine candidates that had already entered clinical trials (according to http://clinicaltrials.gov as of January 2018), and those candidates that showed promising results in preclinical trials. For each genome, we extracted the reference sequences using BLAST [55] with default parameters. Extracted genes were pairwise globally aligned with the reference and evaluated for synonymous and non-synonymous single-nucleotide variation (SNVs), insertions, and/or deletions.

Bayesian divergence estimates
We estimated divergence times of ST1 SCCmecIV t127 PVL− clones using BEAST2 [85] and the core genome (core genes = 1464). We defined the best fitting model priors by testing the combination of three clock models (uncorrelated relaxed exponential, uncorrelated relaxed lognormal, and strict), three demographic models (birth-death, coalescent Bayesian skyline, and constant), and two substitution models (HKY -Hasegawa, Kishino, Yano and generalised time reversible). Bayesian Markov chain Monte Carlo were run for 500 Mio. generations and sampled every 1000 generations. We chose the combination of models that resulted in the highest Bayes Whole-genome maximum likelihood phylogenetic trees of the four most relevant STs. All available reference genomes for ST22, ST121, and ST228 have been included. For ST5, 1478 reference genomes were available, but only 24 were included for the sake of clarity. The phylogenetic tree of ST1 and available reference genomes was also produced, but it is not reported here to avoid overlapping with Figure 5 factor after parameter correction using AICM in Tracer (see Additional file 4: Table S3).

Statistical tests
Associations between STs/virulence genes/antibiotic resistance markers and sample/operative unit types were found by performing Fisher's exact test between the class of interest and the remaining set of samples.

Results and discussion
We investigated the epidemiology and the whole-genome genetics of S. aureus isolated from multiple operative units of the same paediatric hospital in Italy (Meyer's Children Hospital, Florence). Two hundred thirty-four S. aureus isolates from 160 patients were retrieved from diverse clinical specimens, tested for antibiotic susceptibility, and subjected to whole-genome sequencing (see Materials and methods). The study produced 184 high-quality reconstructed S. aureus genomes with a N50 larger than 50,000 and less than 250 contigs (Additional file 1: Table S1). Downstream analyses are focused on the 135 high-quality strains recovered from distinct patients.

Genome sequencing highlights the presence of common clonal complexes and five newly sequenced clones
We first performed a whole-genome phylogenetic analysis to investigate the population structure of S. aureus in our cohort. The phylogeny was built using one isolate for each patient (n = 135) and using the 1464 core genes representing a core genome of > 1.19 M bases (see Materials and methods and Fig. 1). The genomic diversity of S. aureus is highlighted by the relatively large number of accessory genes even in a limited cohort of clinical isolates (n = 6909 from a pangenome of 8373 (Additional file 2: Figure S2), in concordance with a recent study based on the pangenome of 64 strains from different ecological niches [86]. The gene presence/absence phylogenetic model considering both core and genes confirmed the structure of the one built on the core genome alone, with however a slightly higher strain-diversity for isolates belonging to the same ST (Additional file 2: Figure S1). Despite this diversity, we found the presence of a reduced set of closely related strains in the cohort (Fig. 1) mostly associated with distinct multilocus sequence typing clones (STs) [87] (see Materials and methods). We identified a total of 29 different STs, with five of them-ST228, ST22, ST5, ST121, and ST1-found in at least 12 patients (Table 1 and Additional file 1: Table S1) with evidence of ST replacement in only one patient (Patient 091 switching from ST228 to ST22) of the 32 patients sampled at multiple timepoints. This longitudinal strain consistency was confirmed by whole-genome analysis (mean intra-patient variability = 56.42 SNVs), for which the replacing event in Patient 091 accounted for 6238 SNVs between the 2013 and 2016 isolates, 0.22% of the genome. The 29 identified STs belong to 14 clonal complexes (CCs), with the five most prevalent CCs (CC5, CC22, CC8, CC1, and CC121) comprising more than 60% of the isolates. Spa-typing [57] further refined the typing resolution: we found 44 distinct spa-types (Additional file 1: Table S1), with t001, t002, t008, and t127 being the most prevalent (i.e. present in > 4 isolates, Table 1). We also investigated the presence of the Panton-Valentine Leukocidin (PVL), a two-component prophage virulence factor allowing S. aureus to escape from the host immune system, that was found in 27.4% of the samples (Additional file 1: Table S1).
According to both antibiotic susceptibility testing (oxacillin and cefoxitin susceptibility, Additional file 1: Table S1) and genome analysis (presence of the SCCmec cassette, see Materials and methods), 63.7% of the isolates were classified as methicillin-resistant S. aureus (MRSA). Most strains (n = 54) belonged to SCCmecIV; type I cassettes were also abundant (n = 19), whereas cassettes type V (n = 8) and II (n = 1) were less represented. Methicillin resistance was unevenly distributed across the phylogenetic tree ( Fig. 1) and partially independent from the STs. All CC1 isolates (n = 14, ST1 and ST772) were MRSA, and so were the isolates belonging to CC5 (n = 30, ST5 and ST228) and CC22 (n = 16, ST22 and ST1327). All CC121 (n = 12, ST121) and CC10 (n = 3, ST10 and ST1162) isolates were instead methicillin-sensitive (MSSA), and other clonal complexes (CC8, CC30, CC45) showed balanced proportions of sensitive and resistant strains. SCCmecI (n = 19) was the most CC-specific cassette, as it was found almost exclusively in CC5 isolates (ST5 and ST228), with the exception of one ST15 and one ST8 isolates, while neither SCCmecIV nor SCCmecV were associated with specific STs.
For five of the recovered STs, namely ST241, ST942, ST1162, ST1327, and ST1866, no sequenced genome is publicly available (as genomes of S. aureus in RefSeq [88] version 2017 [89]). Although a large number of S. aureus genome sequences are available in NCBI, these are biased toward a limited set of clinically relevant STs [43,90], with many others being neglected. This underrepresentation of less-pathogenic or less-known strains may lead to a poor understanding of the host-pathogen interactions at the genomic level, and to an underestimation of emerging or re-emerging pathogenic strains [25,43].
We moreover identified two isolates (1.48%) belonging to the livestock-associated MRSA (LA-MRSA) ST398 clone [24,94] (Table 1). This clone has already been reported in patients that had regular exposure to livestock in several countries [24,95,96] including Italy [97][98][99], but our results and other reports [96,[100][101][102] of infections in non-exposed subjects suggest that the within-subject transmission for these clones is not rare. Similar conclusions can be drawn for another LA-MRSA, namely ST97 (n = 2, 1.48%, Table 1), which is the leading cause of bovine mastitis, but is only rarely reported in humans [103][104][105][106]. This growing incidence of LA-MRSA strains (n = 4, 2.96% in  The combination of the four methods (MLST, SCCmec-, and spa-typing, and PVL presence) yielded 80 different lineages. Three isolates were not assigned to any specific ST and are reported in the last row of the table our cohort) causing zoonotic infections highlights the existence of underestimated reservoirs of S. aureus strains that could become epidemic [28,107,108]. One isolate was assigned to ST395, which is an unusual strain unable to exchange DNA via bacteriophages with other S. aureus strains because of a modification in the wall teichoic acid (WTA) [109,110]. The same modification, however, enables ST395 to exchange DNA with coagulase-negative staphylococci (CoNS) [110], making it particularly prone to exchange SCCmec elements and others with other commonly found staphylococci, e.g. S. epidermidis.

Genomic signatures of chronic versus acute S. aureus infections
In order to investigate the potential association of clones and antibiotic resistance with specific hospital operative units, we cross-checked the prevalence of SCCmec types, STs, and PVL+ clones with both OUs and sample types (see Materials and methods). Strains from the cystic fibrosis (CF, n = 76) unit were positively associated with the presence of SCCmecI (n = 19, ten from CF unit; p value = 0.03), a cassette known to be hospital-associated [111,112]. Strains from the same unit were also associated with ST1 (n = 12, seven from CF unit; p value = 0.04), whereas we noted a reduced prevalence of the PVL genes (n = 37, only two from CF unit; p value = 0.0002) and of ST121 (n = 12, none from CF unit; p value = 0.02). This reflects the relatively attenuated virulence which is a well-known phenomenon in long-term S. aureus infections [113][114][115][116]. Similarly, sputum samples (n = 33; 88.7% from CF unit) were associated with ST228 (n = 16, nine from sputum; p value = 0.004) and SCCmecI (n = 19, 11 from sputum; p value = 0.0008), and negatively correlated with PVL (n = 37, only two from sputum; p value = 0.001). The high correlation of ST228 with lung isolates and specifically with CF has already been observed in Spain [111]. A similar pattern of increased resistance and lowered virulence has been observed for another sample type linked with long-term lung infections, namely broncho-aspiration material (n = 23; 78.2% from intensive care unit). Strains from this sample type were associated to SCCmecIV (n = 54, 14 from broncho-aspiration material; p value = 0.008), and with PVL− (n = 98, 23 from broncho-aspiration material; p value = 0.0005) and MRSA clones (n = 83, 21 from broncho-aspiration material; p value = 0.002), highlighting once again the loss of virulence and the acquisition of resistance in long-term lung infections [113][114][115][116].
On the contrary, patients from both emergency room (n = 5) and the infectious disease unit (n = 15) show an overrepresentation of PVL+ clones (n = 37, four from emergency room and nine from infectious diseases; p values = 0.02 and 0.005, respectively), indicative of acute rather than chronic infections. Lesion swabs (n = 31) are strongly associated with MSSA (n = 49, 31 from lesion swabs; p value = 3e−08). This sample type was also associated to the hypervirulent ST121 clone [117,118] (n = 12, 11 from lesion swabs; p value = 2e−05) and to the presence of the PVL (n = 37, 14 from lesion swabs; p value = 3e−07), suggesting that in our cohort skin and soft tissue infections (SSTIs) are predominantly caused by hypervirulent MSSA strains. Lesion swabs from children in care at the infectious diseases unit (n = 12, 80% of the samples from this operative unit) are also characterised by high prevalence of the virulent ST45 clone [119,120] (n = 8, three from lesion swabs; p value = 0.04) that is known to be associated with SSTIs [121][122][123][124]. The expected [125] association between PVL (n = 37) and ST121 (n = 12, nine PVL+; p value = 0.001) and ST30 (n = 7, five PVL+; p value = 0.003) supports once again the observed increased virulence of these STs [117,118,126,127], which is partially in conflict with the hypothesis of lesion colonisation by commensal strains present in the skin microbiome [128,129].

Discovery of novel variants of SCCmecIV with kanamycin, trimethoprim, and bleomycin resistance
We next investigated the specific genetic variants of the four types of SCCmec cassettes identified and discussed above. This is relevant because the epidemiology of this genetic element is disentangled by that of the rest of the genome by virtue of its high horizontal mobility [130,131]. Moreover, the SCCmec can host genes encoding not only for resistance to beta-lactams [132,133], but also for other antibiotic resistances or virulence factors [131].
More than a half of the MRSA isolates in our collection (n = 86) carried SCCmecIV (62.8%). This cassette type has spread widely in the last decades, often substituting the previously more prevalent nosocomial SCCmec types I and II [24,134], and it is now common especially in European clinical isolates [24,92]. Another cassette that has spread in recent years following a similar path is SCCmecV [134,135], the third most prevalent cassette type in our cohort (10.5% of the MRSA isolates) after the more traditionally hospital-associated SCCmecI [24,112] (22.1% of the MRSA isolates). We moreover isolated one MRSA carrying SCCmecII, which is widely diffused in the USA but only rarely found in Italy/Europe [25,136]. Consistently, the SCCmecII isolate was recovered from Patient 115, which is consistent with the personal history of the patient. For two isolates, it was not possible to classify the cassette neither with PCR nor with in silico PCR using standard primers [45].
By aligning reconstructed SCCmec with reference cassettes (see Materials and methods), we observed a certain degree of variability inside the same cassette type, specifically in type IV (Fig. 3). Subtypes IVa, IVb, and IVc were identified, with some SCCmec elements showing insertions. Two cassettes in particular were not consistent with the already described subtypes: the SCCmec type IVc carried by MF062, which was enriched with genes for kanamycin [137] and bleomycin [138,139] resistance, and the type IVa carried by MR090 that showed insertion of genes involved in resistance to trimethoprim [140,141] (Fig. 3).

Non-SCCmec resistance profiles show different patterns in chronic and acute infections
S. aureus can easily acquire a number of resistances, including those to the last resort antibiotics vancomycin [142,143] and daptomycin [144]. According to results presented in previous paragraphs and elsewhere [145], resistances can occur by gene acquisition in the SCCmec cassette. Most resistances are however encoded by genes that are found in other parts of the genome or that have been horizontally transferred through different genetic elements [25]. Given the high importance of multi-drug resistance in S. aureus [20], we therefore tested the presence or absence of specific resistance genes in our cohort [146] (Fig. 4 and Additional file 3: Table S2). Consistently with previous literature [6], most of the isolates tested positive for blaZ (81.5%), responsible for penicillin resistance (96.3% concordance with antibiotic susceptibility test, as per presence of the pbp and/or mecA genes). No isolates were found positive for genes encoding resistance to vancomycin (van, 100% concordance with antibiotic susceptibility test) and to fusidic acid (fusB and far, 94.1% concordance with antibiotic susceptibility test). Antibiotic resistances were sometimes associated with specific CCs, as for the increased representation of aacA.aphD (gentamicin resistance, 92.6% concordance with antibiotic susceptibility test) and ermA (erythromycin resistance, phenotypic resistance not tested) in CC5 isolates, whose genomes tended to lack instead the blaZ gene (penicillin resistance) (Fig. 4). Overall, two isolates from acute skin infections were negative for all the resistance genes tested, while six CF and intensive care unit isolates were positive for six (33.3%) of them. This pattern of increased resistance in long-term infections, together with their observed reduced virulence, completes the scenario of reduced virulence and increased resistance that has been observed in this and previous studies [113][114][115][116].
Emergence and disease-associated diversity of clinically relevant virulence factors S. aureus has a large repertoire of virulence genes, and it is able to evade the host immune system through a Fig. 3 Overview of the SCCmecIV cassette variability in our cohort, compared with available reference cassettes for the recovered subtypes IVa, IVb, and IVc. Genes are marked as arrows in the direction of transcription. To avoid biases due to misassemble of the region of interest, only cassettes found on a single contig are reported. Annotated SCCmec are grouped together with the closest reference cassette subtype. Some genomes showed insertions of genes involved in resistance to trimethoprim (MR090) and to kanamycin and bleomycin (MF062) variety of strategies. Some of the genes usually involved in immune evasion were present in almost all our isolates ( Fig. 4 and Additional file 3: Table S2). These include genes encoding the phenol-soluble modulin alpha and beta and the delta-haemolysin Hld, responsible for leukocytes and erythrocyte lysis respectively [60]; the immunoglobulin-binding protein Sbi that inhibits IgG and IgA [61,62]; and some genes part of the GIɑ genomic island (ssl6 and ssl9).
Other genes belonging to the immune evasion island IEC2 were present in many but not all isolates, for example, the one encoding for the antiplatelet extracellular fibrinogen binding protein Efb [63,64] and those encoding various haemolysins (hla, hlg) [59,60] (Fig. 4 and Additional file 3: Table S2). In addition to the 27.4% prevalence of the lukF and lukS PVL genes discussed above, one sample (MR029, from emergency room) was positive for the epidermal cell differentiation inhibitor Edin, which has been found to promote the translocation of S. aureus into the bloodstream [65]. One of the two USA300 isolates (MR047, from nasal swab) tested positive for the arginine catabolic mobile element (ACME), another important virulence factor (gene arcA) that has been shown to be responsible for the increased pathogenicity of S. aureus and specifically of USA300 clones [66,67].
The hypervirulent ST121 MSSA isolates obtained from lesion swabs (n = 12) were instead associated with the genes encoding for the exfoliative toxins Eta and Etb (n = 3 from ST121 swabs, and n = 0 for non-ST121, p value = 0.0006 for both genes), responsible for the skin manifestations of bullous impetigo and Staphylococcal scalded skin syndrome [77][78][79], the gene bbp (n = 12 from ST121, n = 7 from non-ST121; p value = 1.35e−08) that interacts with the extracellular matrix bone sialoprotein and contributes to staphylococcal arthritis and osteomyelitis [80], and the immune evasion gene ecb (n = 12 from ST121, n = 36 from non-ST121; p value = 1.51e06), which is required for the persistence of S. aureus in host tissues and the formation of abscesses [81]. The latter was also present in all and only the isolates belonging to ST1, ST7, ST10, ST15, ST30, ST34, and ST398, suggesting a strong dependence on ST (Fig. 4 and Additional file 3: Table S2).
Isolates retrieved from sputum samples of CF patients (n = 38) showed a positive association with the adhesin-encoding genes sdrD (n = 34 from CF, n = 69 from non-CF; p value = 0.03) and sdrE (n = 27 from CF, n = 48 from non-CF; p value = 0.03), and a negative association with bbp (n = 1 from CF, n = 18 from non-CF; p value = 0.01), contrary to samples from the infectious disease unit (n = 15, four positive for bbp gene). This finding is consistent with the increased need for adhesins in chronic lung infections [82,83,116], including in CF [84].

Conservation of genes encoding vaccine candidates
Unlike other bacterial infections, prior exposure to S. aureus does not seem to provide protective immunity [147]; therefore, vaccines are an attractive yet challenging option to prevent disease. Researchers have long attempted to produce an effective vaccine against S. aureus, but even though few have proved promising in animal models, the two vaccines so far tested in efficacy clinical trials have failed [147][148][149][150][151][152][153]. Since the main issue is the polymorphic expression of S. aureus surface antigens and the redundancy of its virulence proteins [147,154,155], we tested the prevalence and conservation of a number of genes encoding vaccine candidates described in the literature ( Table 2).
Among antigens that have been proposed as targets for vaccine development, the alpha haemolysin toxin gene hla [147,156,157] and the genes coding for capsular biosynthesis cap5 and cap8 [150,151] are highly prevalent in our cohort (91.9% and 97.8% of the isolates respectively). Nevertheless, these genes showed a larger degree of variability compared to the others we considered, which may explain the poor results obtained in clinical trials [147,150,151,156,157]. Other genes that code for proteins used alone or in combination in vaccine formulations, such as the virulence determinant SpA [158] and the fibronectin-binding protein ClfA [159][160][161], are present in most of our strain collection. In some of these genes, indels are prevalent (> 90%, Table 2), but they are frequently found in repeated regions that may not critically impact the protein structure, as in the case of the spa gene.
Vaccines have also been proposed for S. aureus strains with specific characteristics. For instance, targeting the toxicity determinant TSST-1 (5.9% prevalence of tst1) [162,163] or the PVL proteins LukF-LukS (27.4% prevalence of lukF-lukS) [164,165] aims at selectively preventing the most virulent or lethal infections. In our cohort, despite their low prevalence, both tst1 and the PVL genes were conserved at 99%, except for a few isolates that had indels in the latter ( Table 2). The gamma-haemolysins HlgAB and HlgCB genes [164,165] were instead highly prevalent (97.8-100%) and quite conserved (69.6-94.8%). The opposite approach is targeting genes with a lower virulence profile, which may be more prevalent and conserved than those coding for highly toxic factors. Among them, the genes encoding for the manganese uptake receptor (mntC) [159][160][161] and for the iron acquisition factor (isdB) [152,166], which are indeed present in all or all but one the isolates of our cohort. Non-synonymous mutations are rare in mntC (20.7% of the isolates, with only one non-synonymous SNV), and, whenever not affected by indels that may or may not affect the protein structure, also the isdB gene is highly conserved (> 99% identity, Table 2).
Finally, we also analysed the conservation of csa1A, csa1B, fhuD2, and esxA, genes recently described as being promising vaccine candidates in preclinical studies [167,168]. The two genes encoding for the conserved antigen Csa (csa1A and csa1B) are present in 51.9% and 26.7% of the isolates respectively and are conserved in only a fraction of the cases (Table 2). By contrast, the iron uptake gene fhuD2 is present in all isolates, with a maximum of 1% non-synonymous variation in sequence ( Table 2). Also the genes encoding for the ESAT-6-like secretion system (esxA, esxB, esxC, esxD) are well represented in the cohort, but only esxA is present in all but one isolate and has no non-synonymous mutations in 85.1% of the isolates (Table 2). Therefore, on the basis of their conservation, both FhuD2 and EsxA appear to be promising targets for vaccine formulations.

Phylogenetics of specific STs highlights the aggressive spread of a novel independently acquired ST1 clone
We investigated the hypothesis that some of the prevalent STs could be hospital-associated clones. We estimated the ST phylogenies using a whole-genome maximum likelihood approach (see Materials and methods). In most cases, we observed that isolates in our cohort, despite sharing the same ST, SCCmec, and spa types, were not monophyletic subtrees when considering external reference genomes for the same STs. This is the case, for example, of the ST228 and ST5 clones (Fig. 2). This suggests independent acquisition of the clones and no evidence of transmission among the selected hospitalised patients, while person-to-person transmission from healthy carriers or non-selected patients cannot be ruled out [21,22]. Only two ST121 MSSA isolates were found to be almost identical and both were retrieved in the same time window from patients 096 and 098 (8 SNVs). For ST1, instead, all but two isolates belonged to the same sub-lineage, typed as SCCmecIV t127 PVL−.
We further estimated divergence times for all the 16 isolates belonging to the ST1 SCCmecIV t127 PVL− clone, including those obtained from earlier or later time points of the same patients. We used a Bayesian approach [85] (see Materials and methods) integrating all the reference genomes publicly available for ST1 and the two ST1 SCCmecV isolates from our cohort (Additional file 4: Table S3). These analyses were performed to test the hypothesis that all ST1 SCCmecIV t127 belong to a clone specific of Meyer's hospital. The relaxed exponential clock model with constant coalescent prior and GTR substitution model resulted to the most appropriate model (Additional file 6: Table S5). This model estimated that Meyer's clone has emerged approximately 6 to 28 years ago as a specific branch of the ST1 tree, which has been estimated to be 26-160 years old (Fig. 5). However, age of Meyer's clone does not match with the time of emergence of the clone in the hospital. Moreover, an isolate obtained in a recent study investigating the spread of a ST1 SCCmecIV t127 clone in Irish hospitals [169] and carrying a virulence and resistance profile very close to the one of our cohort (differences in gene presence: 2/79 and 0/18 respectively) is phylogenetically rooted inside Meyer's cluster (161 SNVs intra-cluster; 412 SNVs inter-cluster). These two findings suggest that ST1 SCCmecIV t127 is not specific of the Meyer Children's hospital but might represent a newly arising community clone that is now spreading in the nosocomial environment of different countries [169,170].

Conclusions
In this study, we investigated the epidemiology of S. aureus in different operative units of Anne Meyer's Children's University Hospital (Florence, Italy) over a timespan of 3 years by whole genome isolate sequencing. Our analyses highlighted a high diversity of STs, SCCmec, and spa-types, resulting into a wide number of clones. Some of these clones had been previously described in the literature as livestock-associated, and we described them in non-exposed children thus supporting the spreading of such clones in the non-at-risk community. We moreover described the presence of hypervirulent and geographically unusual clones, and of five STs for which no sequenced genome was available in public databases. Our refined Fig. 5 Bayesian timed tree of ST1 isolates, including reference genomes. Location and date of sample collection is reported for each isolate. For samples collected at Meyer's Children Hospital (black circles), patient code is reported instead of location. The two North Dakota samples were collected from the same subject. "n/a" indicates that no information is available for location of sample collection. Numbers at selected nodes are posterior probabilities. Grey areas are the distributions of marginal posterior probabilities for the diversification of ST1 and the diversification of Mayer-specific clone analysis of the SCCmec cassettes highlighted the presence of further resistances and diversity within the same cassette type. On the contrary, when considering single infection types or specific STs or clones as it is usual in S. aureus epidemiological studies, the genomic diversity was limited, with an increased pattern of resistance genes in chronic patients and a larger number of virulence factors in acute infections. Altogether, these observations shed more light on the complexity of S. aureus epidemiology and on the need for a more unbiased survey of the commensal and pathogenic S. aureus community, to avoid the misrepresentation of specific genomic traits. Whole-genome-based routine surveillance of S. aureus and other hospital-related pathogens would further allow to get a more unbiased idea of the rising clones and better informing clinical practices, which usually focused on the most dangerous or well-known strains. Performing such epidemiological studies as soon as a new putative nosocomial clone arises could allow us to conclude whether the new clone has arisen in that very hospital or it is a recent sub-clone spreading also in the non-hospitalised population and therefore more frequently isolated also in the clinics. These wider-focus studies would not only allow the assessment of the epidemiology of specific pathogens and clones in the hospital setting, but also the survey of the prevalence and conservation of their virulence and resistance traits. This could lead to the identification of antigens of interest for vaccine development and of specific sub-clones representing the main burden of infection, and therefore reassessing the efforts for the discovery of new treatments.
Whole genome sequencing studies are crucial to survey the global epidemiology of infectious agents, including S. aureus, as genome-based data are reproducible and can be easily meta-analysed without the confounding of batch effects. The meta-analysis of pathogenic, commensal, and environmental S. aureus isolates could lead to a deeper knowledge of the epidemiology of this bacterium and may help in understanding how to prevent and treat infections without boosting antibiotic resistance.

Additional files
Additional file 1: Table S1. Characteristics of the single isolates, including collection details, genome assembly statistics, genomic features, and results of antibiotic susceptibility testing. (XLSX 69 kb) Additional file 2: Figure S1. Pangenome analysis statistics. Figure S2.