Whole genome sequencing of ESBL-producing Escherichia coli isolated from patients, farm waste and canals in Thailand

Background: Tackling multidrug-resistant Escherichia coli requires evidence from One Health studies that capture numerous potential reservoirs in circumscribed geographic areas.
Methods: We conducted a survey of extended-spectrum β-lactamase (ESBL)-producing E. coli isolated from patients, canals and livestock wastewater in eastern Thailand between 2014 and 2015, and analyzed isolates using whole genome sequencing.
Results: The bacterial collection of 149 isolates consisted of 84 isolates from a single hospital and 65 from the hospital sewer, canals and farm wastewater within a 20 km radius. E. coli ST131 predominated in the clinical collection (28.6%), but was uncommon in the environment. Genome-based comparison of E. coli from infected patients and their immediate environment indicated low genetic similarity overall between the two, although three clinical–environmental isolate pairs differed by ≤ 5 single nucleotide polymorphisms. Thai E. coli isolates were dispersed throughout a phylogenetic tree containing a global E. coli collection. All Thai ESBL-positive E. coli isolates were multidrug resistant, including high rates of resistance to tobramycin (77.2%), gentamicin (77.2%), ciprofloxacin (67.8%) and trimethoprim (68.5%). ESBL was encoded by six different CTX-M elements and SHV-12. Three isolates from clinical samples (n = 2) or a hospital sewer (n = 1) were resistant to the carbapenem drugs (encoded by NDM-1, NDM-5 or GES-5), and three isolates (clinical (n = 1) and canal water (n = 2)) were resistant to colistin (encoded by mcr-1); no isolates were resistant to both carbapenems and colistin.
Conclusions: Tackling ESBL-producing E. coli in this setting will be challenging based on widespread distribution, but the low prevalence of resistance to carbapenems and colistin suggests that efforts are now required to prevent these from becoming ubiquitous.
Electronic supplementary material The online version of this article (doi:10.1186/s13073-017-0471-8) contains supplementary material, which is available to authorized users.


Background
The global spread of multidrug-resistant bacteria is a major threat to human health. This includes drug-resistant Escherichia coli, a leading cause of bloodstream and urinary tract infections [1]. Extended-spectrum β-lactamase (ESBL)-producing E. coli are capable of hydrolysing numerous antibiotics, including third-generation cephalosporins [1]. This affects patient outcome, since infection with ESBL-producing E. coli is associated with higher mortality, longer length of hospital stay and increased costs compared to infection with antibiotic-susceptible E. coli [2]. The challenge to successful therapy has further increased following the emergence of multidrug-resistant isolates with acquired resistance to the carbapenem drugs and more recently to colistin, a drug of last resort for multidrug-resistant infections [3,4]. Transferable colistin resistance is mediated by mcr-1 or mcr-2 carried on a plasmid, and has been detected in E. coli isolated in Europe, Africa and Asia [4,5].
Tackling multidrug-resistant bacteria requires an understanding of their reservoirs and routes of spread. E. coli are normal gut commensals for humans and other animals, including livestock, and can be isolated from food and the environment [6,7]. Studies are increasingly reporting isolation of multidrug-resistant E. coli from livestock, such as chickens, ducks, pigs and cattle in Asia (China, South Korea and Lebanon) and Europe (the UK and the Netherlands), as well as meat and the environment [8][9][10][11][12][13]. ESBL-producing E. coli can also persist in the farm environment for prolonged periods [14], and become concentrated in the surrounding environment through repeated contamination with farm wastewater [11]. Water may also provide a mechanism for the further dissemination of ESBL-producing E. coli across more extended distances. This could increase the risk of human acquisition through the consumption of contaminated drinking water [15].
The need to understand the relationship between E. coli from different reservoirs through One Health studies is well known, but requires a discriminatory bacterial typing technique. The increasing application of whole genome sequencing brings a level of discrimination and information on relatedness and resistance mechanisms that surpass previous typing methods [16]. This has been used to compare the whole genomes of environmental, commensal and pathogenic E. coli to understand their ecology and speciation [17]. Several studies have used genome sequencing to characterize multidrug-resistant E. coli with a particular focus on ST131, which has become a dominant clinical clone worldwide [18][19][20]. The use of whole genome sequencing in countries engaged in intensive livestock farming, which have been proposed to be at high risk for the emergence of drug resistance, is important. One such country is Thailand, which is a major producer and exporter of chickens. A recent study reported that nearly 78% of E. coli isolated from pig and broiler carcass samples from slaughterhouses in eastern Thailand were multidrug resistant [21]. In addition, ESBL-producing E. coli was highly prevalent in samples from healthy adults (76%), healthy pigs (77%) and broiler chickens (40%), and in water samples collected from farms (33%) and canals in central Thailand (25%) [7]. A limitation of this study was that genotyping was not performed to examine strain relatedness or define the mechanisms of resistance. Here, we describe the findings of a survey of ESBL-producing E. coli isolated from patients, canals and livestock wastewater in a defined region of eastern Thailand in which isolates were evaluated using whole genome sequencing.

Study design and bacterial isolates
The bacterial collection consisted of 149 ESBL-positive E. coli isolated between 2014 and 2015. Clinical isolates (n = 84) were from consecutive positive samples processed by the diagnostic microbiology laboratory at Bhuddhasothorn hospital, Chachoengsao province, eastern Thailand between December 2014 and April 2015. Date of isolation and sample type were recorded, and only one isolate per patient was included. Isolate details are shown in Additional file 1: Table S1. Bacterial isolates were initially identified, and susceptibility testing was performed, using Standard Operating Procedures supplied by the Department of Medical Science, Ministry of Public Health, Thailand and Clinical and Laboratory Standards Institute (CLSI) guidelines (M100-S24 and M100-S25), respectively. Species identity was subsequently confirmed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS; Biotyper version 3.1, Bruker Daltonics, Coventry, UK). Antimicrobial susceptibility testing was repeated using the N206 card on the Vitek 2 instrument (bioMérieux, Marcy l'Étoile, France) calibrated against EUCAST breakpoints, and these results were used during the analysis. E-test (bioMérieux, Marcy l'Étoile, France) was used when further verification was required.
Environmental isolates (n = 65) were obtained through a cross-sectional survey between January 2015 and February 2015 described previously [22]. In brief, wastewater samples were collected from 27 canals and 11 farms within a 20 km radius of Bhuddhasothorn hospital. The farms reared pigs (n = 2), chickens (n = 6), ducks (n = 2) and both chickens and ducks (n = 1), where samples were collected from gullies that drained waste from animal housing. The geographical position of each sampling site was recorded using GPSMAP 60CSx (Garmin, Taiwan). A further two wastewater samples were taken from the Bhuddhasothorn hospital wastewater treatment system (one pre-treatment and one post-treatment water sample). Maps of the study region were created using ArcGIS software version 10.3.1.

Wastewater processing and bacterial identification
Samples were processed by filtration onto membranes as described previously [22], which were incubated on ESBL Brilliance agar (Oxoid, Basingstoke, UK) for 48 h at 35°C in air. Up to ten colonies suspected to be E. coli based on colour were picked and screened for ESBL expression using a phenotypic confirmation test based on CLSI guidelines (M100-S25). Species and antimicrobial susceptibility testing were confirmed using MALDI-TOF MS and Vitek 2, as described above. All bacterial isolates were stored at −80°C in trypticase soy broth with 20% glycerol.
Whole genome sequencing and data analysis
DNA extraction, sequencing and assembly of reads were performed as described previously [23]. Sequencing was performed on an Illumina HiSeq2000. Genomes were assembled using Velvet with the improvements described previously [24]. Details of reads, numbers of contigs and N50 are provided in Additional file 2: Table S2. Multilocus sequence types (MLST) were identified from the sequence data using the Achtman scheme and an in-house script (https://github.com/sanger-pathogens/mlst_check).
Sequence data have been deposited in the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) under the individual accession numbers given in Additional file 2: Table S2.
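For readers unfamiliar with the N50 assembly metric reported in Additional file 2: Table S2, the following is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name and the example lengths are hypothetical and are not taken from the study's pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in base pairs, for illustration only.
example_contigs = [100, 200, 300, 400, 500]
example_n50 = n50(example_contigs)
```

A higher N50 for a similar total assembly length generally indicates a less fragmented assembly.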
The pan-genome was estimated for the 149 study genomes using Roary, with a minimum percentage identity for blastp of 90%, and an alignment created of all core genes (present in 99% of isolates) [25]. Single nucleotide polymorphisms (SNPs) in the core genes were extracted and used to construct a maximum likelihood tree using RAxML with 100 bootstraps and a midpoint root. Additionally, the study genomes were contextualised against a global collection. Sequence data for 514 E. coli isolates reported previously [26][27][28] were downloaded from the ENA and the Wellcome Trust Sanger Institute Pathogen Genomics pipeline and annotated using Prokka. The 514 global genomes were combined with the 149 study genomes and the pan-genome estimated using Roary. SNPs in the core genes were used to construct a maximum likelihood tree as above.
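The two definitions used above (a core gene as one present in 99% of isolates, and a SNP as a variable column in the core gene alignment) can be illustrated with a small sketch. This is not how Roary or RAxML are implemented; the function names, the input layout and the handling of ambiguous bases are assumptions made purely for illustration.

```python
def core_genes(presence, isolates, min_fraction=0.99):
    """Return genes carried by at least `min_fraction` of the isolates.

    presence: dict mapping gene name -> set of isolate ids carrying it
              (an assumed layout for a gene presence/absence table).
    isolates: list of all isolate ids in the collection.
    """
    n = len(isolates)
    if n == 0:
        raise ValueError("no isolates supplied")
    isolate_set = set(isolates)
    return sorted(
        gene
        for gene, carriers in presence.items()
        if len(carriers & isolate_set) / n >= min_fraction
    )


def variable_sites(alignment):
    """Return 0-based alignment columns with more than one unambiguous base.

    alignment: dict mapping isolate id -> aligned sequence (equal lengths).
    Gaps and ambiguous bases such as N are ignored (an assumption).
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for i in range(length):
        bases = {s[i] for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites
```

With 149 genomes, a 99% threshold means a gene must be present in at least 148 isolates to count as core.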
To place the 26 ST131 study isolates into a global context, sequence data for 319 ST131 isolates reported previously [18,19,29] were downloaded from the ENA. These, together with the 26 ST131 study genomes, were mapped to E. coli NCTC13441 (accession number LT632320-LT632321) using SMALT (http://www.sanger.ac.uk/science/tools/smalt-0). All isolates had greater than 95× coverage. Mobile genetic elements were identified and removed as described previously [30]. Recombination was removed using Gubbins [31]. SNPs were identified and used to construct a maximum likelihood tree. Four isolates from Price et al. [32] and three isolates from Stoesser et al. [33] were filtered out by Gubbins, leaving a total of 312 external and 26 study isolates in the phylogeny. FimH typing was performed by in silico PCR with previously published primers [34] and the resulting products were compared to the sequences of known FimH types [35] using blastn. The H30R and H30Rx sub-clones were classified by H30Rx-specific SNPs as described previously [18] using in silico PCR.

Isolation of ESBL-producing E. coli from patients and the environment
A cross-sectional survey of canals and untreated waste from farms and a hospital in Chachoengsao province, Thailand led to the recovery of 65 ESBL-positive E. coli. Fifty-six were isolated from canals, three from pre-treated hospital wastewater and six from wastewater from four different farms (Fig. 1). A longitudinal survey of ESBL-positive E. coli isolated from clinical samples from the microbiology laboratory at Bhuddhasothorn hospital between December 2014 and April 2015 led to the identification of 84 isolates from blood (21 isolates), urine (39), pus (23) and sputum (1). Sequence types (STs) were identified from the whole genome sequence data. Taken together, the 149 ESBL-positive E. coli isolates were assigned to 72 STs, although the clinical isolate collection (84 isolates, 30 STs) was less diverse than the environmental isolate collection (65 isolates, 53 STs). The prevalence of the eight most frequent STs identified in each of the clinical or environmental collections is shown in Additional file 3: Figure S1. ST131 was the predominant ST in the clinical collection (24/84, 28.6%), whilst no single ST predominated in the environmental collection.

Antibiotic resistance
Phenotypic antibiotic susceptibility and the presence of genes encoding antibiotic resistance were determined for all 149 ESBL-positive E. coli, the results of which are summarized in Figs. 2 and 3. All isolates were multidrug resistant (phenotypic resistance to three or more drug classes) [39]. Resistance to the carbapenem drugs was identified in two clinical isolates (resistant to ertapenem and meropenem), and one from hospital wastewater (intermediate resistance to ertapenem (MIC 0.75 μg/mL) but susceptible to meropenem (MIC 0.5 μg/mL)). The two carbapenem-resistant clinical isolates contained either New Delhi metallo-β-lactamase-1 (NDM-1; ST44) or NDM-5 (ST46), and the hospital wastewater isolate contained Guiana extended-spectrum β-lactamase-5 (GES-5; ST1585) (Fig. 3a).
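The multidrug resistance criterion stated above (phenotypic resistance to three or more drug classes) can be expressed as a short sketch. The drug-to-class mapping below covers only drugs named in this article, and the choice to count only "R" calls (not intermediate results) is an assumption for illustration; it is not the classification procedure of reference [39].

```python
# Drug classes for the agents named in this article (illustrative subset).
DRUG_CLASS = {
    "tobramycin": "aminoglycoside",
    "gentamicin": "aminoglycoside",
    "ciprofloxacin": "fluoroquinolone",
    "trimethoprim": "folate pathway inhibitor",
    "ertapenem": "carbapenem",
    "meropenem": "carbapenem",
    "colistin": "polymyxin",
}


def resistant_classes(profile):
    """Return the set of drug classes with at least one resistant ('R') call.

    profile: dict mapping drug name -> 'S', 'I' or 'R'.
    Drugs missing from DRUG_CLASS are ignored; 'I' is not counted as
    resistant (an assumption made for this sketch).
    """
    return {
        DRUG_CLASS[drug]
        for drug, call in profile.items()
        if call == "R" and drug in DRUG_CLASS
    }


def is_multidrug_resistant(profile, min_classes=3):
    """True if the isolate is resistant to at least `min_classes` drug classes."""
    return len(resistant_classes(profile)) >= min_classes
```

Note that resistance to both tobramycin and gentamicin counts once, because both belong to the same class.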
The genetic basis for ESBL production is shown in Fig. 3. A total of 149 CTX-M elements identified in 147 isolates were resolved into six different elements, with CTX-M-55 (40.9%) the most common (Fig. 3b). SHV-12 type β-lactamase was present in three isolates (one clinical and two environmental), one of which was also positive for CTX-M-14. CTX-M-55 was the most prevalent element (Fig. 3c), and a phylogeny of the CTX-M genes is shown in Additional file 3: Figure S2. Two variants of CTX-M-14 were identified that differed by one SNP and corresponded to CTX-M-14-1 (n = 31; GenBank accession number JF701188) and CTX-M-14-48 (n = 17; GenBank accession number AJ416341), respectively. In addition, two clinical isolates and one environmental isolate contained two different elements (Additional file 1: Table S1). Study isolates were screened for the presence of mcr-1 and mcr-2 encoding resistance to colistin, and mcr-1 was detected in three isolates (one clinical and two environmental from independent canals; Fig. 3a). Phenotypic resistance to colistin was confirmed using the E-test; MIC values of the three isolates were 4, 3 and 8 μg/mL.

Phylogenetic analysis of ESBL-E. coli from Thailand
A maximum likelihood tree of the 149 ESBL-positive E. coli was created based on 227,154 SNPs in the 2682 core genes (Fig. 3a). The majority of clinical and environmental isolates resided on distinct branches of the phylogenetic tree, with environmental isolates showing a greater level of diversity than the clinical isolates. However, 11/72 STs contained both clinical and environmental isolates, and a pairwise comparison of SNPs in the core genes revealed that these were closely related (3–23 SNPs, median 9 SNPs) in 6/11 STs (ST131, ST2003, ST354, ST38, ST405 and ST410). Three pairs of clinical–environmental isolates differed by no more than five SNPs, whilst an additional 18 pairs of isolates differed by fewer than 25 SNPs. The environmental isolates were all from canals within 0.5 to 5.2 km of the hospital (median 2.3 km) with the exception of one isolate, which was taken from the hospital wastewater treatment system and was nine SNPs different from a clinical isolate.
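The pairwise comparison described above can be sketched as follows, assuming that each isolate is represented by its sequence in a shared core gene alignment. The function names, the input layout and the decision to skip gaps and ambiguous bases are illustrative assumptions, not a description of the software used in the study.

```python
from itertools import product


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Only positions where both sequences have A, C, G or T are compared;
    gaps and ambiguous bases are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in valid and b in valid
    )


def close_pairs(clinical, environmental, max_snps=5):
    """List clinical-environmental pairs differing by at most `max_snps`.

    clinical, environmental: dicts mapping isolate id -> aligned sequence.
    Returns (clinical id, environmental id, distance) tuples sorted by distance.
    """
    pairs = []
    for (cid, cseq), (eid, eseq) in product(clinical.items(), environmental.items()):
        distance = snp_distance(cseq, eseq)
        if distance <= max_snps:
            pairs.append((cid, eid, distance))
    return sorted(pairs, key=lambda pair: pair[2])
```

Restricting the comparison to isolates of the same ST, as in the analysis above, would simply mean calling `close_pairs` once per ST with the relevant subsets.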
The phylogeny of the 149 study genomes was contextualized by combining these with 514 E. coli genomes from a global collection. A maximum likelihood tree was created based on 222,900 SNPs in the 1983 core genes identified across the 663 isolates (Additional file 3: Figure S3). Thai E. coli isolates were distributed across the E. coli species population, suggesting a high level of diversity in the Thai E. coli population. Of note, the ST131 study isolates were examined in the context of a global ST131 collection (Fig. 4). The majority of Thai ST131 E. coli from this study belonged to two clusters (cluster 1 (n = 19) and cluster 2 (n = 4)). Cluster 1 belonged to H30R and contained isolates from Laos (n = 9), Cambodia (n = 3), Australia (n = 2), the USA (n = 2), Canada (n = 1), Germany (n = 1) and Taiwan (n = 1), interspersed with the 19 study isolates and two isolates from a previous study [29] from Thailand. An analysis of pairwise SNP differences resolved most of cluster 1 into two groups. The remaining isolates in cluster 1 that did not belong to one of these two groups were a CTX-M-negative isolate from Taiwan and a single isolate from Canada that was CTX-M-14-positive and was located basal to the whole cluster. Cluster 2 consisted of four isolates from Thailand and belonged to the H41 sub-clone, which contained isolates from Laos (n = 8), Thailand (n = 7), the United Kingdom (n = 7), the USA (n = 5), Cambodia (n = 2), Australia (n = 2), Canada (n = 2), New Zealand (n = 1), Spain (n = 1) and Taiwan (n = 1) (Fig. 4). The four Thai isolates residing in cluster 2 contained CTX-M-27 alone (three isolates), or both CTX-M-27 and CTX-M-55 (one isolate).
[Fig. 4. Scale bar indicates 100 SNPs.]

Characterisation of mcr-1 and carbapenemase-encoding plasmids
We investigated the genetic context of the colistin and carbapenem resistance genes detected in the study isolates by comparing the contigs containing these genes to the GenBank database. The highest matches to the contigs containing mcr-1 were equal matches to E. coli plasmids pECJS-59-244, pS38 and EC2-4 for one of the canal isolates; and S. enterica subsp. diarizonae strain 11-01854 and 11-01853 plasmids for the clinical isolate and the second canal isolate (Additional file 4: Table S3). The three highly related E. coli plasmids have been reported previously as carrying the mcr-1 gene, and were isolated in China, Switzerland and Malaysia. Additionally, it has been reported that mcr-1 in three E. coli from commercial farms in China was carried on plasmids with a similar backbone to plasmid 11-01854 [40]. The contig containing NDM-1 was best matched to E. coli plasmid pNDM-ECS01, which has been described previously as carrying NDM-1 in ST131 E. coli and Klebsiella pneumoniae in Thailand [41]. The contig containing NDM-5 had equal matches to E. coli plasmids pC06114_1 and pGUE-NDM, which carry NDM-1, and K. pneumoniae plasmids pCC1409-1 and pCC1410-1, which carry NDM-5 and were identified in Korea after transfer of a patient from the United Arab Emirates [42]. Finally, the contig containing GES-5 was best matched to E. coli plasmid pHKU1, which was described in the context of fosfomycin resistance in E. coli and to our knowledge has not been reported previously to carry GES-5. However, only 40% of the contig matched plasmid pHKU1, indicating differences in the genetic content (Additional file 4: Table S3). Overall these results indicate that plasmids carrying mcr-1 in Thailand may be globally disseminated, and those carrying carbapenemase genes could have originated from other countries in Asia or the Middle East. The finding of mcr-1 on a potentially novel plasmid indicates further dissemination of mcr-1 in the plasmid population.

Virulence genes
Of the 284 virulence genes investigated, 153 were detected at least once in the 149 study isolates. Analysis of the distribution of virulence factors across STs indicated that there was a higher prevalence of virulence genes in ST131. To examine this further, we compared the prevalence of each virulence factor in ST131 (n = 26) and non-ST131 (n = 123) isolates. A total of 36 virulence genes were significantly more common in ST131 (all p < 0.0001 except for fim genes, p < 0.05) (Additional file 3: Figure S4); 35 of these associations have not been reported previously (the exception being sat) [43][44][45][46]. Twelve virulence genes were significantly less common in ST131 isolates (p < 0.05), including four genes (entD, espX1, espX5 and espL1) present in over 79% of non-ST131 isolates but absent in ST131 isolates (Additional file 3: Figure S4).
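The comparison of gene prevalence between ST131 and non-ST131 isolates relied on Fisher's exact test (see the legend to Figure S4). As a minimal sketch of that test on a single 2×2 table, the standard-library implementation below sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name and the example counts are hypothetical and are not taken from the study's data or software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]].

    Rows could be ST131 / non-ST131 and columns gene present / gene absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene-positive isolates in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as extreme (no more probable) as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: a gene absent from all 26 ST131 isolates and present
# in 98 of 123 non-ST131 isolates (illustrative numbers only).
p_value = fisher_exact_two_sided(0, 26, 98, 25)
```

Because many genes were tested, a reader reproducing this kind of analysis may also wish to consider a correction for multiple comparisons; whether one was applied is not stated in the text above.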

Discussion
This study has demonstrated that Thai ESBL-positive E. coli isolates are genetically diverse, with isolates dispersed throughout a phylogenetic tree containing a global collection of E. coli. Thai isolates cultured from the environment were more genetically diverse than those from Thai patients, and a core genome comparison indicated that ESBL-positive E. coli from humans and the environment (including farms) were broadly distinct, although there was evidence for some highly related bacterial pairs. All of the Thai ESBL-positive E. coli isolates were multidrug resistant, including high rates of resistance to aminoglycosides, fluoroquinolones and trimethoprim. A small minority of isolates were also resistant to the carbapenem drugs (encoded by NDM-1, NDM-5 or GES-5) or to colistin (encoded by mcr-1), but no isolates were resistant to both the carbapenem drugs and colistin. Of note, isolates resistant to the carbapenem drugs were only isolated from clinical samples or a hospital sewer, while isolates resistant to colistin were identified from both clinical samples and canal water. The isolation of NDM-1-positive E. coli from Thailand has been reported previously from urine samples cultured from patients with urinary tract infection at Srinagarind Hospital, Khon Kaen province in 2011 [47]. To our knowledge, this is the first report of both NDM-5 and GES-5 in Thailand, but NDM-5-positive E. coli has been isolated from humans in numerous countries, including the United Kingdom in 2011 [48], and in China, Algeria, Japan, USA, Denmark, Australia, Spain, Egypt and South Korea between 2014 and 2016 [49][50][51][52][53][54][55][56]. NDM-5-positive E. coli has also been identified from a domestic animal in Algeria [57]. GES-5-positive E. coli was first documented in Greece in 2004 in a clinical isolate with low-level resistance to carbapenem drugs [58]. The first report of mcr-1-positive E. coli was from China in 2015, but there is increasing evidence that the mcr-1 gene had been circulating in Asia prior to this, including in Thailand based on the retrospective identification of mcr-1-positive E. coli that was originally isolated in 2011 [4,37]. BLAST comparisons performed during our study of contigs containing mcr-1 and genes encoding carbapenem resistance were consistent with these genes being plasmid-mediated. Genetic characterisation of CTX-M genes was notable for the diversity of elements identified: six elements were found, with CTX-M-55 being the most common and present in both clinical and environmental isolates. By contrast, the proportion of other elements (for example, CTX-M-15 and CTX-M-27) varied considerably between human and environmental isolates.
E. coli ST131 was the predominant clone associated with human disease in Thailand and accounted for 28.6% of clinical isolates, although it was uncommonly isolated from the environment. This lineage is frequently multidrug-resistant, has become globally disseminated and is a well-known cause of human disease [59,60]. We identified 36 virulence factors that were over-represented in ST131 compared with non-ST131 isolates. This comparison should be interpreted with caution since it was underpowered (26 ST131 isolates versus 123 non-ST131 isolates), and the comparison of a single clone with a genetically heterogeneous second group is potentially problematic. However, differences in gene content between ST131 and other lineages are consistent with previous studies, which have reported over-representation in ST131 of genes associated with adhesion (afa/dra, fimH, papA and iha), invasin (ibeA), iron acquisition (fyuA and iutA), toxins (sat and aer), protectins (kpsMII, kpsMIII, K2, K5 and ompT) and other functions (usp, traT and malX) [43][44][45][46]61].
Our findings add to the increasing body of evidence that ESBL-positive E. coli is ubiquitous in Thailand. A recent study reported that ESBL-producing E. coli could be isolated from three-quarters of healthy Thai people tested and was also common in livestock, which suggests that this has become established in humans and their food chain [7]. There is also direct evidence for human consumption of food containing ESBL-producing E. coli, which was isolated from pre-cooked and cooked meat collected from markets in Bangkok, Thailand in 2012-2013 [7]. Furthermore, there is evidence for global dissemination since nearly 20% of a sample consisting of a range of fresh vegetables imported into Switzerland from Thailand in 2014 was positive for ESBL-producing E. coli [62]. This is having an impact on human health in Thailand based on the findings of a retrospective study conducted in nine public hospitals in northeast Thailand [63]. The proportions of community-acquired E. coli bacteraemia caused by E. coli non-susceptible to extended-spectrum cephalosporins rose from 5 to 23% between 2004 and 2010, while the proportions of healthcare-associated and hospital-acquired E. coli bacteraemia caused by E. coli non-susceptible to extended-spectrum cephalosporins were high (44 and 52%, respectively) with no significant change over time [63]. Mortality was higher in patients with multidrug-resistant E. coli bacteraemia compared with non-multidrug-resistant E. coli bacteraemia [63], indicating the human cost of this problem.

Conclusions
Our genome-based study adds to the epidemiological and phenotypic data for ESBL-positive E. coli in Thailand. There was some evidence for highly related E. coli between infected patients and their immediate environment, and we identified a plethora of genes encoding resistance to broad-spectrum antibiotics used to treat Gram-negative infections. The extent to which ESBL-positive E. coli is distributed in humans, the environment and the food chain, the diversity of elements encoding ESBL and the epidemiological rise in the proportion of multidrug-resistant E. coli causing human disease indicate that the problem has gathered pace over the past decade. ESBL genes likely provide a fitness advantage to E. coli belonging to numerous different lineages, both in the environment and in clinical settings. Furthermore, these genetic elements are likely to be present in a range of additional Gram-negative bacterial species. Reversing the clock on ESBL in this setting may prove extremely difficult, but our finding of a comparatively low prevalence of mobile genes encoding resistance to the carbapenem drugs and colistin suggests that efforts are now required to prevent these from also becoming ubiquitously distributed.

Additional files
Additional file 1: Table S1. Isolate details. (XLSX 15 kb)
Additional file 2: Table S2. Sequence quality metrics. (XLSX 495 kb)
Additional file 3: Figure S1. Predominant STs of ESBL-producing E. coli isolated from clinical and environmental origins in Thailand. Graph showing the prevalence of the eight most frequently identified STs among each of the clinical or environmental Thai ESBL-producing E. coli isolates, split by source of isolation. The total adds up to 12 since four of the top eight most frequent in each category did not overlap. The presence of all isolates in each of the 12 STs is shown. Figure S2. Phylogeny of CTX-M genes identified in Thai ESBL-producing E. coli. Phylogeny of the seven CTX-M gene variants identified in the 149 study genomes. Figure S3. Phylogeny of Thai ESBL-producing E. coli in a global context. Maximum-likelihood tree of the 149 study genomes and 514 global genomes based on 222,900 SNPs in the 1983 core genes. Inner ring shows origin of isolate, middle ring shows country of isolation and outer ring indicates the study isolates (black). Scale bar indicates ~20,000 SNPs. Figure S4. Virulence gene comparison between ST131 and non-ST131 isolates. Graph showing the prevalence of virulence factors in ST131 (n = 26) compared to non-ST131 (n = 123) in the study isolates. Using Fisher's exact test at a p value of 0.05, only those factors overrepresented between the two groups are shown. (DOCX 2381 kb)
Additional file 4: Table S3. Genetic context of mcr-1 and carbapenem resistance genes. (XLSX 9 kb)