Clinical metagenomic identification of Balamuthia mandrillaris encephalitis and assembly of the draft genome: the critical need for reference strain sequencing

Primary amoebic meningoencephalitis (PAM) is a rare, often lethal cause of encephalitis, for which early diagnosis and prompt initiation of combination antimicrobials may improve clinical outcomes. In this study, we present the first draft assembly of the Balamuthia mandrillaris genome recovered from a rare survivor of PAM, in total comprising 49 Mb of sequence. Comparative analysis of the mitochondrial genome and high-copy-number genes from 6 additional Balamuthia mandrillaris strains demonstrated remarkable sequence variation, with the closest homologs corresponding to other amoebae, hydroids, algae, slime molds, and peat moss. We also describe the use of unbiased metagenomic next-generation sequencing (NGS) and SURPI bioinformatics analysis to diagnose an ultimately fatal case of Balamuthia mandrillaris encephalitis in a 15-year-old girl. Real-time NGS testing of a hospital day 6 CSF sample detected Balamuthia on the basis of high-quality hits to 16S and 18S ribosomal RNA sequences present in the National Center for Biotechnology Information (NCBI) nt reference database. Retrospective analysis of a day 1 CSF sample revealed that more timely identification of Balamuthia by metagenomic NGS, potentially resulting in a better outcome, would have required availability of the complete genome sequence. These results underscore the diverse evolutionary origins underpinning this eukaryotic pathogen, and the critical importance of whole-genome reference sequences for microbial detection by NGS.

bioRxiv preprint doi: https://doi.org/10.1101/024455; this version posted August 12, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

previous report, we found that the mitochondrial genome of strain V451 was the most divergent among tested strains, and possessed an additional 1,149 bp ORF downstream of the cox1 gene that did not align significantly to any sequence in the NCBI nt or nr reference database.

containing intron in the cox1 gene, whereas no such intron was present in the remaining 3 strains. Two of the remaining 3 strains, strains V451 and V188, instead had an approximately 790 bp insert in the 23S rRNA gene (Figure 1A, "rnl RNA") that contained a 530 bp or 666 bp LAGLIDADG-containing ORF, respectively, and that coded for a putative endonuclease. The LAGLIDADG-containing endonucleases in the 2 strains shared 84% amino acid pairwise identity with each other, but ~50% amino acid identity to a corresponding Acanthamoeba castellanii, and <12% amino acid identity to the LAGLIDADG-

The ORF containing the rps3 gene, found to contain a possible rps3 intron

sequencing of this locus using conserved outside primers revealed that each strain tested had a unique length and sequence (Fig. 2), raising the possibility of targeting this region for Balamuthia mandrillaris strain detection and genotyping. Because of the high copy number of mitochondrial sequences in Balamuthia, as noted previously for Naegleria fowleri (Herman et al. 2013), we

Assembly of the remaining 4.4 million high-quality reads yielded 31,194

BLASTn alignment of the scaffolds to the National Center for Biotechnology Information (NCBI) nt database revealed that the most common organism

(2,067/14,699 = 14.1% of scaffolds), nearly entirely due to low-complexity sequences, followed by high-significance hits to Acanthamoeba castellanii

The 18S-28S rRNA locus in the Balamuthia mitochondrial genome corresponded to a 12.5 kb contig sequenced at high coverage (>400X over rRNA regions).

with only 68.5% pairwise identity to its closest phylogenetic relative, Acanthamoeba castellanii (Fig. 3B). From the nuclear genome, one high-copy contig contained a truncated 5,250 nucleotide ORF exhibiting only 33% amino acid identity to Rhizopus delemar (pin mold), and harboring elements consistent with a retrotransposon (Kordis 2005), including an RNAse HI from Ty3/Gypsy family retroelements, a reverse transcriptase, a chromodomain, and a retropepsin. Two high-copy, ~1,600 bp ORFs that failed to match any sequence by BLASTx alignment to the NCBI nr protein database were found to align significantly to Escherichia coli site-specific recombinase by remote homology HMM analysis.

presented to a community emergency room with 7 days of progressive symptoms including right arm weakness, headache, vomiting, ataxia, and confusion.
Her diabetes was well controlled with an insulin pump, and she did not take any additional medications. Exposure history was significant for contact with alpacas at a family farm and swimming in a freshwater pond nine months prior. She had no international travel, sick contacts, or insect bites. She was given 10 mg dexamethasone with symptomatic improvement in her headaches, but was subsequently transferred to a tertiary care children's hospital after a computed tomography scan revealed left occipital and frontal hypodensities. On hospital day (HD) 1, peripheral white blood count was 11.6 × 10^3 cells/μL (89% neutrophils, 6% lymphocytes, 4% monocytes), erythrocyte sedimentation rate was 13 mm/hr [normal range 0-20 mm/hr], C-reactive protein was 3 mg/dL [normal range 0-1 mg/dL], and procalcitonin 0.05 ng/mL [normal range 0-0.5

scan of the brain on HD 1 showed hemorrhagic lesions with surrounding edema

Given the patient's autoimmune predisposition and hemorrhagic appearance of the brain lesions, acute hemorrhagic leukoencephalitis was initially suspected and intravenous methylprednisolone (1,000 mg daily) was initiated. On HD 6, she underwent craniotomy for brain biopsy, revealing partially

metagenomic NGS testing of CSF and brain biopsy samples confirmed the

however this medication did not arrive in time to administer before the patient died.
Autopsy was not performed according to the wishes of the family. Metagenomic NGS and SURPI bioinformatics analysis were used to analyze the patient's HD 6 CSF and brain biopsy for potential pathogens.

or misannotated sequences (Table S1), while most of the bacterial reads mapped to common skin / environmental contaminants such as

backbone) eukaryotic reads that were taxonomically assigned at a species level from the RNA library were assigned to available 16S and 18S sequences of Balamuthia mandrillaris in the NCBI nt reference database

We then sought to determine in retrospect whether earlier detection and

feasible. Metagenomic NGS of a day 1 CSF sample followed by SURPI analysis using the June 2014 NCBI nt reference database generated no sequence hits to Balamuthia (Fig. 5B; Table S1). However, repeating the analysis after adding the

Importantly, 9 species-specific DNA reads were detected from day 1 CSF (Fig.

Table 3). Although only 2 of 9 putative Balamuthia reads had identifiable translated nucleotide homology to any protein in the NCBI nr database, one of those reads was found to share 77% amino acid identity to the glutathione transferase protein from Acanthamoeba castellanii, and hence most likely represented a bona fide hit to Balamuthia. These findings also indicated that the detection of Balamuthia reads was not due to errors in the draft genome assembly from incorporation of contaminating sequences from other organisms. Thus, detection of Balamuthia from the patient's day 1 sample and a more timely

copy number 28S rRNA gene that can be leveraged for the future development

in cases of Balamuthia might lead to improved outcome.

Glaser, submitted), are now available.

probes, but rather, detects any and all pathogens on the basis of uniquely identifying sequence information (Chiu 2013). Rapid and accurate bioinformatics

outbreak investigation (Briese et al. 2009; Greninger et al. 2015b).

In the field of amoebic encephalitis, draft genomes are now available for Acanthamoeba castellanii (Burger et al. 1995), Naegleria fowleri (Zysset-Burri et

California, revealing that geographic differences likely exist among strains (Fig. 1B). This study also identified a unique locus in a putative rps3 intron/intergenic in the mitochondrial genome that is an attractive target for a clinical genotyping

Limitations to this study include the small number of accessible clinical

The use of long-read technologies based on single-molecule, real-time (SMRT) or nanopore sequencing will likely be needed to achieve a highly contiguous,

In summary, we demonstrate here that the availability of pathogen reference genomes is critical for the sensitivity and success of unbiased

disease. In hindsight, more timely and potentially actionable diagnosis at

sequencing-based efforts for diagnosis and surveillance, and can be used to

(Ambion). Total RNA was extracted from 2 mm^3 brain biopsy tissue using the

using the Oligotex mRNA Mini Kit (Qiagen). Total RNA from CSF and mRNA from brain biopsy was reverse-transcribed using random hexamers and randomly amplified as previously described (Greninger et al. 2015b

DNase) was used as input into Nextera XT, following the manufacturer's protocol

beads, quantitated on the BioAnalyzer (Agilent), and run on the Illumina MiSeq (1 x 160 bp run). Metagenomic NGS data were analyzed for pathogens via SURPI using NCBI nt/nr databases from June 2014.

A rapid taxonomic classification algorithm based on the lowest common ancestor was incorporated into SURPI, as previously described (Greninger et al.

the antibiotics 100 U/mL penicillin, 0.1 mg/mL streptomycin, and 0.25 µg/mL fungizone and incubated at 37° CO2 for 7 to 10 days plus 2 days at room

Bacto-casitone axenic medium and allowed to grow for another 7-10 days, after which the amoebae were concentrated again, washed, and placed into fresh axenic medium. After a final centrifugation step, the amoebae were collected, washed 3X in PBS, pelleted, and stored at -80°C.

the Nextera XT library was sequenced on an Illumina HiSeq (2 x 250 bp paired-end sequencing) (Table 2). Mate-pair reads from run MP1 were adapter-trimmed with

high-copy number contigs were assembled using SPAdes v3.5 (Bankevich et al.

Glimmer gene predictor, and all predicted ORF sequences were confirmed using BLASTx and HHPred (Altschul et al. 1990; Soding et al. 2005).
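The lowest-common-ancestor idea behind the taxonomic classification step mentioned in this section can be illustrated with a minimal sketch. This is not SURPI's implementation, which is not reproduced in this text; the function names, the `TOY_PARENT` table, and its simplified three-level taxonomy are hypothetical and chosen only for illustration. The rule shown is the generic one: a read that matches several reference taxa is assigned to the deepest node shared by all of their lineages.

```python
# Illustrative sketch only: a generic lowest-common-ancestor (LCA) assignment
# over a toy taxonomy. All names and the taxonomy itself are hypothetical.

# Child -> parent map; the root has parent None. Real taxonomies contain many
# intermediate ranks that are omitted here.
TOY_PARENT = {
    "root": None,
    "Eukaryota": "root",
    "Bacteria": "root",
    "Amoebozoa": "Eukaryota",
    "Balamuthia mandrillaris": "Amoebozoa",
    "Acanthamoeba castellanii": "Amoebozoa",
    "Escherichia coli": "Bacteria",
}


def lineage(taxon, parent):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while parent.get(taxon) is not None:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the lineages of all given taxa.

    Returns None if the taxa share no ancestor in the table.
    """
    if not taxa:
        raise ValueError("at least one taxon is required")
    # Start with the first lineage (ordered leaf -> root) and keep only the
    # nodes that also occur in every other lineage; order is preserved, so
    # the first surviving node is the deepest shared one.
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        other = set(lineage(taxon, parent))
        common = [node for node in common if node in other]
    return common[0] if common else None


if __name__ == "__main__":
    hits = ["Balamuthia mandrillaris", "Acanthamoeba castellanii"]
    print(lowest_common_ancestor(hits, TOY_PARENT))
```

Under this rule, a read matching only Balamuthia mandrillaris keeps a species-level assignment, whereas a read matching both amoebae in the toy table is assigned at the shared higher node, which is why species-level calls depend on how many related reference sequences are present in the database.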

Reads from runs MP2 and MP3 were mate-pair adapter-trimmed using NxTrim, while reads from all runs were quality-filtered (q30) and adapter-trimmed using cutadapt (Martin 2011). Reads that aligned to the Balamuthia mitochondrial genome and golden hamster (Mesocricetus auratus) were identified using SNAP (Zaharia et al. 2011) and removed prior to de novo assembly using platanus (Kajitani et al. 2014). Any scaffold of length less than

1 x 2 min. PCR amplicons were visualized by 3% agarose gel electrophoresis.

This study was supported by grants from the National Institutes of Health

mandrillaris. An outgroup (e.g. Acanthamoeba castellanii) is not shown given the

BLASTn E-score. Sequences were aligned using MUSCLE and a phylogenetic tree constructed using MrBayes. Branch lengths are drawn proportionally to the

additional rim-enhancing lesions in multiple regions (right panel, white arrows).

sequences from Balamuthia (16S/18S rRNA genes) in the NCBI nt reference database as of August 2015 and prior to sequencing of the draft genome.

Shown are coverage maps corresponding to day 6 DNA and RNA libraries from CSF and a day 6 mRNA library from brain biopsy. No hits to 16S and 18S Balamuthia sequences were seen from day 1 samples. The asterisk denotes an

Balamuthia in the day 6 CSF DNA library that are identified by SURPI after the draft genome sequence is added to the reference database (versus only 13 hits previously).