
Metagenomic assessment of gut microbial communities and risk of severe COVID-19



The gut microbiome is a critical modulator of host immunity and is linked to the immune response to respiratory viral infections. However, few studies have gone beyond describing broad compositional alterations in severe COVID-19, defined as acute respiratory or other organ failure.


We profiled 127 hospitalized patients with COVID-19 (n = 79 with severe COVID-19 and 48 with moderate) who collectively provided 241 stool samples from April 2020 to May 2021 to identify links between COVID-19 severity and gut microbial taxa, their biochemical pathways, and stool metabolites.


Forty-eight species were associated with severe disease after accounting for antibiotic use, age, sex, and various comorbidities. These included significant in-hospital depletions of Fusicatenibacter saccharivorans and Roseburia hominis, each previously linked to post-acute COVID syndrome or “long COVID,” suggesting these microbes may serve as early biomarkers for the eventual development of long COVID. A random forest classifier achieved excellent performance when tasked with classifying whether stool was obtained from patients with severe vs. moderate COVID-19, a finding that was externally validated in an independent cohort. Dedicated network analyses demonstrated fragile microbial ecology in severe disease, characterized by fracturing of clusters and reduced negative selection. We also observed shifts in predicted stool metabolite pools, implicating perturbed bile acid metabolism in severe disease.


Here, we show that the gut microbiome differentiates individuals with a more severe disease course after infection with SARS-CoV-2 and offer several tractable and biologically plausible mechanisms through which gut microbial communities may influence COVID-19 disease course. Further studies are needed to expand upon these observations to better leverage the gut microbiome as a potential biomarker for disease severity and as a target for therapeutic intervention.


Over 670 million individuals worldwide have been infected with SARS-CoV-2 and developed coronavirus disease-2019 (COVID-19), culminating in nearly 7 million lives lost [1]. The gut microbiome is a critical modulator of host immunity [2] and affects the immune response to respiratory viral infections (e.g., influenza A virus subtype H1N1, severe acute respiratory syndrome [SARS], and Middle East respiratory syndrome) [3,4,5,6]. Several early studies have explored the link between broad alterations in gut microbial communities and COVID-19, demonstrating the generalized enrichment of opportunistic pathogens and depletion of commensals [7,8,9,10,11,12,13,14,15,16,17,18].

Most prior studies have focused on the presence, absence, or differential abundance of specific microbes in COVID-19 [7, 9,10,11,12,13,14,15,16, 19, 20], and few have interrogated microbial network dynamics to identify which co-occurring or co-excluded species are foundational to maintaining microbial homeostasis. This represents a missed opportunity to identify potential bacterial targets to restore a more favorable, health-promoting gut configuration. Similarly, other studies have not considered how these shifts might influence gut metabolite pools. Finally, prior studies exploring the gut microbiome in COVID-19 have largely sought to characterize differences between healthy controls and infected patients rather than between patients with moderate and severe disease [7, 10,11,12, 14, 16]. Establishing a predictive biomarker of disease severity may improve early identification of at-risk patient populations that require immediate intervention or those that are more likely to benefit from effective antiviral therapies [21].

It remains unclear what role the gut microbiome plays in regulating the severity of COVID-19 in hospitalized patients and what specific microbially-mediated mechanisms may underlie this relationship. To address these questions, we conducted a study of hospitalized patients with COVID-19 at a US tertiary medical center. Using metagenomic profiling of fecal samples collected from these patients, we demonstrate significant depletions of Fusicatenibacter saccharivorans and Roseburia hominis in severe COVID-19, reductions of which have previously been linked to post-acute COVID-19 syndrome (PASC) or long COVID [18, 22]. Strikingly, we observed these declines during patients’ index hospitalizations, suggesting the presence of an early microbial signal that may predict the development of a long-term complication. We further use network analysis to identify significant changes in microbial co-occurrence networks in severe COVID-19 and perform complementary predicted metabolite analyses to further link these changes to alterations in bile acid pool and short-chain fatty acid (SCFA) levels, offering biologically plausible mechanisms to explain the link between gut microbial communities and COVID-19 disease severity.


Study population

From April 2020 to May 2021, we prospectively enrolled 127 consecutive hospitalized patients aged ≥ 18 years with confirmed COVID-19 at the Massachusetts General Hospital to a longitudinal COVID-19 disease surveillance study. Patients were categorized as having severe COVID-19 if they required admission to the intensive care unit with acute respiratory failure (the need for oxygen supplementation ≥ 15 L per minute (LPM), non-invasive positive pressure ventilation, or mechanical ventilation) or other organ failures (such as shock requiring vasopressor initiation) [23]. Otherwise, they were categorized as having moderate COVID-19. Patients were screened daily for inclusion from among all admitted individuals for whom a designation of possible SARS-CoV-2 infection was flagged by hospital infection control. COVID-19 infection status was subsequently confirmed with at least one positive nasopharyngeal SARS-CoV-2 polymerase chain reaction (PCR) test. An optional biospecimen collection protocol was nested within this longitudinal study, which allowed collection of additional clinically relevant biospecimens, including stool samples. All consecutive eligible consenting patients were included.

Sample/data collection

Fresh stool was collected from adult patients enrolled in the prospective biospecimen collection study (241 samples from 127 admitted patients) and refrigerated at 4℃ until aliquoting/freezing at − 80℃ (typically within 4 h of collection). Participants could provide stool samples as frequently as once daily and could decline donation on any given day while remaining in the study. Study coordinators blinded to case status abstracted data from the electronic health record using a double data entry approach, with discrepancies adjudicated by re-abstraction or after discussion with supervising authors. We collected information on admission age (years), biological sex (male, female), race (White, Black, Asian, American Indian, Mixed, or Other), ethnicity (non-Hispanic or Hispanic), admission BMI (kg/m2), comorbidities including history of cancer, pulmonary, or cardiac disease, hypertension, hyperlipidemia, and diabetes mellitus (each yes/no), smoking history (active, former, never, unknown, and pack-years among smokers), and composite admission Charlson Comorbidity Index, a validated score predictive of in-hospital mortality [24]. For hospital course, admission Simplified Acute Physiology Score II (SAPS II) [25] and Sequential Organ Failure Assessment (SOFA) scores [26] were calculated from routine laboratory results and clinical assessments. The use of antibiotics, antivirals including remdesivir, hydroxychloroquine, corticosteroids, anti-IL-6 therapy, any form of oxygen support, high-flow oxygen, bilevel positive airway pressure (BiPAP) ventilation, or mechanical ventilation (each yes/no) was recorded. Mortality within 90 days of admission was ascertained in the post-study period.

Extraction protocols

Stool samples, reagent-only negative controls, and mock community positive controls (Zymo Research) were extracted using either the AllPrep PowerFecal DNA/RNA 96 Kit (Qiagen) or the Maxwell HT 96 gDNA Blood Isolation System (Promega) [27]. SARS-CoV-2 viral load was quantified as per CDC guidelines [28] using the 2019-nCoV N1 primer and probe set [28], as well as human RNaseP as an internal control. Each RT-qPCR reaction contained TaqPath™ 1-Step RT-qPCR Master Mix (Thermo Fisher), RNA template, the CDC N1 or RNaseP forward and reverse primers (IDT), probe, and RNase-free water to a total reaction volume of 10 μl. Viral copy numbers were quantified using N1 quantitative PCR (qPCR) standards (IDT) in tenfold dilutions to generate a standard curve. The assay was run in triplicate for each sample with three no-template control wells per 384 well plate.
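The standard-curve quantification described above can be illustrated with a minimal sketch. It assumes the usual log-linear relationship between cycle threshold (Ct) and template copies; the Ct values, the copy numbers of the standards, and the function names are hypothetical and are not taken from the study. Averaging the triplicate Ct values before conversion is one reasonable choice among several.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def ct_to_copies(ct, slope, intercept):
    """Invert the standard curve to estimate copies from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical tenfold dilution series of the N1 standard (illustrative values).
standard_copies = [1e1, 1e2, 1e3, 1e4, 1e5]
standard_ct = [35.0, 31.7, 28.4, 25.1, 21.8]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# Hypothetical triplicate wells for one stool sample.
triplicate_ct = [27.0, 27.2, 26.9]
mean_ct = sum(triplicate_ct) / len(triplicate_ct)
estimated_copies = ct_to_copies(mean_ct, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimated_copies:.0f}")
```

A slope near −3.3 corresponds to roughly 100% amplification efficiency, which is why the illustrative standards were spaced 3.3 cycles apart.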

Microbial sequencing

Samples were sequenced by two metagenomic sequencing facilities at the Broad Institute and Baylor College of Medicine according to their standard established platforms. DNA was prepared for sequencing using the Illumina Nextera XT DNA library preparation kit. All libraries were sequenced with a target of 3 GB output at 2 × 150 bp read length using the Illumina NovaSeq platform.

Sequence bioinformatics

Taxonomic and functional profiles from both locally recruited patients and publicly available sequences from our external validation cohort [19] were generated using the bioBakery 3 shotgun metagenome workflow 3.0.0, the details of which have previously been described [29]. Briefly, human reads were filtered using KneadData 0.10.0 and taxonomic profiles generated using MetaPhlAn 3.0.0 [30]. Functional profiling was conducted using HUMAnN 3.0.0 [30], resulting in gene family abundance tables assembled into higher order MetaCyc pathways [31].

Given the tight coupling and relatively conserved nature of gut taxonomic and metabolite profiles [32], we used the MelonnPan-predict 0.99.023 workflow [33] to interrogate the functional relationship between COVID-19 severity and microbial community metabolism. In brief, MelonnPan uses an elastic net model to conservatively predict putative metabolite levels based on stool UniRef90 gene family abundance.

Statistical analysis

To compare patient characteristics between study groups, we used standard statistical tests, including chi-squared (χ2) tests or Fisher’s exact testing for categorical variables, the Student’s t-test for normally distributed, non-categorical variables, and nonparametric Wilcoxon rank sum tests for all others. Differences with two-tailed p-value ≤ 0.05 were considered significant. ɑ-Diversity was calculated using the Shannon index with the “diversity” function from the R package vegan [34]. Principal coordinates analyses (PCoA) were performed on species-level Bray–Curtis dissimilarities computed with the “vegdist” function in the vegan package.
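For readers unfamiliar with the two community metrics named above, the following sketch computes them directly from their textbook definitions. It is written in plain Python rather than R, the example abundance vectors are hypothetical, and it is meant only to show what the quantities measure, not to reproduce the vegan implementation.

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

# Hypothetical species-level relative abundances for two stool samples.
sample_one = [0.40, 0.30, 0.20, 0.10]
sample_two = [0.85, 0.10, 0.05, 0.00]

print(shannon_index(sample_one), shannon_index(sample_two))
print(bray_curtis(sample_one, sample_two))
```

A community dominated by one species (as in the second vector) yields a lower Shannon index, while the Bray–Curtis value ranges from 0 for identical profiles to 1 for profiles sharing no taxa.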

After filtering out features with no variance and low (< 10%) prevalence, we performed differential abundance testing of species-level taxonomy, MetaCyc pathways, and predicted stool metabolites using linear mixed-effects models to account for a nested data structure from repeated sampling of non-independent samples:


Machine learning model building and evaluation were conducted using the SIAMCAT v.1.13.3 package [35]. Log-transformed species with pseudocount were filtered to remove biomarkers with low overall abundance and z-transformed. A nested cross-validation procedure was applied to calculate prediction accuracy by splitting data into training and testing sets for twice-repeated, fivefold cross-validation. To account for longitudinal sampling [35], data splits were stratified by participant ID, ensuring samples from the same individual were used in the same fold. For each split, a random forest (RF) regressor was trained and subsequently used to predict COVID-19 disease severity. To evaluate model performance, we used the lambda parameter to maximize the area under the receiver operator characteristic curve (AUROC) with a 95% confidence interval (CI) for cross-validation error. We used the make.predictions function of SIAMCAT to assess model performance on our external validation dataset.
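The participant-level splitting described above can be sketched as follows. This is a plain-Python illustration of the idea (all samples from one individual are assigned to the same fold), not SIAMCAT's own implementation; the function name, the sample identifiers, and the round-robin assignment of participants to folds are assumptions made for the example.

```python
import random

def grouped_folds(sample_to_participant, n_folds=5, n_repeats=2, seed=0):
    """Assign samples to folds so that no participant spans two folds.

    Returns one list of folds per repeat; each fold is a list of sample IDs.
    """
    rng = random.Random(seed)
    participants = sorted(set(sample_to_participant.values()))
    repeats = []
    for _ in range(n_repeats):
        shuffled = participants[:]
        rng.shuffle(shuffled)
        # Round-robin assignment of shuffled participants to folds.
        fold_of = {p: i % n_folds for i, p in enumerate(shuffled)}
        folds = [[] for _ in range(n_folds)]
        for sample, participant in sample_to_participant.items():
            folds[fold_of[participant]].append(sample)
        repeats.append(folds)
    return repeats

# Hypothetical cohort: 10 participants contributing 1 to 3 stool samples each.
sample_to_participant = {
    f"p{p:02d}_s{k}": f"p{p:02d}"
    for p in range(1, 11)
    for k in range(1, (p % 3) + 2)
}

splits = grouped_folds(sample_to_participant)
for repeat_index, folds in enumerate(splits):
    for fold_index, test_samples in enumerate(folds):
        train_samples = [s for s in sample_to_participant if s not in test_samples]
        print(repeat_index, fold_index, len(train_samples), len(test_samples))
```

Keeping each participant inside a single fold prevents a model from being tested on a sample whose sibling samples it has already seen during training.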

To assess whether ecological dynamics may help explain observed differences in taxonomy, we performed dedicated microbial network analyses. To account for our longitudinal data structure and the non-independence of longitudinal samples from the same individual, we restricted this analysis to each participant’s first collected stool (all other analyses used the entire dataset). Networks were constructed using the “netConstruct” function in NetCoMi v.1.0.2 [36] on abundances normalized with a modified centered log-ratio transformation, and each resulting network was limited to species pairs with an absolute Pearson correlation ≥ 0.4 (approximately equal to the 95th percentile of the correlation matrix distribution). Network hubs were identified as those in the top quintile of degree, betweenness, and closeness centrality in each network (moderate vs. severe COVID-19, respectively). Finally, comparison of moderate and severe networks was performed using the “netCompare” function with 10,000 permutations.
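The core of the network construction step (log-ratio normalization, pairwise Pearson correlation, and thresholding at an absolute correlation of 0.4) can be sketched in plain Python. Two simplifications are assumed here: a standard centered log-ratio with a pseudocount stands in for NetCoMi's modified centered log-ratio, and the taxon names and counts are hypothetical. The sketch is not a substitute for the NetCoMi workflow.

```python
import math
from itertools import combinations

def clr(sample, pseudocount=0.5):
    """Centered log-ratio transform of one sample (pseudocount is an assumption)."""
    logs = [math.log(x + pseudocount) for x in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def build_edges(counts, taxa, threshold=0.4):
    """Return {(taxon_i, taxon_j): r} for pairs with |r| >= threshold."""
    transformed = [clr(sample) for sample in counts]
    columns = list(zip(*transformed))  # one column per taxon
    edges = {}
    for i, j in combinations(range(len(taxa)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            edges[(taxa[i], taxa[j])] = r
    return edges

def positive_edge_fraction(edges):
    """Share of retained edges with a positive correlation."""
    return sum(1 for r in edges.values() if r > 0) / len(edges)

# Hypothetical counts: rows are samples, columns are taxa.
taxa = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
counts = [
    [120, 80, 5, 40],
    [200, 150, 2, 35],
    [90, 60, 12, 50],
    [300, 210, 1, 30],
    [60, 45, 20, 55],
    [150, 100, 4, 45],
]
edges = build_edges(counts, taxa)
for pair, r in edges.items():
    print(pair, round(r, 3))
```

The fraction of positive edges returned by `positive_edge_fraction` is the kind of network-wide summary that can then be compared between severity-specific networks.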


Participant characteristics and overall gut community structure

We enrolled 127 hospitalized COVID-19 patients, of whom 79 (62.2%) had severe disease and 48 (37.8%) had moderate disease. Collectively, they provided 241 stool samples (Fig. 1a, Additional file 1: Fig. S1). There were no statistically significant differences between severity groups in age, sex, race, ethnicity, various comorbidities, or smoking history (Additional file 1: Table S1). Patients with severe COVID-19 had a higher mean body mass index (BMI) as well as higher Simplified Acute Physiology Score II (SAPS II) [25] and Sequential Organ Failure Assessment (SOFA) scores [26], each a validated clinical tool for stratifying hospitalized patients by risk of mortality [37, 38]. Patients with severe COVID-19 more frequently received antibiotics, antivirals, and ICU therapies, and had higher 90-day mortality than those with moderate disease (22.8% vs. 4.2%, p-value = 0.01).

Fig. 1

Study overview and overall community structure. a Study enrollment of hospitalized patients with confirmed COVID-19 with weekly stool sampling until the time of discharge or death, whichever occurred first. b Marked reduction in species richness and evenness in severe COVID-19 (inverse Simpson ɑ-diversity metric, p-value < 0.0001 from multivariable linear modeling adjusting for age, sex, prior antibiotic use, race, ethnicity, body mass index, Charlson Comorbidity Index, use of remdesivir or corticosteroids, days since admission, SARS-CoV-2 stool viral load, sequencing depth, and a participant-level random effect). Boxes represent median and interquartile range, while whiskers represent the 95th percentile. c Community-level disturbances in severe vs. moderate COVID-19 as depicted by joint ordination and principal coordinates analysis (PCoA), not fully explained by characteristic trade-offs in Bacteroidetes/Firmicutes or prior antibiotic use. d Omnibus testing of Bray–Curtis distances demonstrates that COVID-19 severity had a modest and statistically significant impact on the overall community structure. Other demographic information, covariates, and hospital course information were not significantly associated (FDR p-value > 0.05)

Gut microbial diversity was significantly reduced in severe COVID-19 after adjusting for factors such as recent antibiotic use (Fig. 1b, p-value < 0.0001). We found that COVID-19 disease severity explained a statistically significant proportion of variance in Bray–Curtis distances (4.04%, FDR p-value = 0.01), while other demographic factors and details related to hospital course had a modest and/or non-statistically significant effect on overall community structure. This finding was not fully explained by characteristic trade-offs along the Bacteroidetes/Firmicutes axes of variation [39] or prior antibiotic usage (Fig. 1c, d). No major batch effects attributable to sequencing center were observed, and thus, subsequent analyses were conducted on pooled samples (multivariable PERMANOVA R2 for batch = 1.2%, p-value = 0.12, Additional file 1: Fig. S2).

Differential abundance testing

Using multivariable linear mixed-effects modeling accounting for SARS-CoV-2 stool viral load (which has previously been linked to increased COVID-19-related mortality [40]), age, sex, antibiotic use, race/ethnicity, and other relevant clinical metadata (Methods), we observed statistically significant differences in 48 species-level taxa between severe and moderate COVID-19 (FDR-corrected p-value < 0.05, Fig. 2a and Additional file 1: Table S2). All but two of these taxa (Candida albicans and Enterococcus faecalis) were relatively depleted in severe disease (Fig. 2a, b), a trend concordant with the observed decrease in species richness and evenness. While not directly comparable, the highest absolute β-coefficient for antibiotic use in our multivariable modeling was 3, whereas 27 of 48 significant taxonomic associations with disease severity had absolute coefficients > 3, suggesting a consistently stronger link between relative microbial abundance and COVID-19 severity than antimicrobial therapy (Additional file 1: Table S2). We identified significant depletions of Fusicatenibacter saccharivorans and Roseburia hominis (Fig. 2b), consistent with prior work showing the relative contraction of each in patients with post-acute COVID-19 syndrome (PASC), also known as “long COVID” [18, 22]. The abundances of F. saccharivorans and R. hominis were not significantly associated with clinical factors included in our model, including age, sex, and BMI (all FDR p-values > 0.05), though there was a trend towards increased E. faecalis with greater time between hospital admission and sample collection (FDR p-value = 0.052; Additional file 1: Table S2).
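Because significance throughout this section is reported as FDR-corrected p-values, a short sketch of one common false discovery rate adjustment may be helpful. It assumes the Benjamini–Hochberg procedure, which the text does not explicitly name, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        candidate = p_values[index] * m / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical nominal p-values for four species-level associations.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

Each adjusted value is the smallest false discovery rate at which the corresponding association would be declared significant.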

Fig. 2

Taxonomic depletions linked to COVID-19 severity. a Volcano plot of species-level expansions and depletions linked to severe vs. moderate COVID-19. Effect sizes (β-coefficients) from multivariable linear modeling plotted against FDR-corrected p-value. Full results in Table S2. b Highlighted box and scatter plots of taxa abundance by COVID-19 severity. For visualization purposes, technical/true 0s were imputed with a given taxon’s minimum non-zero value. Boxes represent median and interquartile ranges, while whiskers represent the 95th percentile

Eight taxa were positively associated with stool SARS-CoV-2 viral load, including Methanobrevibacter smithii and Bilophila wadsworthia, as well as several Alistipes spp. (Additional file 1: Table S2). Interestingly, an expansion of R. hominis was associated with increased stool viral load (Additional file 1: Table S2). Corresponding to community-wide depletions in microbial diversity, biochemical pathways encoded by gut bacteria were also significantly altered in severe COVID-19, including reductions in amino acid biosynthesis (e.g., glutamine synthesis), isoprenoid biosynthesis, and short-chain fatty acid (SCFA) production pathways, including glycerol degradation, acetyl-CoA fermentation, and methanogenesis from acetate (Additional file 1: Table S3 and Additional file 1: Fig. S3).

To ensure the robustness of our findings, we performed a sensitivity analysis in which we repeated our multivariable differential abundance testing on stool collected within 30 days of admission (the median length of stay). Other than an anticipated loss of power from decreased sample size, our findings were not materially altered. Of the 48 differentially abundant species in our primary analysis, 32 remained significant under this more stringent criterion. When similarly restricting our analysis to samples preceding the use of antibiotics (if any), 35 of the 48 differentially abundant species remained statistically significant (Additional file 1: Table S4).

Accurate classification of COVID-19 severity using a microbiome-based random forest learner

Given our findings of both community-wide and feature-level alterations linked to severe COVID-19, we next used a machine learner to assess whether metagenomic features could classify samples derived from patients with severe vs. moderate COVID-19. To assess whether non-microbial metadata (i.e., participant characteristics) should be jointly considered with microbial taxa in training our classifier, we generated an entropy heatmap to quantify the unique row-wise information with respect to column-wise data (in which non-informative variables would have a value of 0). As all the covariates used in our prior linear modeling (Methods) contributed unique information to label/disease severity prediction (Additional file 1: Fig. S4), each was included in our machine learning workflow.

Using both differentially abundant microbial features and clinical characteristics as our input with five-fold twice-repeated cross-validation (Fig. 3a), our random forest regressor achieved an area under the receiver operating characteristic curve (AUROC) of 0.925 when tasked with predicting whether stool was obtained from patients with severe vs. moderate COVID-19 (Fig. 3b). Our findings were only modestly attenuated when modeled without clinical metadata (AUROC 0.922) or without stool SARS-CoV-2 viral load (AUROC 0.923). To assess the robustness of this result, we trained our model using only the top 20 differentially abundant microbial features, which only modestly degraded task performance (AUROC 0.898). Finally, although we had ensured that samples from the same individual were confined to a single cross-validation fold, we further minimized the possibility of overfitting to personalized gut microbial communities by training and assessing our model using only the first stool sample from each participant. This model again performed with excellent accuracy (AUROC 0.871), further supporting the role of metagenomic profiling as a diagnostic biomarker for disease severity.
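As a reminder of what the AUROC values above quantify, the following sketch computes AUROC from its pairwise definition: the probability that a randomly chosen severe sample receives a higher predicted score than a randomly chosen moderate sample, with ties counted as one half. The labels and scores are hypothetical, and the quadratic-time loop is chosen for clarity rather than efficiency.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties = 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions: 1 = severe, 0 = moderate.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))
```

An AUROC of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation of the two severity groups.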

Fig. 3

Stool-based classifier for COVID-19 disease severity. a Box and scatter plots of the top 50 microbial features and their differential abundance by COVID-19 severity with barplots indicating univariate/nominal p-value, fold change by study group, prevalence, and taxa-level contribution to area under the curve for a random forest-based machine learner. b Receiver operating characteristic (ROC) and precision-recall curves demonstrating excellent performance in classifying stool samples by COVID-19 severity. The removal of stool SARS-CoV-2 viral load and clinical metadata resulted in only modestly decreased task performance, as did limiting our input to only the top 20 differentially abundant microbes by disease class. A sensitivity analysis using only the first provided stool from each participant, which should minimize the possibility of overfitting data due to repeated measures and longitudinal sampling, still performed well. c External validation of the taxa-only random forest model on an independent dataset of 24 patients with mild/moderate COVID-19 and 14 with severe/critical COVID-19 (Xu et al. 2022)

A comprehensive literature review identified one metagenomic cohort with publicly available information on COVID-19 disease severity [19]. After uniform pre-processing of raw sequences (Methods), we tested our model on this external dataset of 38 patients with mild/moderate vs. severe/critical COVID-19. Despite heterogeneity in case definition, collection methods, country of origin, and the lack of additional clinical metadata beyond disease severity, testing our taxa-only classifier achieved an AUROC of 0.741 on this external dataset, independently verifying a strong association between COVID-19 disease severity and alterations in gut microbial communities (Fig. 3c).

Systems approaches to interrogate microbial assemblages

To explore the possible biological mechanisms underlying our observations, we next sought to compare microbial co-occurrence networks in moderate vs. severe COVID-19 disease (Methods). We hypothesized that the community-wide and feature-level alterations observed in moderate vs. severe COVID-19 would change microbial network topology. First, we evaluated global microbial network properties. The adjusted Rand Index (ARI) is a measure of similarities in clustering, quantifying the likelihood that pairs of microbial species would be assigned to the same cluster in both networks. An ARI value of 0 indicates random clustering across comparator groups, a value of 1 indicates identical clustering, and a value of -1 indicates perfect disagreement [41, 42]. When comparing moderate to severe COVID-19, the ARI was 0.199 (p-value < 0.001), a modest but statistically significant finding indicating somewhat similar clustering of microbial species between networks. Jaccard’s index (JI) evaluates differences among central nodes between our two severity-specific networks, where a value of 0 indicates completely different sets of central nodes and a value of 1 indicates identical central nodes [43]. While there were no statistically significant differences in overall centrality measures when comparing moderate to severe cases, there were alterations in the proportion of positive edges network-wide (92.9% in moderate vs. 100% in severe disease, p-value < 0.001), indicating a loss of moderate negative correlations in severe COVID-19. For example, C. albicans, which was relatively more abundant in severe compared to moderate COVID-19, had 0 negative edges in the severe network vs. 3 in the moderate network, raising the possibility that the loss of negative selective pressure can promote the growth of certain microbial species in severe COVID-19.
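The two similarity measures defined above can be made concrete with a brief sketch. The ARI function follows the standard pair-counting formula and the Jaccard function is the ratio of intersection to union; the cluster labels and node names are hypothetical, and returning 0.0 for two empty sets is a convention chosen here for the example.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two cluster assignments of the same items."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in one cluster in both
    return (sum_pairs - expected) / (max_index - expected)

def jaccard_index(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B| (0.0 by convention if both are empty)."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

# Hypothetical cluster assignments of six species in two networks.
moderate_clusters = [0, 0, 0, 1, 1, 1]
severe_clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(moderate_clusters, severe_clusters))

# Hypothetical sets of most-central nodes in each network.
moderate_central = {"species_a", "species_b", "species_c"}
severe_central = {"species_b", "species_c", "species_d"}
print(jaccard_index(moderate_central, severe_central))
```

The ARI depends only on which species are grouped together, so relabeling clusters in one network does not change its value.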

We identified 16 taxa as network hubs, i.e., species with high putative importance given their centrality to the surrounding microbial networks (Fig. 4, Additional file 1: Fig. S5, and Additional file 1: Table S5). Five species were identified as hubs in both moderate and severe disease (Blautia wexlerae, Eubacterium hallii, Gordonibacter pamelaeae, Odoribacter splanchnicus, and Alistipes shahii), while 11 were unique to one network or the other (Fig. 4, Additional file 1: Table S5 and Additional file 1: Fig. S5). Critically, 9 of these 16 identified hubs, including Blautia wexlerae and Eubacterium hallii, were shown to be differentially abundant by disease severity (Fisher’s exact p-value = 0.03, Additional file 1: Table S2), and the relative abundances of two hubs, Eubacterium rectale and Alistipes putredinis, were associated with stool viral load. We further observed that highly connected clusters in moderate disease became fragmented in severe COVID-19, as evidenced by an increase in singletons (χ2 p-value < 0.001). We also observed a decrease in the number of hub taxa and dynamic taxa-level cluster reassignment (Fig. 4). Notably, all but one of the hubs shown to be differentially abundant by disease severity belonged to the same cluster, suggesting that significant loss of these central taxa in severe disease may contribute to the observed network instability.

Fig. 4

Comparative microbial assemblages in moderate vs. severe COVID-19. We assembled discrete microbial networks for moderate vs. severe disease to demonstrate significant ecological heterogeneity characterized by fractured clustering and taxa-level reassignment in severe disease. Species are represented by circles (nodes) and species-species correlations were weighted by strength of correlation (edges drawn if absolute Pearson’s ρ > 0.4). Node size indicates normalized relative abundance, and node colors indicate cluster membership. Cluster colors are retained across networks if two or more taxa are shared. Edge color reflects the direction of correlation, with red edges indicating a negative correlation and green edges indicating a positive correlation. Node hubs have been labeled, while clusters are referred to by their nominate node, or the taxon with the highest edge count in a given cluster by network. Node positions are fixed between networks (full node maps and labels found in Fig S4 and Table S5)

Predicted stool metabolites linked to disease severity

We next sought to evaluate whether changes in microbial communities affected capacity for local metabolite production. Using a validated computational workflow to generate putative metabolic profiles from stool metagenomes [33] (Methods), we found 57 of 80 well-predicted known stool metabolites to be differentially perturbed based on COVID-19 disease severity (all FDR-corrected p-value < 0.05; Fig. 5a and Additional file 1: Table S6). We identified disrupted bile acid metabolism in severe COVID-19, with relative enrichment of primary bile acids (chenodeoxycholate, cholate, and ketodeoxycholate) alongside depletion of secondary bile acids (lithocholate, lithocholic acid, and deoxycholic acid) (Fig. 5b). Similar to our microbial pathway analysis which revealed reductions in MetaCyc pathways related to SCFA production, predicted levels of butyrate, isobutyrate, and propionate were also reduced in severe COVID-19 (Additional file 1: Table S6). Furthermore, we confirmed prior data showing relative enrichment of bilirubin [44], creatine and polyamines (e.g., acetyl-spermidine [45]), and pantothenic acid [46] in severe COVID-19, as well as a relative depletion of deoxyinosine [46] (Additional file 1: Table S6).

Fig. 5

Predicted stool metabolite profiles. a Volcano plot of enrichments and depletions in predicted stool metabolites linked to severe compared to moderate COVID-19. Adjusted log2 fold change calculated from β-coefficients extracted from multivariable linear modeling plotted against FDR-corrected p-value. Full results in Table S6. b Highlighted box and scatter plots of predicted metabolite abundance by COVID-19 severity. For visualization purposes, technical/true 0s were imputed with a given metabolite’s minimum non-zero value prior to log-transformation. Boxes represent medians and interquartile ranges, while whiskers represent the 95th percentile


In a comparatively large US hospital-based cohort of diverse patients admitted with confirmed COVID-19 during the initial year of the pandemic, we found community- and species-level alterations linked to disease severity. Using a random forest machine learner, these microbial features could accurately classify patients based on disease severity, indicating that specific gut microbial configurations may be linked to a more severe disease course, a finding we validated in a separate independent cohort. Network analyses identified significant disruptions to gut ecologic topology in severe COVID-19. Differential abundance testing of microbial pathways and separate predicted stool metabolite-based analyses suggest that these disruptions may change the balance of bile acids and SCFAs in the gut, identifying novel treatment opportunities that may ameliorate the severity of COVID-19. We also found significant depletions of two microbes previously associated with long COVID, suggesting early gut microbial disturbances may precede the development of a long-term complication.

Determining who will require a higher level of care remains one of the most challenging questions facing clinicians caring for patients with COVID-19. Our machine learning algorithm demonstrated excellent discrimination between moderate and severe COVID-19 using only gut microbial features. Notably, the inclusion of clinical data did not significantly improve the classification accuracy of our model. Prior work has incorporated such information from the initial presentation [47], multi-cytokine panels [48], and previously validated illness severity scores [49] to forecast whether a given patient will suffer from a more severe COVID-19 course. However, based on their performance characteristics, these approaches appear to be less accurate than our microbiome-centered approach.

Our findings expand on prior research linking changes in gut microbial ecology to COVID-19. However, it should be noted that much of the initial work has been done on a smaller scale [7, 9–11, 14] and typically outside of North America [7–15], limiting their generalizability. Further, these comparative analyses may have focused on specialized populations, such as the very young, the asymptomatic, or patients in recovery [12, 16–18], and may not have been well-suited to consider clinical factors that may confound the relationship between gut microbial communities and COVID-19 using more robust multivariable approaches [7, 8, 10–17]. Prior studies also predominantly relied on 16S rRNA sequencing to demonstrate community- or genus-level shifts related to COVID-19 [7, 14–17], falling short of the species-level resolution and biochemical insights gained by employing next-generation sequencing of gut metagenomes and other functional multi-omic technologies. In contrast, we assembled a large, representative North American patient population admitted with symptomatic COVID-19 whose gut microbial communities were interrogated using metagenomic techniques, allowing us to identify novel microbial features to more comprehensively characterize disease severity with high predictive accuracy.

Prior investigations have observed similar community- and taxa-level alterations in microbial composition in COVID-19. In the earliest phase of the pandemic, a study from Hong Kong (n = 36) also demonstrated relative reductions in the group Eubacterium among the gut metagenomes of COVID-19-infected patients compared to referent populations, and like our work, found widespread depletion of typical gut colonizers such as Faecalibacterium and Roseburia spp. in severe COVID-19 [9]. In an expanded population of 100 patients, the same group reaffirmed a reduction in diversity and a loss of health-associated gut commensals in severe COVID-19 [13]. Finally, a study of 30 SARS-CoV-2 infected patients in mainland China using 16S rRNA-based sequencing similarly demonstrated a change in gut community structure with reductions in α-diversity compared to referent counterparts [14]. Notably, they also achieved success in classifying stool samples from patients with COVID-19 compared to those from healthy controls or those infected with influenza, indicating the relatively distinct gut ecology of COVID-19. However, their classification tasks were conducted in a smaller population using supervised feature selection (i.e., the top results from their linear discriminant analysis) of genus-level taxa, and arguably, the role of a gut microbial biomarker in discriminating COVID-19 from non-infected individuals is uncertain now that SARS-CoV-2 testing is more widely available [50].

Our work offers insights beyond these broad characterizations of the gut microbiome in COVID-19. Among the eight taxa that were positively associated with stool SARS-CoV-2 viral load, several contribute to pro-inflammatory sulfur metabolism, such as Methanobrevibacter smithii and Bilophila wadsworthia [51–53]. Our finding of enriched R. hominis with increased stool viral load despite a corresponding decrease among patients with severe COVID-19 may suggest an interaction between stool SARS-CoV-2 viral load, R. hominis, and severe COVID-19. It is appreciated that gut microbial ecology influences the host immune response to viral respiratory infections [3–6]. Our identification of Blautia wexlerae and Eubacterium hallii as network hubs depleted in severe COVID-19 (both Lachnospiraceae implicated in other immune-mediated diseases [54]) suggests these bacteria may play important roles in the regulation of immunity to SARS-CoV-2. Predicted alteration of secondary bile acid metabolism in severe disease provides another mechanism by which changes in gut microbial communities may influence the immune response to SARS-CoV-2. Bile acids regulate mucosal and systemic immunity in several ways [55]. Prior work has suggested that secondary bile acids are the primary ligand for TGR5 [56] through which they may suppress pro-inflammatory signaling [55, 57], resulting in impaired immunity to viral infections [58, 59]. The predicted shift in bile acid pools may also result in increased regulation of bile acid-sensitive transcription factors, as increased primary bile acids will preferentially activate farnesoid X receptor, while depletions in secondary bile acids will reduce activation of vitamin D receptor (VDR) [60, 61] and pregnane X receptor (PXR) [62].
Decreased VDR/PXR signaling during active infection is associated with increased systemic inflammation and increased morbidity and mortality [63, 64], possibly contributing to the clinical milieu observed in severe COVID-19. This is a particularly noteworthy hypothesis given emerging epidemiologic data on the link between diet [65], vitamin D status [66], and COVID-19 disease risk and severity, as well as early work linking depletion of secondary bile acids to COVID-19-related mortality [67].

Our study has several key strengths. First, we assembled a large representative cohort of patients at a U.S.-based tertiary care center for whom we collected relevant clinical metadata to complement serial stool sampling. Second, our computational workflow allowed us to link not only community-level changes in gut microbial ecology but also species-resolved signatures to severe COVID-19, which we were able to validate in an external cohort of patients. Third, complementary MetaCyc pathway and predicted metabolite analyses further link these changes to alterations in bile acid pool and SCFA levels. Taken together, these observations serve as proof of principle that using NGS to interrogate gut microbial ecology may generate tractable hypotheses to be explored in follow-up investigations. Finally, our results fit well in the context of independent works from other groups, lending credence to our findings, and using a machine learning classifier, we demonstrate excellent accuracy in discriminating samples from moderate vs. severe COVID-19. These findings hint at the possibility that modulating gut microbial communities may be a viable disease prevention or therapeutic strategy in COVID-19.

We acknowledge several limitations. We were not positioned to assess whether findings differed on the basis of SARS-CoV-2 strain or variants. Our study enrolled patients from April 2020 to May 2021 during which genomic surveillance infrastructure in the USA was not equipped to comprehensively explore this question. Prior to the Delta variant wave beginning in June 2021, the majority of COVID-19 cases were either Alpha or other less consequential variants of interest [68]. As our study enrolled hospitalized patients with moderate COVID-19 to minimize differences between those hospitalized with severe disease, we are not positioned to explore what differences—if any—may exist between each group and their non-SARS-CoV-2 infected counterparts and the degree to which between-group differences are attributable solely to critical illness or prolonged hospitalization. However, we either adjusted for participant factors that differed between groups in our multivariable modeling (e.g., BMI, comorbid disease, hospital length of stay, and antiviral therapy) or were limited by the fact that distinguishing features such as advanced oxygen delivery and other ICU-level interventions were, by definition, established markers of severe COVID-19. Several sensitivity analyses including restricting our cohort to those unexposed to antibiotics, using just the first stool sample provided, or those provided prior to the median length of stay were each consistent with our main findings. Since most patients with severe disease were admitted to the ICU shortly after presentation, we were unable to prospectively collect a substantial number of pre-ICU samples in these patients, limiting our ability to classify or predict the development of severe COVID-19. Given the observational nature of our study, we cannot exclude the possibility of residual confounding. However, we adjusted for multiple potential confounders. 
All enrolled patients were hospitalized, which may minimize study heterogeneity at the expense of overall generalizability. We also assessed the gut microbiome at the earliest feasible time point on admission. This resulted in variation in the timing of collection, which limits our ability to infer causality. Absolute microbial abundance measurements could not be obtained. Finally, our collection protocol did not allow for the measurement of stool metabolites to validate our computational approach, without which it may be more accurate to consider these results as suggestive of altered capacity for metabolite class production rather than actual differences in quantifiable metabolite pools. Relatedly, diet and other unmeasured determinants of stool metabolite production are unlikely to be stable in a hospitalized population. Despite these limitations, our findings are intended to be hypothesis-generating to inform the continuum of research that may logically follow.


Leveraging the gut microbiome as a potential biomarker for disease severity and modulating this fragile ecology to improve COVID-19 outcomes each hold significant appeal in the fight to end this pandemic. Multidisciplinary approaches will be needed to confirm our early findings. Prospective validation of a non-invasive indicator predictive of disease severity could readily identify and target at-risk individuals for more aggressive therapy.

Availability of data and materials

Raw sequencing reads are available from the National Center for Biotechnology Information’s Sequence Read Archive under BioProject ID PRJNA976404 [69].


  1. COVID-19 map. Johns Hopkins Coronavirus Resource Center. Cited 2023 Apr 22.

  2. Lynch SV, Pedersen O. The Human Intestinal Microbiome in Health and Disease. N Engl J Med. 2016;375:2369–79.

  3. Keely S, Talley NJ, Hansbro PM. Pulmonary-intestinal cross-talk in mucosal inflammatory disease. Mucosal Immunol. 2012;5:7–18.

  4. Bordon Y. Antibiotics can impede flu vaccines. Nat Rev Immunol. 2019;19:663.

  5. Hagan T, Cortese M, Rouphael N, Boudreau C, Linde C, Maddur MS, et al. Antibiotics-Driven Gut Microbiome Perturbation Alters Immunity to Vaccines in Humans. Cell. 2019;178:1313-1328.e13.

  6. Chen C-J, Wu G-H, Kuo R-L, Shih S-R. Role of the intestinal microbiota in the immunomodulation of influenza virus infection. Microbes Infect. 2017;19:570–9.

  7. Ren Z, Wang H, Cui G, Lu H, Wang L, Luo H, et al. Alterations in the human oral and gut microbiomes and lipidomics in COVID-19. Gut. 2021;70:1253–65.

  8. Zhang F, Wan Y, Zuo T, Yeoh YK, Liu Q, Zhang L, et al. Prolonged Impairment of Short-Chain Fatty Acid and L-Isoleucine Biosynthesis in Gut Microbiome in Patients With COVID-19. Gastroenterology. 2022;162:548-561.e4.

  9. Zuo T, Zhang F, Lui GCY, Yeoh YK, Li AYL, Zhan H, et al. Alterations in Gut Microbiota of Patients With COVID-19 During Time of Hospitalization. Gastroenterology. 2020;159:944-955.e8.

  10. Zuo T, Liu Q, Zhang F, Lui GC-Y, Tso EY, Yeoh YK, et al. Depicting SARS-CoV-2 faecal viral activity in association with gut microbiota composition in patients with COVID-19. Gut. 2021;70:276–84.

  11. Zuo T, Zhan H, Zhang F, Liu Q, Tso EYK, Lui GCY, et al. Alterations in Fecal Fungal Microbiome of Patients With COVID-19 During Time of Hospitalization until Discharge. Gastroenterology. 2020;159:1302-1310.e5.

  12. Ng SC, Peng Y, Zhang L, Mok CK, Zhao S, Li A, et al. Gut microbiota composition is associated with SARS-CoV-2 vaccine immunogenicity and adverse events. Gut. 2022;

  13. Yeoh YK, Zuo T, Lui GC-Y, Zhang F, Liu Q, Li AY, et al. Gut microbiota composition reflects disease severity and dysfunctional immune responses in patients with COVID-19. Gut. 2021;70:698–706.

  14. Gu S, Chen Y, Wu Z, Chen Y, Gao H, Lv L, et al. Alterations of the Gut Microbiota in Patients With Coronavirus Disease 2019 or H1N1 Influenza. Clin Infect Dis. 2020;71:2669–78.

  15. Schult D, Reitmeier S, Koyumdzhieva P, Lahmer T, Middelhof M, Erber J, et al. Gut bacterial dysbiosis and instability is associated with the onset of complications and mortality in COVID-19. Gut Microbes. 2022;14:2031840.

  16. Nashed L, Mani J, Hazrati S, Stern DB, Subramanian P, Mattei L, et al. Gut microbiota changes are detected in asymptomatic very young children with SARS-CoV-2 infection. Gut. 2022.

  17. Newsome RC, Gauthier J, Hernandez MC, Abraham GE, Robinson TO, Williams HB, et al. The gut microbiome of COVID-19 recovered patients returns to uninfected status in a minority-dominated United States cohort. Gut Microbes. 2021;13:1–15.

  18. Liu Q, Mak JWY, Su Q, Yeoh YK, Lui GC-Y, Ng SSS, et al. Gut microbiota dynamics in a prospective cohort of patients with post-acute COVID-19 syndrome. Gut. 2022;71:544–52.

  19. Xu X, Zhang W, Guo M, Xiao C, Fu Z, Yu S, et al. Integrated analysis of gut microbiome and host immune responses in COVID-19. Front Med. 2022;16:263–75. (Springer Science and Business Media LLC).

  20. Reinold J, Farahpour F, Fehring C, Dolff S, Konik M, Korth J, et al. A pro-inflammatory gut microbiome characterizes SARS-CoV-2 infected patients and a reduction in the connectivity of an anti-inflammatory bacterial network associates with severe COVID-19. Front Cell Infect Microbiol. 2021;11:747816. (Frontiers Media SA).

  21. Nonhospitalized adults: Therapeutic management. COVID-19 Treatment Guidelines. Cited 2022 Apr 5.

  22. Zhou Y, Zhang J, Zhang D, Ma W-L, Wang X. Linking the gut microbiota to persistent symptoms in survivors of COVID-19 after discharge. J Microbiol. 2021;59:941–8.

  23. Berlin DA, Gulick RM, Martinez FJ. Severe Covid-19. N Engl J Med. 2020;383:2451–60.

  24. Sundararajan V, Henderson T, Perry C, Muggivan A, Quan H, Ghali WA. New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. J Clin Epidemiol. 2004;57:1288–94.

  25. Le Gall J-R, Lemeshow S, Saulnier F. A New Simplified Acute Physiology Score (SAPS II) Based on a European/North American Multicenter Study. JAMA. 1993;270:2957–63.

  26. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996;22:707–10.

  27. Sui H-Y, Weil AA, Nuwagira E, Qadri F, Ryan ET, Mezzari MP, et al. Impact of DNA Extraction Method on Variation in Human and Built Environment Microbial Community and Functional Profiles Assessed by Shotgun Metagenomics Sequencing. Front Microbiol. 2020;11:953.

  28. CDC. Labs. Centers for Disease Control and Prevention. Cited 2020 Aug 20.

  29. McIver LJ, Abu-Ali G, Franzosa EA, Schwager R, Morgan XC, Waldron L, et al. bioBakery: a meta’omic analysis environment. Bioinformatics. 2018;34:1235–7.

  30. Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. Elife. 2021;10.

  31. Caspi R, Altman T, Billington R, Dreher K, Foerster H, Fulcher CA, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2014;42:D459-71.

  32. Chong J, Xia J. Computational Approaches for Integrative Analysis of the Metabolome and Microbiome. Metabolites. 2017;7.

  33. Mallick H, Franzosa EA, McIver LJ, Banerjee S, Sirota-Madi A, Kostic AD, et al. Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat Commun. 2019;10:3136.

  34. Package “vegan”. 2022. Cited 2023 May 26.

  35. Wirbel J, Zych K, Essex M, Karcher N, Kartal E, Salazar G, et al. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 2021;22:93. (Springer Science and Business Media LLC).

  36. Peschel S, Müller CL, von Mutius E, Boulesteix A-L, Depner M. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform. 2021;22.

  37. Beck DH, Smith GB, Pappachan JV, Millar B. External validation of the SAPS II, APACHE II and APACHE III prognostic models in South England: a multicentre study. Intensive Care Med. 2003;29:249–56.

  38. Ferreira FL, Bota DP, Bross A, Mélot C, Vincent JL. Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA. 2001;286:1754–8.

  39. Vieira-Silva S, Falony G, Darzi Y, Lima-Mendez G, Garcia Yunta R, Okuda S, et al. Species-function relationships shape ecological properties of the human gut microbiome. Nat Microbiol. 2016;1:16088.

  40. Das Adhikari U, Eng G, Farcasanu M, Avena LE, Choudhary MC, Triant VA, et al. Fecal severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) RNA is associated with decreased Coronavirus disease 2019 (COVID-19) survival. Clin Infect Dis. 2022;74:1081–4. (Oxford University Press (OUP)).

  41. Rand WM. Objective Criteria for the Evaluation of Clustering Methods. J Am Stat Assoc. 1971;66:846–50. (Taylor & Francis).

  42. Qannari EM, Courcoux P, Faye P. Significance test of the adjusted Rand index. Application to the free sorting task. Food Qual Prefer. 2014;32:93–7.

  43. Real R, Vargas JM. The Probabilistic Basis of Jaccard’s Index of Similarity. Syst Biol. 1996;45:380–5. (Oxford University Press, Society of Systematic Biologists).

  44. Shen B, Yi X, Sun Y, Bi X, Du J, Zhang C, et al. Proteomic and Metabolomic Characterization of COVID-19 Patient Sera. Cell. 2020;182:59-72.e15.

  45. Thomas T, Stefanoni D, Reisz JA, Nemkov T, Bertolone L, Francis RO, et al. COVID-19 infection alters kynurenine and fatty acid metabolism, correlating with IL-6 levels and renal status. JCI Insight. 2020;5.

  46. Lv L, Jiang H, Chen Y, Gu S, Xia J, Zhang H, et al. The faecal metabolome in COVID-19 patients is altered and associated with clinical features and gut microbes. Anal Chim Acta. 2021;1152:338267.

  47. Gallo Marin B, Aghagoli G, Lavine K, Yang L, Siff EJ, Chiang SS, et al. Predictors of COVID-19 severity: A literature review. Rev Med Virol. 2021;31:1–10.

  48. Cabaro S, D’Esposito V, Di Matola T, Sale S, Cennamo M, Terracciano D, et al. Cytokine signature and COVID-19 prediction models in the two waves of pandemics. Sci Rep. 2021;11:20793.

  49. Raschke RA, Agarwal S, Rangan P, Heise CW, Curry SC. Discriminant Accuracy of the SOFA Score for Determining the Probable Mortality of Patients With COVID-19 Pneumonia Requiring Mechanical Ventilation. JAMA. 2021;325:1469–70.

  50. Peeling RW, Heymann DL, Teo Y-Y, Garcia PJ. Diagnostics for COVID-19: moving from pandemic response to control. Lancet. 2022;399:757–68.

  51. Nguyen LH, Ma W, Wang DD, Cao Y, Mallick H, Gerbaba TK, et al. Association Between Sulfur-Metabolizing Bacterial Communities in Stool and Risk of Distal Colorectal Cancer in Men. Gastroenterology. 2020;158:1313–25.

  52. Wang Y, Nguyen LH, Mehta RS, Song M, Huttenhower C, Chan AT. Association between the sulfur microbial diet and risk of colorectal cancer. JAMA Netw Open. 2021;4:e2134308. (American Medical Association (AMA)).

  53. Nguyen LH, Cao Y, Hur J, Mehta RS, Sikavi DR, Wang Y, et al. The sulfur microbial diet is associated with increased risk of early-onset colorectal cancer precursors. Gastroenterology. 2021;161:1423-1432.e4. (Elsevier BV).

  54. Vacca M, Celano G, Calabrese FM, Portincasa P, Gobbetti M, De Angelis M. The Controversial Role of Human Gut Lachnospiraceae. Microorganisms. 2020;8:573.

  55. Chen ML, Takeda K, Sundrud MS. Emerging roles of bile acids in mucosal immunity and inflammation. Mucosal Immunol. 2019;12:851–61.

  56. Kawamata Y, Fujii R, Hosoya M, Harada M, Yoshida H, Miwa M, et al. A G Protein-coupled Receptor Responsive to Bile Acids. J Biol Chem. 2003;278:9435–40. (Elsevier). Cited 2022 Apr 7.

  57. Hao H, Cao L, Jiang C, Che Y, Zhang S, Takahashi S, et al. Farnesoid X Receptor Regulation of the NLRP3 Inflammasome Underlies Cholestasis-Associated Sepsis. Cell Metab. 2017;25:856-867.e5.

  58. Ichinohe T, Pang IK, Kumamoto Y, Peaper DR, Ho JH, Murray TS, et al. Microbiota regulates immune defense against respiratory tract influenza A virus infection. Proc Natl Acad Sci U S A. 2011;108:5354–9.

  59. Stefan KL, Kim MV, Iwasaki A, Kasper DL. Commensal Microbiota Modulation of Natural Resistance to Virus Infection. Cell. 2020;183:1312-1324.e10.

  60. Li T, Chiang JYL. Nuclear receptors in bile acid metabolism. Drug Metab Rev. 2013;45:145–55.

  61. Makishima M, Lu TT, Xie W, Whitfield GK, Domoto H, Evans RM, et al. Vitamin D receptor as an intestinal bile acid sensor. Science. 2002;296:1313–6.

  62. Staudinger JL, Goodwin B, Jones SA, Hawkins-Brown D, MacKenzie KI, LaTour A, et al. The nuclear receptor PXR is a lithocholic acid sensor that protects against liver toxicity. Proc Natl Acad Sci U S A. 2001;98:3369–74.

  63. Qiu Z, Cervantes JL, Cicek BB, Mukherjee S, Venkatesh M, Maher LA, et al. Pregnane X Receptor Regulates Pathogen-Induced Inflammation and Host Defense against an Intracellular Bacterial Infection through Toll-like Receptor 4. Sci Rep. 2016;6:31936.

  64. Kongsbak M, Levring TB, Geisler C, von Essen MR. The vitamin D receptor and T cell function. Front Immunol. 2013;4:148.

  65. Merino J, Joshi AD, Nguyen LH, Leeming ER, Mazidi M, Drew DA, et al. Diet quality and risk and severity of COVID-19: a prospective cohort study. Gut. 2021;70:2096–104.

  66. Ma W, Nguyen LH, Yue Y, Ding M, Drew DA, Wang K, et al. Associations between predicted vitamin D status, vitamin D intake, and risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and coronavirus disease 2019 (COVID-19) severity. Am J Clin Nutr. 2022;115:1123–33.

  67. Stutz MR, Dylla NP, Pearson SD, et al. Immunomodulatory fecal metabolites are associated with mortality in COVID-19 patients with respiratory failure. Nat Commun. 2022;13:6615.

  68. Corum J, Zimmer C. Tracking Omicron and Other Coronavirus Variants. The New York Times. 2021. Cited 2022 Mar 16.

  69. Nguyen LH, Okin D, Drew DA, Battista VM, Jesudasen S, Kuntz TM, et al. Metagenomic assessment of gut microbial communities and risk of severe COVID-19. PRJNA976404, NCBI Sequence Read Archive. Cited 2023 May 26.


Computational work was conducted on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University. We thank the MGH Translational and Clinical Research Center (TCRC) for their support of the project. The TCRC is supported by Grant Number 1UL1TR002541.


This work was supported in part by the American Gastroenterological Association Research Foundation’s AGA-Takeda COVID-19 Rapid Response Research Award 2021–5102 (L.H.N. and D.A.D.) and Research Scholars Award (L.H.N.), the Massachusetts Consortium on Pathogen Readiness (MassCPR), the American Lung Association COVID-19 and Emerging Respiratory Viruses Research Award COVID-923084 (P.S.L.), Mark and Lisa Schwartz (A.T.C.), the Crohn’s and Colitis Foundation Career Development Award and Research Fellowship Award (L.H.N.), and the NIH/NIDDK K23DK125838 (L.H.N.), K01DK120742 (D.A.D.), T32HL116275 (D.A.O.). Study sponsors had no role in the study design, data collection, analysis, and interpretation of data. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center for Research Resources, the National Center for Advancing Translational Science, or the National Institutes of Health.

Author information

Drs. Nguyen, Okin, Drew, Chan, and Lai had full access to all study data and are responsible for data integrity and accuracy of the data analysis. Study concept and design: LHN, DO, DAD, ATC, PSL. Acquisition of data: all co-authors. Analysis and interpretation of data: all co-authors. Drafting of the manuscript: LHN, DO, DAD. Critical revision of the manuscript for important intellectual content: all authors. Statistical analysis: LHN, VMB. Obtained funding: LHN, DAD, ATC, PSL. Study supervision: ATC, PSL. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Andrew T. Chan or Peggy S. Lai.

Ethics declarations

Ethics approval and consent to participate

Study protocol #2020P000804 was approved by the Mass General Brigham Institutional Review Board. All participants or their healthcare proxy provided written informed consent to participate. The enclosed research conformed to the principles of the Helsinki Declaration.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

 Table S1. Participant characteristics. Table S2. Multivariable linear modeling results. Table S3. Multivariable linear modeling results. Table S4. Sensitivity analyses for differentially abundant species. Table S5. Node and network-specific information. Table S6. Multivariable linear modeling results. Figure S1. Study enrollment diagram. Figure S2. PCoA by batch. Figure S3. Volcano plot for multivariable linear modeling results. Figure S4. Entropy heatmap for clinical covariates. Figure S5. Node map.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nguyen, L.H., Okin, D., Drew, D.A. et al. Metagenomic assessment of gut microbial communities and risk of severe COVID-19. Genome Med 15, 49 (2023).

  • SARS-CoV-2
  • Microbiome
  • Machine learning