
Table 1 Tools for strain identification in community amplicon and shotgun metagenomic sequencing. Methods and brief summaries of their algorithms for detecting and quantifying strains (by various definitions) from 16S rRNA gene amplicon or shotgun metagenomic sequencing, currently the two most prevalent assays for culture-independent strain detection within microbial communities. Other experimental protocols, including single-cell, long-read, and synthetic long-read sequencing, are excluded from this summary because they generally require more than the application of a specific software pipeline. These alternatives, and non-sequencing-based approaches, are described in more detail in the text.

From: Strain-level epidemiology of microbial communities and the human microbiome

| Method | Platform | Authors’ description | Reference |
| --- | --- | --- | --- |
| Oligotyping | 16S rRNA gene amplicon | “oligotyping... focus[es] on the variable sites revealed by the entropy analysis to identify highly refined taxonomic units” | [112] |
| Sub-OTU clustering | 16S rRNA gene amplicon | “we combine error-model-based denoising and systematic cross-sample comparisons to resolve the fine (sub-OTU) structure of moderate-to-high-abundance community members” | [113] |
| MED | 16S rRNA gene amplicon | “MED uses information uncertainty among sequence reads to iteratively decompose a dataset until the maximum entropy criterion is satisfied for each final unit” | [114] |
| DADA2 | 16S rRNA gene amplicon | “DADA2 implements a new quality-aware model of Illumina amplicon errors. Sample composition is inferred by dividing amplicon reads into partitions consistent with the error model.” | [115] |
| Deblur | 16S rRNA gene amplicon | “Deblur … compares sequence-to-sequence Hamming distances within a sample to an upper-bound error profile combined with a greedy algorithm to obtain single-nucleotide resolution.” | [116] |
| UNOISE2 | 16S rRNA gene amplicon | “UNOISE2... cluster[s] the unique sequences in the reads. A cluster has a centroid sequence with higher abundance plus similar sequences having lower abundances.” | [117] |
| PathoScope | Shotgun metagenomic | “PathoID … reassign[s] ambiguously aligned sequencing reads and accurately estimate[s] read proportions from each genome in the sample.” | [118] |
| LSA | Shotgun metagenomic | “LSA... separates reads into biologically informed partitions and thereby enables assembly of individual genomes.” | [119] |
| PanPhlAn | Shotgun metagenomic | “PanPhlAn identifies which genes are present or absent within different strains of a species, based on the entire gene set of the species’ pangenome.” | [66] |
| MetaMLST | Shotgun metagenomic | “MetaMLST performs an in silico consensus sequence reconstruction of the allelic profile of the microbial strains in a metagenomics sample.” | [120] |
| MIDAS | Shotgun metagenomic | “MIDAS … is a computational pipeline that quantifies bacterial species abundance and intra-species genomic variation from shotgun metagenomes.” | [37] |
| ConStrains | Shotgun metagenomic | “ConStrains … exploits the polymorphism patterns in a set of universal bacterial and archaeal genes to infer strain-level structures in species populations.” | [121] |
| StrainPhlAn | Shotgun metagenomic | “StrainPhlAn … is based on reconstructing consensus sequence variants within species-specific marker genes and using them to estimate strain-level phylogenies.” | [8] |
| metaSNV | Shotgun metagenomic | “metaSNV … performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species” | [102] |
| DESMAN | Shotgun metagenomic | “DESMAN identifies variants in core genes and uses co-occurrence across samples to link variants into haplotypes and abundance profiles.” | [122] |