Table 1 Features of some publicly available short tandem repeat analysis algorithms

From: Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions

| Features | lobSTR | RepeatSeq | HipSTR | TRED | EH | STRetch | exSTRa | GangSTR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Outputs repeat length? | Y | Y | Y | Y | Y | Y | N | Y |
| Sequencing reads | Single- and paired-end | Single- and paired-end | Single- and paired-end | Paired-end | Paired-end | Paired-end | Paired-end | Paired-end |
| Sequencing platforms supported | Illumina, Sanger, 454, and IonTorrent | Illumina | Illumina | Illumina | Illumina | Illumina | Illumina | Illumina |
| Library prep. supported | PCR and PCR-free | n.a. | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free |
| Library prep. (rcmd) | None | None | None | None | PCR-free | PCR-free | None | None |
| Aligners (rcmd) | lobSTR and BWA-MEM | Novoalign and Bowtie 2 | Indel-sensitive aligner | None | None | None | Bowtie 2 | None |
| Analysis approach | Targeted and GW | Targeted and GW | Targeted and GW | Targeted | Targeted | GW | Targeted and GW | Targeted and GW |
| NGS data type supported | WGS | WGS | WGS | WGS | WGS and ES | WGS and ES | WGS and ES | WGS and ES |
| NGS data format | .bam, .fastq, or .fasta | .bam | .bam | .bam | .bam or .cram | .bam or .fastq | .bam | .bam |
| Built-in stutter correction model (a) | Y | Y | Y | Y | n.a. | n.a. | n.a. | Y |
| Test of significance | N | N | N | N | N | Y | Y | N |
| Read types used | Spanning | Spanning | Spanning | Spanning, flanking or partial, paired-end reads, and IRR | Spanning, flanking, and IRR/IRR pairs | Anchored IRR | Flanking and anchored IRR | Spanning, flanking, and IRR/IRR pairs |
| Phasing (b) | n.a. | n.a. | Y | n.a. | n.a. | n.a. | n.a. | n.a. |
| PL | C++ | C++ | C++ | Python | C++ | Java | Perl and R | C++ |
| Sizing limitation | RL | RL | RL | FL | Not limited | FL | n.a. | Not limited |
| Control dataset | Not required | Not required | Not required | Not required | Not required | Required | Not required | Not required |
| Complex repeats | n.a. | n.a. | n.a. | n.a. | Y | n.a. | n.a. | N |
| Output files | .vcf and .allelotype.stats | .repeatseq, .calls, and .vcf | .vcf | .vcf and .json | .vcf, .json, and .log | .tsv | p-values, ECDF, and tsum plots | .vcf |
| Customized regions file | Possible | Possible | Possible | Possible | Possible | Possible, but not rcmd | Possible | Possible |
  1. Abbreviations: EH, ExpansionHunter; TRED, TREDPARSE; Y, feature included; N, feature not included; Library prep., library preparation protocol; rcmd, recommended; PL, programming language used; n.a., not applicable/information not available; GW, genome-wide; WGS, whole-genome sequencing; ES, exome sequencing; IRR, in-repeat reads; RL, read length; FL, fragment length; Not limited, not limited by either RL or FL; ECDF, Empirical Cumulative Distribution Function; t-sum, aggregated T statistic
  2. (a) Corrects the noise (stutters) introduced during PCR amplification-based library preparation
  3. (b) Utilizes phased single nucleotide variant haplotypes