Table 1 Features of some publicly available short tandem repeat analysis algorithms

From: Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions

| Features | lobSTR | RepeatSeq | HipSTR | TRED | EH | STRetch | exSTRa | GangSTR |
|---|---|---|---|---|---|---|---|---|
| Outputs repeat length? | Y | Y | Y | Y | Y | Y | N | Y |
| Sequencing reads | Single- and paired-end | Single- and paired-end | Single- and paired-end | Paired-end | Paired-end | Paired-end | Paired-end | Paired-end |
| Sequencing platforms supported | Illumina, Sanger, 454, and IonTorrent | Illumina | Illumina | Illumina | Illumina | Illumina | Illumina | Illumina |
| Library prep. supported | PCR and PCR-free | n.a. | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free | PCR and PCR-free |
| Library prep. (rcmd) | None | None | None | None | PCR-free | PCR-free | None | None |
| Aligners (rcmd) | lobSTR and BWA-MEM | Novoalign and Bowtie 2 | Indel-sensitive aligner | None | None | None | Bowtie 2 | None |
| Analysis approach | Targeted and GW | Targeted and GW | Targeted and GW | Targeted | Targeted | GW | Targeted and GW | Targeted and GW |
| NGS data type supported | WGS | WGS | WGS | WGS | WGS and ES | WGS and ES | WGS and ES | WGS and ES |
| NGS data format | .bam, .fastq, or .fasta | .bam | .bam | .bam | .bam or .cram | .bam or .fastq | .bam | .bam |
| Built-in stutter correction model [a] | Y | Y | Y | Y | n.a. | n.a. | n.a. | Y |
| Test of significance | N | N | N | N | N | Y | Y | N |
| Read types used | Spanning | Spanning | Spanning | Spanning, flanking or partial, paired-end reads, and IRR | Spanning, flanking, and IRR/IRR pairs | Anchored IRR | Flanking and anchored IRR | Spanning, flanking, and IRR/IRR pairs |
| Phasing [b] | n.a. | n.a. | Y | n.a. | n.a. | n.a. | n.a. | n.a. |
| PL | C++ | C++ | C++ | Python | C++ | Java | Perl and R | C++ |
| Sizing limitation | RL | RL | RL | FL | Not limited | FL | n.a. | Not limited |
| Control dataset | Not required | Not required | Not required | Not required | Not required | Required | Not required | Not required |
| Complex repeats | n.a. | n.a. | n.a. | n.a. | Y | n.a. | n.a. | N |
| Output files | .vcf and .allelotype.stats | .repeatseq, .calls, and .vcf | .vcf | .vcf and .json | .vcf, .json, and .log | .tsv | p-values, ECDF, and t-sum plots | .vcf |
| Customized regions file | Possible | Possible | Possible | Possible | Possible | Possible, but not rcmd | Possible | Possible |

  1. Abbreviations: EH, ExpansionHunter; TRED, TREDPARSE; Y, feature included; N, feature not included; Library prep., library preparation protocol; rcmd, recommended; PL, programming language used; n.a., not applicable/information not available; GW, genome-wide; WGS, whole-genome sequencing; ES, exome sequencing; IRR, in-repeat reads; RL, read length; FL, fragment length; Not limited, not limited by either RL or FL; ECDF, empirical cumulative distribution function; t-sum, aggregated T statistic
  2. [a] Corrects the noise (stutters) introduced during PCR amplification-based library preparation
  3. [b] Utilizes phased single nucleotide variant haplotypes