INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Borges, Vítor; Pinheiro, Miguel; Pechirra, Pedro; Guiomar, Raquel; Gomes, João Paulo

doi:10.1186/s13073-018-0555-0

Table 1 INSaFLU outputs

From: INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Module	Output	Format(s)	Description	Tab for visualization/download
Read quality analysis and improvement	FastQC report (raw reads)	.html	FastQC graphical quality reports for raw read files uploaded	Samples/extra info
	FastQC report (quality processed reads)	.html	FastQC graphical quality reports for quality processed reads	Samples/extra info
	Quality processed reads (1P and 2P)	.fastq.gz	Uploaded reads after quality improvement using Trimmomatic	Samples/extra info
Type and sub-type identification	Influenza type and sub-type/lineage	graphical	INSaFLU detects the influenza A and B types, as well as all currently defined influenza A subtypes (18 hemagglutinin subtypes and 11 neuraminidase subtypes) and the two influenza B lineages (Yamagata and Victoria)	Samples/type and subtype (output also included in each project’s “Sample list” table)
	Draft assembly	.fasta	Draft de novo assembly used for type and sub-type/lineage identification. “Influenza-specific” contigs are assigned both to the corresponding viral segment number and to a related reference influenza virus (see next output).	Samples/extra info/type and subtype/lineage identification > “Draft assembly”
	Assignment of viral segments and references	.tsv	Tab-separated file, where each “influenza-specific” NODE (or contig) is assigned both to the corresponding viral segment number (“GENE” column) and to a related reference influenza virus (“ACCESSION” column).	Samples/extra info/type and subtype/lineage identification > “Seg./Ref. to contigs”
Variant detection and consensus generation	Annotated reference file	.gbk	Uploaded reference genome (in .fasta) annotated using Prokka	References/GenBank file
	Mapping file	.bam/graphical	Binary file storing aligned reads to a reference sequence (multi-mapping and unmapped reads are not included); the index is also provided (.bam.bai). “.bam” files can be explored in situ using the Integrative Genomics Viewer (IGV)	Projects/show project results/show sample detail results/mapping file by IGV
	Annotated variants (SNPs and indels) per sample	.tab/.vcf	List of annotated variants assumed in the consensus sequences (for each sample)*	Projects/show project results/show sample detail results/mapping file by IGV
	Annotated variants (SNPs and indels) per project	.tsv	Compiles all lists of annotated variants assumed in the consensus sequences*	Projects/show project results/project “name” > variants
	Consensus sequences per sample (for the pool of loci)	.fasta	A version of the reference sequence with all validated variants replaced. Note: sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold*	Projects/show project results/show sample detail results
Coverage analysis	Coverage report per project	.tsv	Compiles the coverage reports for each sample, including the following data: mean depth of coverage per locus, % of locus size covered by at least 1-fold and % of locus size covered by at least 10-fold.	Projects/show project results/project “name” > coverage
	Coverage report per sample per locus (interactive color-coded statistics)	graphical	Green: % of locus size covered by at least 1-fold = 100% and % of locus size covered by at least 10-fold = 100%;	Projects/show project results
			Yellow: % of locus size covered by at least 1-fold = 100% and % of locus size covered by at least 10-fold < 100%;
			Red: % of locus size covered by at least 1-fold < 100% and % of locus size covered by at least 10-fold < 100%;
	Coverage report per sample per locus (plot)	graphical	Plot of the depth of coverage throughout each locus	Projects/show project results
Alignment/phylogeny	Consensus nucleotide alignments per locus	.fasta/.nex/graphical	Locus-specific consensus nucleotide alignments. Note 1: consensus sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold. Note 2: The “.fasta” files can be directly uploaded, together with associated metadata (“Sample_list.tsv”), to visualization tools, such as PHYLOViZ.	Projects/show project results/nucleotide alignments by MSAViewer
	Consensus nucleotide alignments—whole genome	.fasta/.nex/graphical	Consensus nucleotide alignments of the “whole genome” sequences (i.e., upon concatenation of all individual loci). Note 1: whole-genome sequences are exclusively generated for samples in which every locus has 100% of its length covered by ≥ 10-fold. Note 2: The “.fasta” files can be directly uploaded, together with associated metadata (“Sample_list.tsv”), to visualization tools, such as PHYLOViZ.	Projects/show project results/nucleotide alignments by MSAViewer
	Consensus amino acid alignments per encoded protein	.fasta/.nex/graphical	Consensus amino acid alignments per encoded protein. Note: sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold.	Projects/show project results/amino acid alignments by MSAViewer
	Phylogenetic tree per locus	.nwk/.tree/graphical	Maximum likelihood phylogenetic tree for each locus-specific nucleotide alignment. Note: The “.nwk” and “.tree” phylogenetic trees can be directly uploaded, together with associated metadata (“Sample_list.csv”), to visualization tools, such as Microreact and Phandango, respectively.	Projects/show project results/phylogenetic trees by PhyloCanvas
	Phylogenetic tree—whole genome	.nwk/.tree/graphical	Maximum likelihood phylogenetic tree for the alignments of the “whole-genome” sequences (upon concatenation of all individual loci). Note: The “.nwk” and “.tree” phylogenetic trees can be directly uploaded, together with associated metadata (“Sample_list.csv”), to visualization tools, such as Microreact and Phandango, respectively.	Projects/show project results/phylogenetic trees by PhyloCanvas
Intra-host minor variant detection (and uncovering of putative mixed infections)	Annotated minor intra-host single nucleotide variants (iSNVs) per project	.tsv	Compiles all lists of detected and annotated minor iSNVs (i.e., SNV displaying intra-sample variation at frequency between 1 and 50% - minor variants).	Projects/show project results/intra-host minor variants annotation and uncovering of mixed infections
	Plots of the proportion of iSNVs at frequencies 1–50% (minor iSNVs) and 50–90%	graphical	Plots the proportion of iSNVs at frequencies of 1–50% (minor iSNVs) and 50–90%. You may inspect this plot to uncover infections with influenza viruses presenting clearly distinct genetic backgrounds (so-called “mixed infections”). INSaFLU flags samples as “putative mixed infections” if they fulfill the following cumulative criteria: the ratio of the number of iSNVs at frequency 1–50% (minor iSNVs) to the number at frequency 50–90% falls within the range 0.5–2.0 and the sum of the two counts exceeds 20. Alternatively, to account for mixed infections involving extremely different viruses (e.g., A/H3N2 and A/H1N1), the flag is also displayed when the sum of the two categories of iSNVs exceeds 100, regardless of the first criterion.
Extra	List of samples per project	.csv/.tsv	List of samples per project (compiles all samples’ metadata and additional INSaFLU outputs). This file can be directly uploaded, together with associated alignment or phylogenetic data, to visualization tools, such as PHYLOViZ, Microreact and Phandango.	Projects/show project results/sample list

*As a conservative approach, consensus sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold. Validated variants falling within loci not fully covered by ≥ 10-fold are nevertheless included in these lists (these cases are labeled YES in the column “VARIANTS in INCOMPLETE LOCUS”), so that users can still retrieve valuable and reliable data (e.g., specific epitope and antiviral drug resistance mutations) from samples with borderline coverage.
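
The color coding of the per-locus coverage report and the “putative mixed infection” flag described in Table 1 can be sketched in Python as follows. This is a minimal illustration of the stated rules, not INSaFLU's actual code: the function names and signatures are hypothetical, the 0.5–2.0 ratio bounds are assumed to be inclusive, and a sample with no iSNVs in the 50–90% class is assumed to fail the ratio criterion because the ratio is undefined.

```python
def coverage_color(pct_1x: float, pct_10x: float) -> str:
    """Classify one locus from its % of length covered at >=1-fold and >=10-fold."""
    if pct_1x >= 100.0 and pct_10x >= 100.0:
        return "green"   # fully covered at >=10-fold: consensus is generated
    if pct_1x >= 100.0:
        return "yellow"  # fully covered at >=1-fold, but not at >=10-fold
    return "red"         # not fully covered even at >=1-fold


def is_putative_mixed_infection(n_minor: int, n_50_90: int) -> bool:
    """Apply the two flagging rules to iSNV counts.

    n_minor: number of iSNVs at frequency 1-50% (minor iSNVs)
    n_50_90: number of iSNVs at frequency 50-90%
    """
    total = n_minor + n_50_90

    # Alternative rule: very large iSNV count, regardless of the ratio.
    if total > 100:
        return True

    # Assumption: an undefined ratio (no 50-90% iSNVs) fails the ratio criterion.
    if n_50_90 == 0:
        return False

    # Cumulative rule: balanced ratio (bounds assumed inclusive) and sum above 20.
    ratio = n_minor / n_50_90
    return 0.5 <= ratio <= 2.0 and total > 20


if __name__ == "__main__":
    print(coverage_color(100.0, 97.5))
    print(is_putative_mixed_infection(15, 10))
```

For example, a sample with 15 minor iSNVs and 10 iSNVs at 50–90% has a ratio of 1.5 and a sum of 25, so it meets both cumulative criteria, whereas a sample with 10 of each is not flagged because the sum does not exceed 20.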

ISSN: 1756-994X

Genome Medicine
