Table 1 INSaFLU outputs

From: INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

Module

Output

Format(s)

Description

Tab for visualization/download

Read quality analysis and improvement

FastQC report (raw reads)

.html

FastQC graphical quality reports for raw read files uploaded

Samples/extra info

FastQC report (quality processed reads)

.html

FastQC graphical quality reports for quality processed reads

Samples/extra info

Quality processed reads (1P and 2P)

.fastq.gz

Uploaded reads after quality improvement using Trimmomatic

Samples/extra info

Type and sub-type identification

Influenza type and sub-type/lineage

graphical

INSaFLU detects the influenza A and B types, as well as all currently defined influenza A subtypes (18 hemagglutinin subtypes and 11 neuraminidase subtypes) and the two influenza B lineages (Yamagata and Victoria)

Samples/type and subtype

(output also included in each project’s “Sample list” table)

Draft assembly

.fasta

Draft de novo assembly used for type and sub-type/lineage identification. “Influenza-specific” contigs are assigned both to the corresponding viral segment number and to a related reference influenza virus (see next output).

Samples/extra info/type and subtype/lineage identification > “Draft assembly”

Assignment of viral segments and references

.tsv

Tab-separated file, where each “influenza-specific” NODE (or contig) is assigned both to the corresponding viral segment number (“GENE” column) and to a related reference influenza virus (“ACCESSION” column).

Samples/extra info/type and subtype/lineage identification > “Seg./Ref. to contigs”

Variant detection and consensus generation

Annotated reference file

.gbk

Uploaded reference genome (in .fasta) annotated using Prokka

References/GenBank file

Mapping file

.bam/graphical

Binary file storing the reads aligned to a reference sequence (multi-mapping and unmapped reads are not included); the index is also provided (.bam.bai). “.bam” files can be explored in situ using the Integrative Genomics Viewer (IGV).

Projects/show project results/show sample detail results/mapping file by IGV

Annotated variants (SNPs and indels) per sample

.tab/.vcf

List of annotated variants assumed in the consensus sequences (for each sample)*

Projects/show project results/show sample detail results/mapping file by IGV

Annotated variants (SNPs and indels) per project

.tsv

Compiles all lists of annotated variants assumed in the consensus sequences*

Projects/show project results/project “name” > variants

Consensus sequences per sample (for the pool of loci)

.fasta

A version of the reference sequence with all validated variants replaced. Note: sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold.*

Projects/show project results/show sample detail results

Coverage analysis

Coverage report per project

.tsv

Compiles the coverage reports for each sample, including the following data: mean depth of coverage per locus, % of locus size covered by at least 1-fold and % of locus size covered by at least 10-fold.

Projects/show project results/project “name” > coverage

Coverage report per sample per locus (interactive color-coded statistics)

graphical

Green: % of locus size covered by at least 1-fold = 100% and % of locus size covered by at least 10-fold = 100%;

Yellow: % of locus size covered by at least 1-fold = 100% and % of locus size covered by at least 10-fold < 100%;

Red: % of locus size covered by at least 1-fold < 100% and % of locus size covered by at least 10-fold < 100%

Projects/show project results

Coverage report per sample per locus (plot)

graphical

Plot of the depth of coverage throughout each locus

Projects/show project results

Alignment/phylogeny

Consensus nucleotide alignments per locus

.fasta/.nex/graphical

Locus-specific consensus nucleotide alignments. Note 1: consensus sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold. Note 2: the “.fasta” files can be directly uploaded, together with associated metadata (“Sample_list.tsv”), to visualization tools, such as PHYLOViZ.

Projects/show project results/nucleotide alignments by MSAViewer

Consensus nucleotide alignments—whole genome

.fasta/.nex/graphical

Consensus nucleotide alignments of the “whole genome” sequences (i.e., upon concatenation of all individual loci). Note 1: whole-genome sequences are exclusively generated for samples in which every locus has 100% of its length covered by ≥ 10-fold. Note 2: the “.fasta” files can be directly uploaded, together with associated metadata (“Sample_list.tsv”), to visualization tools, such as PHYLOViZ.

Projects/show project results/nucleotide alignments by MSAViewer

Consensus amino acid alignments per encoded protein

.fasta/.nex/graphical

Consensus amino acid alignments per encoded protein. Note: sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold.

Projects/show project results/amino acid alignments by MSAViewer

Phylogenetic tree per locus

.nwk/.tree/graphical

Maximum likelihood phylogenetic tree for each locus-specific nucleotide alignment. Note: the “.nwk” and “.tree” phylogenetic trees can be directly uploaded, together with associated metadata (“Sample_list.csv”), to visualization tools, such as Microreact and Phandango, respectively.

Projects/show project results/phylogenetic trees by PhyloCanvas

Phylogenetic tree (whole genome)

.nwk/.tree/graphical

Maximum likelihood phylogenetic tree for the alignments of the “whole-genome” sequences (upon concatenation of all individual loci). Note: the “.nwk” and “.tree” phylogenetic trees can be directly uploaded, together with associated metadata (“Sample_list.csv”), to visualization tools, such as Microreact and Phandango, respectively.

Projects/show project results/phylogenetic trees by PhyloCanvas

Intra-host minor variant detection (and uncovering of putative mixed infections)

Annotated minor intra-host single nucleotide variants (iSNVs) per project

.tsv

Compiles all lists of detected and annotated minor iSNVs (i.e., SNVs displaying intra-sample variation at a frequency between 1 and 50%).

Projects/show project results/intra-host minor variants annotation and uncovering of mixed infections

Plots of the proportion of iSNVs at frequencies 1–50% (minor iSNVs) and 50–90%

graphical

Plots the proportion of iSNVs at frequencies of 1–50% (minor iSNVs) and 50–90%. You may inspect this plot to uncover infections with influenza viruses presenting clearly distinct genetic backgrounds (so-called “mixed infections”). INSaFLU flags samples as “putative mixed infections” if they fulfill the following cumulative criteria: the ratio of the number of iSNVs at frequency 1–50% (minor iSNVs) to the number at frequency 50–90% falls within the range 0.5–2.0, and the sum of these two counts exceeds 20. Alternatively, to account for mixed infections involving extremely different viruses (e.g., A/H3N2 and A/H1N1), the flag is also displayed when the sum of the two categories of iSNVs exceeds 100, regardless of the first criterion.

Extra

List of samples per project

.csv/.tsv

List of samples per project (compiles all samples’ metadata and additional INSaFLU outputs). This file can be directly uploaded, together with associated alignment or phylogenetic data, to visualization tools, such as PHYLOViZ, Microreact and Phandango.

Projects/show project results/sample list
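The “putative mixed infection” flagging criteria described in the table can be sketched in a few lines of Python. This is only an illustration of the stated rule, not INSaFLU's actual code: the function name and its parameters are hypothetical, the 0.5–2.0 range is assumed to be inclusive, and a sample with no iSNVs in the 50–90% class is assumed to fail the ratio criterion because the ratio is undefined.

```python
def is_putative_mixed_infection(n_minor: int, n_50_90: int) -> bool:
    """Apply the flagging rule from the table (illustrative sketch).

    n_minor  -- number of iSNVs at frequency 1-50% (minor iSNVs)
    n_50_90  -- number of iSNVs at frequency 50-90%
    """
    total = n_minor + n_50_90

    # Alternative criterion: very many iSNVs, regardless of the ratio.
    if total > 100:
        return True

    # Assumption: with no iSNVs at 50-90% the ratio is undefined,
    # so the ratio criterion is treated as not met.
    if n_50_90 == 0:
        return False

    # Cumulative criteria: ratio within 0.5-2.0 (assumed inclusive)
    # and more than 20 iSNVs in total.
    ratio = n_minor / n_50_90
    return 0.5 <= ratio <= 2.0 and total > 20
```

For instance, a sample with 15 minor iSNVs and 10 iSNVs at 50–90% has a ratio of 1.5 and a total of 25, so it would be flagged, whereas a sample with 30 and 5 (ratio 6.0, total 35) would not.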

*As a conservative approach, consensus sequences are exclusively generated for loci with 100% of their length covered by ≥ 10-fold. However, validated variants falling within loci not fully covered at ≥ 10-fold are still included in these lists (these cases are labeled YES in the column “VARIANTS in INCOMPLETE LOCUS”), so that users can retrieve valuable and reliable data (e.g., specific epitope and antiviral drug resistance mutations) from samples with borderline coverage.
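The coverage statistics, the color coding, and the 10-fold rule for consensus generation described above can be made concrete with a small Python sketch. It assumes a per-position list of read depths for one locus as input. The function names and dictionary keys are hypothetical and do not reflect INSaFLU's internal implementation or its report column names.

```python
from statistics import mean


def coverage_stats(depths):
    """Summarize one locus from its per-position depths (illustrative)."""
    n = len(depths)
    if n == 0:
        raise ValueError("locus has no positions")
    return {
        "mean_depth": mean(depths),
        # % of locus size covered by at least 1-fold
        "pct_1x": 100.0 * sum(d >= 1 for d in depths) / n,
        # % of locus size covered by at least 10-fold
        "pct_10x": 100.0 * sum(d >= 10 for d in depths) / n,
    }


def coverage_color(pct_1x, pct_10x):
    """Map the two percentages to the color code given in the table."""
    if pct_1x == 100 and pct_10x == 100:
        return "green"
    if pct_1x == 100:
        return "yellow"
    # Less than 100% at 1-fold implies less than 100% at 10-fold.
    return "red"


def consensus_allowed(pct_10x):
    """Consensus is generated only for loci fully covered at >= 10-fold."""
    return pct_10x == 100
```

Under this sketch, only “green” loci yield a consensus sequence; variants found in “yellow” or “red” loci would correspond to the cases labeled YES in the “VARIANTS in INCOMPLETE LOCUS” column.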