Table 1 INSaFLU outputs

From: INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza whole-genome-sequencing-based surveillance

| Module | Output | Format(s) | Description | Tab for visualization/download |
|---|---|---|---|---|
| Read quality analysis and improvement | FastQC report (raw reads) | .html | FastQC graphical quality reports for the uploaded raw read files | Samples/extra info |
| | FastQC report (quality-processed reads) | .html | FastQC graphical quality reports for the quality-processed reads | Samples/extra info |
| | Quality-processed reads (1P and 2P) | .fastq.gz | Uploaded reads after quality improvement with Trimmomatic | Samples/extra info |
| Type and sub-type identification | Influenza type and sub-type/lineage | graphical | INSaFLU detects influenza types A and B, all currently defined influenza A subtypes (18 hemagglutinin and 11 neuraminidase subtypes), and the two influenza B lineages (Yamagata and Victoria) | Samples/type and subtype (output also included in each project’s “Sample list” table) |
| | Draft assembly | .fasta | Draft de novo assembly used for type and sub-type/lineage identification. “Influenza-specific” contigs are assigned both to the corresponding viral segment number and to a related reference influenza virus (see next output). | Samples/extra info/type and subtype/lineage identification > “Draft assembly” |
| | Assignment of viral segments and references | .tsv | Tab-separated file in which each “influenza-specific” NODE (or contig) is assigned both to the corresponding viral segment number (“GENE” column) and to a related reference influenza virus (“ACCESSION” column) | Samples/extra info/type and subtype/lineage identification > “Seg./Ref. to contigs” |
| Variant detection and consensus generation | Annotated reference file | .gbk | Uploaded reference genome (in .fasta) annotated with Prokka | References/GenBank file |
| | Mapping file | .bam/graphical | Binary file storing the reads aligned to a reference sequence (multi-mapping and unmapped reads are not included); the index (.bam.bai) is also provided. “.bam” files can be explored in situ with the Integrative Genomics Viewer (IGV) | Projects/show project results/show sample detail results/mapping file by IGV |
| | Annotated variants (SNPs and indels) per sample | .tab/.vcf | List of the annotated variants included in the consensus sequence of each sample* | Projects/show project results/show sample detail results/mapping file by IGV |
| | Annotated variants (SNPs and indels) per project | .tsv | Compilation of all lists of annotated variants included in the consensus sequences* | Projects/show project results/project “name” > variants |
| | Consensus sequences per sample (for the pool of loci) | .fasta | The reference sequence with all validated variants incorporated. Note: sequences are generated only for loci with 100% of their length covered at ≥ 10-fold depth* | Projects/show project results/show sample detail results |
| Coverage analysis | Coverage report per project | .tsv | Compilation of the coverage reports of all samples, including the mean depth of coverage per locus, the % of locus size covered at ≥ 1-fold, and the % of locus size covered at ≥ 10-fold | Projects/show project results/project “name” > coverage |
| | Coverage report per sample per locus (interactive color-coded statistics) | graphical | Green: % of locus size covered at ≥ 1-fold = 100% and % covered at ≥ 10-fold = 100%; yellow: % covered at ≥ 1-fold = 100% and % covered at ≥ 10-fold < 100%; red: % covered at ≥ 1-fold < 100% and % covered at ≥ 10-fold < 100% | Projects/show project results |
| | Coverage report per sample per locus (plot) | graphical | Plot of the depth of coverage along each locus | Projects/show project results |
| Alignment/phylogeny | Consensus nucleotide alignments per locus | .fasta/.nex/graphical | Locus-specific consensus nucleotide alignments. Note 1: consensus sequences are generated only for loci with 100% of their length covered at ≥ 10-fold depth. Note 2: the “.fasta” files can be uploaded directly, together with the associated metadata (“Sample_list.tsv”), to visualization tools such as PHYLOViZ | Projects/show project results/nucleotide alignments by MSAViewer |
| | Consensus nucleotide alignments (whole genome) | .fasta/.nex/graphical | Consensus nucleotide alignments of the “whole-genome” sequences (i.e., the concatenation of all individual loci). Note 1: whole-genome sequences are generated only for samples in which every locus has 100% of its length covered at ≥ 10-fold depth. Note 2: the “.fasta” files can be uploaded directly, together with the associated metadata (“Sample_list.tsv”), to visualization tools such as PHYLOViZ | Projects/show project results/nucleotide alignments by MSAViewer |
| | Consensus amino acid alignments per encoded protein | .fasta/.nex/graphical | Consensus amino acid alignments per encoded protein. Note: sequences are generated only for loci with 100% of their length covered at ≥ 10-fold depth | Projects/show project results/amino acid alignments by MSAViewer |
| | Phylogenetic tree per locus | .nwk/.tree/graphical | Maximum likelihood phylogenetic tree for each locus-specific nucleotide alignment. Note: the “.nwk” and “.tree” trees can be uploaded directly, together with the associated metadata (“Sample_list.csv”), to visualization tools such as Microreact and Phandango, respectively | Projects/show project results/phylogenetic trees by PhyloCanvas |
| | Phylogenetic tree (whole genome) | .nwk/.tree/graphical | Maximum likelihood phylogenetic tree for the alignment of the “whole-genome” sequences (i.e., the concatenation of all individual loci). Note: the “.nwk” and “.tree” trees can be uploaded directly, together with the associated metadata (“Sample_list.csv”), to visualization tools such as Microreact and Phandango, respectively | Projects/show project results/phylogenetic trees by PhyloCanvas |
| Intra-host minor variant detection (and uncovering of putative mixed infections) | Annotated minor intra-host single nucleotide variants (iSNVs) per project | .tsv | Compilation of all lists of detected and annotated minor iSNVs (i.e., SNVs showing intra-sample variation at a frequency between 1 and 50%) | Projects/show project results/intra-host minor variants annotation and uncovering of mixed infections |
| | Plots of the proportion of iSNVs at frequencies 1–50% (minor iSNVs) and 50–90% | graphical | Plots the proportion of iSNVs at frequencies of 1–50% (minor iSNVs) and 50–90%. Inspect this plot to uncover infections with influenza viruses of clearly distinct genetic backgrounds (so-called “mixed infections”). INSaFLU flags a sample as a “putative mixed infection” if it meets both of the following criteria: the ratio between the number of iSNVs at 1–50% and the number at 50–90% falls within 0.5–2.0, and the sum of these two iSNV counts exceeds 20. Alternatively, to account for mixed infections involving extremely different viruses (e.g., A/H3N2 and A/H1N1), the flag is also raised when the sum of the two iSNV counts exceeds 100, regardless of the ratio criterion. | |
| Extra | List of samples per project | .csv/.tsv | List of samples per project (compiling all samples’ metadata and additional INSaFLU outputs). This file can be uploaded directly, together with the associated alignment or phylogenetic data, to visualization tools such as PHYLOViZ, Microreact, and Phandango | Projects/show project results/sample list |

*As a conservative approach, consensus sequences are generated only for loci with 100% of their length covered at ≥ 10-fold depth. Nevertheless, validated variants falling within loci not fully covered at ≥ 10-fold are still included in these lists (such cases are labeled “YES” in the column “VARIANTS in INCOMPLETE LOCUS”), so that users can still retrieve valuable and reliable data (e.g., specific epitope and antiviral drug resistance mutations) from samples with borderline coverage.
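The coverage rules in this table (the per-locus coverage statistics, the green/yellow/red code, and the ≥ 10-fold requirement for per-locus and whole-genome consensus sequences) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not INSaFLU’s implementation: it assumes per-position depths for each locus are already available as lists of integers, and all function names, the `MIN_DEPTH` constant and the `locus_order` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Depth threshold used in the table for consensus generation (name is hypothetical).
MIN_DEPTH = 10


def coverage_stats(depths: Sequence[int]) -> Dict[str, float]:
    """Per-locus summary: mean depth and % of positions at >= 1x and >= 10x."""
    if not depths:
        raise ValueError("locus has no positions")
    n = len(depths)
    return {
        "mean_depth": sum(depths) / n,
        "pct_ge_1x": 100.0 * sum(d >= 1 for d in depths) / n,
        "pct_ge_10x": 100.0 * sum(d >= MIN_DEPTH for d in depths) / n,
    }


def coverage_color(depths: Sequence[int]) -> str:
    """Green/yellow/red code, using integer depth tests to avoid float comparisons."""
    if any(d < 1 for d in depths):
        # A position below 1-fold is also below 10-fold, so both red conditions hold.
        return "red"
    if any(d < MIN_DEPTH for d in depths):
        return "yellow"
    return "green"


def locus_consensus_allowed(depths: Sequence[int]) -> bool:
    """A locus consensus is produced only if 100% of it is covered at >= 10-fold."""
    return all(d >= MIN_DEPTH for d in depths)


def whole_genome(
    consensus_by_locus: Dict[str, str],
    depths_by_locus: Dict[str, Sequence[int]],
    locus_order: List[str],
) -> Optional[str]:
    """Concatenate per-locus consensus sequences, or return None if any locus fails."""
    if not all(locus_consensus_allowed(depths_by_locus[locus]) for locus in locus_order):
        return None
    return "".join(consensus_by_locus[locus] for locus in locus_order)
```

For a locus with depths `[12, 15, 9, 20]`, every position is covered but one falls below 10-fold, so the sketch reports a mean depth of 14.0, 75% at ≥ 10-fold, a “yellow” code, and no consensus for that locus.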
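The “putative mixed infection” flag described in the table reduces to a small decision rule over two iSNV counts. The Python sketch below assumes iSNV frequencies are given as percentages. How the bin edges (1%, 50%, 90%) are handled is an assumption, because the table does not say whether the boundaries are inclusive. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_isnv_classes(freqs_pct: Iterable[float]) -> Tuple[int, int]:
    """Count iSNVs in the 1-50% (minor) and 50-90% frequency bins.

    Assumed bin edges: [1, 50) for minor iSNVs and [50, 90] for the second bin;
    frequencies outside both bins are ignored.
    """
    minor = mid = 0
    for f in freqs_pct:
        if 1.0 <= f < 50.0:
            minor += 1
        elif 50.0 <= f <= 90.0:
            mid += 1
    return minor, mid


def is_putative_mixed_infection(n_minor: int, n_50_90: int) -> bool:
    """Apply the two flagging criteria stated in the table."""
    total = n_minor + n_50_90
    # Second criterion: extremely different co-infecting viruses (e.g., A/H3N2 + A/H1N1).
    if total > 100:
        return True
    # Assumption: with no 50-90% iSNVs the ratio is undefined, so criterion 1 fails.
    if n_50_90 == 0:
        return False
    ratio = n_minor / n_50_90
    # First criterion: ratio within 0.5-2.0 AND more than 20 iSNVs in total.
    return 0.5 <= ratio <= 2.0 and total > 20
```

The accepted range 0.5–2.0 is symmetric under inversion (r is in range exactly when 1/r is), so computing the ratio as minor over 50–90% or the reverse gives the same result.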
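The table describes a consensus sequence as the reference with all validated variants incorporated. The Python sketch below shows one common way to do this. It is an assumption, not INSaFLU’s actual code: it applies VCF-style (position, REF, ALT) records from the last position backwards, so that length changes caused by indels do not shift the coordinates of variants still to be applied. The `Variant` tuple format and the `apply_variants` name are hypothetical, and the sketch assumes the variants do not overlap.

```python
from typing import List, Tuple

# (1-based position, REF allele, ALT allele), VCF-style; hypothetical input format.
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return the reference sequence with the given (non-overlapping) variants applied."""
    seq = reference
    # Work from the 3' end backwards so earlier coordinates stay valid after indels.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq
```

For example, applying a substitution at position 2 (C>T), an insertion at position 5 (A>AGG) and a deletion at position 7 (GT>G) to `ACGTACGT` yields `ATGTAGGCG`.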