Challenge | Description | Relevance | Solution | Reference(s) |
---|---|---|---|---|
Multiple technologies | Next-generation sequencing can be performed on multiple platforms, each with different characteristics and each under constant improvement | Difficulty comparing results across platforms, and with results from older techniques | Pipelines must be constantly updated to account for new techniques | |
 |  | Universal approach not yet possible | Different platforms should be utilized depending on the question asked |  |
 |  | Continuously evolving technology requires a skilled workforce rather than established pipelines |  |  |
Computational resources | Our ability to generate DNA sequence data has rapidly outpaced our computational capacity to analyze it | Storing DNA sequence data requires substantial resources | Perform analysis using a staged approach | |
 |  | Assembling and identifying short reads from next-generation sequencing is computationally intensive | Cloud computing |  |
Suitable reference databases | Multiple reference databases are available, and results may differ depending on the database used | Certain features of a metagenomic sample might be missed if the wrong database is used | The Human Microbiome Project (HMP) aims to sequence multiple reference genomes associated with the human body | [94] |
 |  | Results are limited by the diversity represented in each database | HMP has generated a total of 6,500 reference sequences to date |  |
Short read lengths | Read lengths depend on the sequencing platform used | Makes de novo assembly more complicated | Read lengths are continually increasing | |
 |  | More difficult to identify large-scale genomic variations and repetitive regions | Third-generation sequencing platforms promise much longer read lengths |  |
Causation | Finding a pathogen in a disease sample does not imply causation | Important to determine causation before changing public health management | Follow-up studies are required, for example using animal models or serological or epidemiological methods | |
 |  | A false association can lead to costly, ineffective or even harmful therapies | Results must be independently validated |  |
Contamination | Metagenomics can detect contaminants from cell cultures, reagents and laboratory equipment | Contaminants may be incorrectly associated with the disease of interest | Negative controls must be used | [97] |
 |  |  | Researchers must consider the plausibility of the findings |  |
 |  |  | Results must be independently validated |  |
Privacy | Host nucleic acids are almost always sequenced in metagenomics studies | Host genetic sequences are confidential | In the HMP, host DNA is made available only to approved researchers | |
 |  | Human subjects might be traceable from their DNA sequences | Only microbiome data are released to the public |  |
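The contamination row recommends negative controls. The sketch below shows one way they might be used. It assumes that per-taxon read counts are already available for the clinical sample and for each negative control. It flags taxa that are not clearly more abundant in the sample than in the controls. The function name `flag_contaminants`, the `min_fold` parameter and its 10x default are hypothetical choices for illustration. They are not a published method. Dedicated decontamination tools rely on statistical models rather than a single fold-change cutoff.

```python
from typing import Dict, List, Set


def flag_contaminants(
    sample: Dict[str, int],
    negative_controls: List[Dict[str, int]],
    min_fold: float = 10.0,
) -> Set[str]:
    """Return taxa in `sample` that may be reagent or laboratory contaminants.

    Hypothetical heuristic: a taxon is flagged when it also appears in at least
    one negative control and its read count in the sample is less than
    `min_fold` times the highest count seen for it in any control. The 10x
    default is an illustrative assumption, not an established threshold.
    """
    flagged: Set[str] = set()
    for taxon, reads in sample.items():
        # Highest read count for this taxon across all negative controls (0 if absent).
        control_max = max((c.get(taxon, 0) for c in negative_controls), default=0)
        if control_max > 0 and reads < min_fold * control_max:
            flagged.add(taxon)
    return flagged


if __name__ == "__main__":
    # Hypothetical read counts per taxon, for illustration only.
    patient = {"Ralstonia": 120, "Streptococcus": 5000, "Bradyrhizobium": 40}
    controls = [{"Ralstonia": 90}, {"Bradyrhizobium": 35, "Ralstonia": 60}]
    print(sorted(flag_contaminants(patient, controls)))  # ['Bradyrhizobium', 'Ralstonia']
```

A flagged taxon is only a candidate for exclusion or closer scrutiny, not a proven contaminant. As the table notes, researchers must still weigh the plausibility of the findings and validate them independently.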