From: Genome annotation for clinical genomic diagnostics: strengths and weaknesses
Annotation procedure | Automatic annotation—for example, Ensembl | Manual annotation—for example, HAVANA |
---|---|---|
Genome analysis | Very quick | Very slow and labour-intensive |
Annotation consistency | Consistent | Risk of subjectivity—achieving consistency requires careful training and monitoring |
Sequence quality | Flexible; can use unfinished sequence, short-read NGS data and shotgun assemblies | Best results on high-quality sequence, but can offer great insight into lower-quality assemblies |
Functional annotation | Limited; lacks the comprehensive detail of manual annotation and frequently misassigns related sequences, e.g. confusing protein-coding loci with pseudogenes | Extensive use of biotypes, such as coding, pseudogene, lncRNA, NMD, etc. |
Complex genomic regions | Limited in ability to represent complex structures and other nonstandard features | Superior representation and resolution of gene families and able to define CDS regions of complicated gene structures |
Gene annotation | Many false-positive and false-negative calls at locus level in all gene biotypes | Better coverage of loci and alternatively spliced transcripts |
Pseudogenes | Limited | Able to predict pseudogenes and differentiate them from genuine coding genes |
Poly(A) features | Limited | Annotates poly(A) features |
Flexibility | Error-prone; copes poorly with problems such as non-canonical splicing, and can only look at sequences more or less in isolation | Deals with inconsistencies in data, consults literature and other databases, can compare paralogues and orthologues, and can rapidly integrate new sequencing technologies |
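The biotype labels in the table (coding, pseudogene, lncRNA, NMD) are exposed to downstream users as attributes on each transcript in the released annotation files. The following is a minimal sketch of how such labels might be tallied from a GTF file. It makes several assumptions. It assumes Ensembl-style GTF rows whose ninth column carries a `transcript_biotype` attribute. GENCODE files name this attribute `transcript_type` instead, so the key is a parameter. The function names and the example rows are hypothetical and illustrative only.

```python
import re
from collections import Counter

# Matches key "value" pairs in GTF column 9, e.g. transcript_id "T1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return GTF column-9 attributes as a dict (repeated keys keep the last value)."""
    return dict(ATTR_RE.findall(field))


def count_transcript_biotypes(gtf_lines, key="transcript_biotype"):
    """Tally biotypes over 'transcript' rows; key is assumed Ensembl-style."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only count transcript-level features, not exons/CDS
        attrs = parse_attributes(cols[8])
        counts[attrs.get(key, "unknown")] += 1
    return counts


# Hypothetical example rows (not real annotation data).
EXAMPLE_GTF = [
    "#!genome-build hypothetical",
    'chr1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_biotype "protein_coding";',
    'chr1\thavana\ttranscript\t150\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_biotype "nonsense_mediated_decay";',
    'chr1\thavana\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_biotype "protein_coding";',
    'chr2\thavana\ttranscript\t500\t1400\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; transcript_biotype "processed_pseudogene";',
    'chr3\thavana\ttranscript\t10\t800\t.\t+\t.\tgene_id "G3"; transcript_id "T4"; transcript_biotype "lncRNA";',
]

if __name__ == "__main__":
    for biotype, n in sorted(count_transcript_biotypes(EXAMPLE_GTF).items()):
        print(biotype, n)
```

Only `transcript` rows are counted, so the exon row for T1 does not inflate the protein-coding tally. A pipeline that relied on automatic annotation alone would see fewer of these finer distinctions, such as NMD transcripts or processed pseudogenes, in this column.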