From: Genome annotation for clinical genomic diagnostics: strengths and weaknesses
Biotype | Description |
---|---|
Protein coding | Contains an ORF that has strong coding potential |
 Known coding | 100% identical to known RefSeq protein or Swiss-Prot entry |
 Novel coding | Shares >60% length with known coding sequence from RefSeq, or Swiss-Prot, or has cross-species/family support or domain evidence |
 Putative coding | Shares <60% length with known coding sequence from RefSeq, or Swiss-Prot, or has an alternative first or last coding exon |
 Nonsense-mediated decay | If the coding sequence (following the appropriate reference) of a transcript finishes >50 bp from a downstream splice site, then it is tagged as NMD. If the transcript variant does not cover the full reference coding sequence, then it is annotated as NMD if NMD is unavoidable, i.e. no matter what the exon structure of the missing portion is, the transcript will be subject to NMD |
 Non-stop decay | Transcripts that have poly(A) features (including signal) without a prior stop codon in the CDS—i.e. a non-genomic poly(A) tail attached directly to the CDS without a 3′ UTR; these transcripts are subject to degradation |
 Retained intron | Alternatively spliced transcript believed to contain intronic sequence relative to other, coding, variants |
 Processed transcript | Cannot assign an ORF, but is part of a coding locus |
lncRNA | Long non-coding RNA—lacks protein-coding potential and is of length >200 bp |
 Bidirectional promoter | Transcription start sites of the lncRNA model and the protein-coding model are on opposite strands and within 200 bp of one another, or are found in the same CpG island |
 3-Prime overlapping | Transcription start site and/or published experimental data support independent transcription from the 3′ UTR of a coding gene |
 Antisense | At least one variant overlaps a protein-coding locus on the opposite strand, or evidence of antisense regulation of a coding gene has been published |
 lincRNA | Long intergenic ncRNA: does not overlap (neither sense nor antisense) a coding gene |
 Sense intronic | In an intron of a coding gene; no exonic overlap |
 Sense overlapping | Contains a coding gene in an intron; no exonic overlap |
Pseudogene | Matches to protein, but ORF disrupted by frameshifts and/or premature stop codons |
 Processed | Lacks introns and arose from retrotransposition of parent gene mRNA |
 Unprocessed | Can contain introns and is produced by genomic duplication |
 Transcribed | Locus-specific transcripts indicate transcription; these can be classified into ‘processed’ and ‘unprocessed’ |
 Translated | Locus-specific protein mass spectrometry data suggest translation; the connection is maintained with the pseudogene biotype until the experimental community validates it as a coding gene |
 Polymorphic | Pseudogene owing to a single-nucleotide variant (SNV), or insertion-deletion variant (indel); but the same gene is translated in other individuals/haplotypes/strains |
 Unitary | Species-specific unprocessed pseudogene, without a parent gene, that has an active orthologue in another species |
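
The 50 bp rule in the nonsense-mediated decay row is the most directly computable criterion in the table, so a minimal sketch of it may be useful. The sketch assumes a full-length transcript described in transcript coordinates by its exon lengths (5′ to 3′) and by the offset of the first base after the stop codon. The names `splice_junctions`, `is_nmd_candidate` and `NMD_DISTANCE_BP` are hypothetical and do not come from any annotation pipeline. The second part of the rule, for partial transcripts where NMD must be unavoidable, is not covered here.

```python
from itertools import accumulate

# Threshold taken from the table: CDS must end more than 50 bp
# upstream of a downstream splice site to be tagged as NMD.
NMD_DISTANCE_BP = 50


def splice_junctions(exon_lengths):
    """Return exon-exon junction positions in transcript coordinates.

    A junction position is the 0-based offset of the first base of the
    next exon, i.e. the cumulative length of all exons before it.
    The end of the last exon is not a junction and is excluded.
    """
    return list(accumulate(exon_lengths))[:-1]


def is_nmd_candidate(exon_lengths, cds_end, threshold=NMD_DISTANCE_BP):
    """Apply the 50 bp rule to a full-length transcript.

    exon_lengths: exon lengths in transcript (5' to 3') order.
    cds_end: 0-based transcript offset of the first base after the
             stop codon (an assumption of this sketch).
    Returns True if any splice junction lies more than `threshold`
    bases downstream of the end of the coding sequence.
    """
    return any(j - cds_end > threshold for j in splice_junctions(exon_lengths))


if __name__ == "__main__":
    exons = [100, 200, 150]  # junctions at offsets 100 and 300
    print(is_nmd_candidate(exons, cds_end=240))  # junction 60 bp downstream
    print(is_nmd_candidate(exons, cds_end=260))  # junction 40 bp downstream
```

With exons of 100, 200 and 150 bp, the junctions fall at offsets 100 and 300, so a coding sequence ending at offset 240 is flagged (60 bp upstream of the last junction) while one ending at offset 260 is not (40 bp). A transcript whose coding sequence ends in the final exon, or a single-exon transcript, has no downstream junction and is never flagged by this sketch.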