Table 1 Same software, different transcripts: REFSEQ vs ENSEMBL by ANNOVAR annotation category

From: Choice of transcripts and software has a large effect on variant annotation

Annotation category  REF+ENS  REF  ENS  Match  REF match rate (%)  ENS match rate (%)  Overall match rate (%)
stopgain_SNV 15,835 14,183 14,960 13,308 93.83 88.96 84.04
frameshift_insertion 6,980 5,298 6,495 4,813 90.85 74.10 68.95
frameshift_deletion 7,491 4,547 7,380 4,436 97.56 60.11 59.22
stoploss_SNV 946 503 906 463 92.05 51.10 48.94
splicing 47,878 14,154 45,839 12,115 85.59 26.43 25.30
frameshift_substitution 1,960 195 1,947 182 93.33 9.35 9.29
nonsynonymous_SNV 321,669 291,898 315,592 285,821 97.92 90.57 88.86
nonframeshift_insertion 3,506 2,888 2,844 2,226 77.08 78.27 63.49
nonframeshift_deletion 5,136 3,321 4,963 3,148 94.79 63.43 61.29
nonframeshift_substitution 933 226 843 136 60.18 16.13 14.58
synonymous_SNV 178,559 167,561 172,463 161,465 96.36 93.62 90.43
UTR3 724,802 574,255 622,441 471,894 82.17 75.81 65.11
UTR5 177,832 94,545 162,684 79,397 83.98 48.80 44.65
UTR5_UTR3 2,183 292 2,092 201 68.84 9.61 9.21
ncRNA_intronic 8,992,009 2,113,428 8,244,441 1,365,860 64.63 16.57 15.19
ncRNA_exonic 654,098 140,303 597,947 84,152 59.98 14.07 12.87
ncRNA_UTR3 53,379 10,712 47,133 4,466 41.69 9.48 8.37
ncRNA_UTR5 10,683 1,989 9,444 750 37.71 7.94 7.02
ncRNA_splicing 13,931 1,051 13,562 682 64.89 5.03 4.90
ncRNA_UTR5_ncRNA_UTR3 107 1 106 0 0.00 0.00 0.00
intronic 29,289,037 26,805,864 27,743,749 25,260,576 94.24 91.05 86.25
intergenic 50,305,202 49,797,113 41,307,708 40,799,619 81.93 98.77 81.10
downstream 991,811 474,684 840,376 323,249 68.10 38.46 32.59
upstream 910,818 440,728 762,664 292,574 66.38 38.36 32.12
upstream_downstream 53,608 15,621 47,293 9,306 59.57 19.68 17.36
unknown 11,205 6,215 5,703 713 11.47 12.50 6.36
ALL LOF 81,090 38,880 77,527 35,317 90.84 45.55 43.55
ALL LOF and MISSENSE 412,334 337,213 401,769 326,648 96.87 81.30 79.22
ALL EXONIC 590,893 504,774 574,232 488,113 96.70 85.00 82.61
ALL 80,981,575 80,981,575 80,981,575 69,181,552 85.43 85.43 85.43
  1. This table summarises the number of annotations that match between the REFSEQ and ENSEMBL results for each annotation category. For each category, it shows the number of variants given that annotation when using (i) either REFSEQ or ENSEMBL (‘REF+ENS’; union), (ii) REFSEQ (‘REF’) and (iii) ENSEMBL (‘ENS’). ‘Match’ is the number of variants that have matching annotations (i.e. the same annotation when using both transcript sets; intersection). The match rate for each transcript set is the number of matching annotations in a category expressed as a percentage of the total number of annotations in that category from that transcript set. The final column shows the ‘Overall match rate’, which is the percentage of the variants with a given annotation when using either REFSEQ or ENSEMBL (‘REF+ENS’) that have a matching annotation when using the two transcript sets. Categories are loosely ordered by severity of effect, with putative loss-of-function (LoF) annotations listed before nonsynonymous, synonymous, non-exonic categories and so on. Within each loose group, categories are sorted in descending order of overall match rate. The bottom four rows show the total degree of matching across all LoF categories, all LoF and missense categories, all exonic categories and, finally, all categories.
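
The three rate columns can be recomputed from the four count columns as described in the note above. The following is a minimal sketch in Python, not the authors' code: the names `CategoryCounts`, `pct` and `match_rates` are hypothetical, and returning 0.0 for an empty denominator is an assumption (no row in the table has a zero denominator). The counts are copied from three rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCounts:
    """Counts for one annotation category (hypothetical container)."""
    union: int  # 'REF+ENS': variants given the annotation by either set
    ref: int    # 'REF': variants given the annotation with REFSEQ
    ens: int    # 'ENS': variants given the annotation with ENSEMBL
    match: int  # 'Match': variants given the annotation with both sets


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals; 0.0 if the denominator is 0 (assumption)."""
    return round(100.0 * numerator / denominator, 2) if denominator else 0.0


def match_rates(c: CategoryCounts) -> tuple[float, float, float]:
    """Return (REF match rate, ENS match rate, overall match rate) in percent."""
    return pct(c.match, c.ref), pct(c.match, c.ens), pct(c.match, c.union)


# Counts copied from the table rows of the same name.
ROWS = {
    "stopgain_SNV": CategoryCounts(15835, 14183, 14960, 13308),
    "splicing": CategoryCounts(47878, 14154, 45839, 12115),
    "ncRNA_UTR5_ncRNA_UTR3": CategoryCounts(107, 1, 106, 0),
}

if __name__ == "__main__":
    for name, counts in ROWS.items():
        # The union is the two set sizes minus their intersection.
        assert counts.union == counts.ref + counts.ens - counts.match
        ref_rate, ens_rate, overall = match_rates(counts)
        print(f"{name}: REF {ref_rate:.2f}%, ENS {ens_rate:.2f}%, overall {overall:.2f}%")
```

Because the overall rate divides by the union, it can never exceed either per-set rate, which is why the final column is always the smallest of the three.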