
Table 1. X-CAP features. Italicized features (violet in the original) are novel and have not been used in previous stopgain pathogenicity predictors. Specifically, no features related to zygosity, stop codon read-through, or alternative translation reinitiation are present in earlier classifiers.

From: X-CAP improves pathogenicity prediction of stopgain variants

| Feature type | Feature name | Description |
| --- | --- | --- |
| Zygosity | *zygosity* | Binary variable distinguishing homozygous (and hemizygous) variants from heterozygous variants; input directly when known, or predicted as a function of benign stopgain alleles at the same position in the training set when unknown |
| Gene/exon essentiality | *oe* | Number of benign stopgains in the training set along the gene divided by gnomAD's expected number of loss-of-function variants |
| | RVIS | Measure of gene intolerance to functional variation |
| | OMIM gene map | Two non-exclusive binary features indicating whether a recessive or dominant disease listed in the OMIM Gene Map is caused by mutations in this gene |
| | *monoclass pathogenic* | Transcript or exon contains no benign variants and at least one pathogenic variant within the training set |
| | *can be spliced out* | Variant is skipped in at least one isoform of the gene |
| Variant location | distance from CDS start/end | Number of coding nucleotides from CDS start and end |
| | relative CDS location | Distance from CDS start divided by CDS length |
| | *distance from exon start/end* | Number of coding nucleotides from exon start and end |
| | *relative exon location* | Distance from exon start divided by exon length |
| | *exon length* | Number of nucleotides in overlapped exon |
| | *exon number* | Index of the exon that the variant overlaps |
| | *# transcript exons* | Number of exons in overlapped transcript |
| | chromosome | Ternary variable indicating whether the variant is located on an autosomal, X, or Y chromosome |
| NMD | distance from last exon-exon junction | Number of coding nucleotides upstream from last exon-exon junction (negative if downstream of junction) |
| | *% transcripts with NMD* | Percentage of overlapped transcripts in which the variant is >50 bp upstream of the last exon-exon junction |
| Stop codon read-through | *stop codon* | One-hot encoding of the new stop codon introduced by the stopgain |
| Alternative translation reinitiation | *distance to next start codon* | Number of base pairs between the variant and the next potential downstream start codon within the mRNA |
| Cross-species conservation | phyloP | Base-pair conservation across vertebrates of upstream, downstream, and overlapped exon regions |
| | phastCons | Regional conservation across vertebrates of upstream, downstream, and overlapped exon regions |
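
Several of the location, NMD, read-through, and reinitiation features in Table 1 reduce to simple arithmetic on transcript coordinates. The Python sketch below illustrates how such values could be derived; it is not X-CAP's implementation. All function names are hypothetical, positions are assumed to be 0-based coordinates within a single transcript, and the "next potential downstream start codon" is assumed here to be any downstream ATG regardless of reading frame, which the table does not specify.

```python
from typing import List, Optional

# Order of the one-hot vector; this ordering is an assumption of the sketch.
STOP_CODONS = ("TAA", "TAG", "TGA")


def relative_location(distance_from_start: int, length: int) -> float:
    """Distance from the start of a region (CDS or exon) divided by its length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return distance_from_start / length


def one_hot_stop_codon(codon: str) -> List[int]:
    """One-hot encoding of the stop codon introduced by the stopgain."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA spelling
    if codon not in STOP_CODONS:
        raise ValueError(f"not a stop codon: {codon}")
    return [int(codon == stop) for stop in STOP_CODONS]


def distance_from_last_junction(variant_pos: int, last_junction_pos: int) -> int:
    """Coding nucleotides upstream of the last exon-exon junction.

    Positive when the variant is upstream of the junction,
    negative when it is downstream.
    """
    return last_junction_pos - variant_pos


def percent_transcripts_with_nmd(distances: List[int], threshold: int = 50) -> float:
    """Percentage of overlapped transcripts in which the variant lies more than
    `threshold` bp upstream of the last exon-exon junction.

    `distances` holds one distance_from_last_junction value per transcript.
    """
    if not distances:
        raise ValueError("at least one overlapped transcript is required")
    hits = sum(1 for d in distances if d > threshold)
    return 100.0 * hits / len(distances)


def distance_to_next_start_codon(mrna: str, variant_pos: int) -> Optional[int]:
    """Base pairs between the variant and the next downstream ATG in the mRNA.

    Returns None when no downstream ATG exists. Reading frame is ignored,
    which is an assumption of this sketch.
    """
    seq = mrna.upper().replace("U", "T")
    idx = seq.find("ATG", variant_pos + 1)
    return None if idx == -1 else idx - variant_pos
```

The same `relative_location` helper covers both the relative CDS location and the relative exon location rows, since each is a distance from the region start divided by the region length.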