
Table 3 Explanation of the scoring functions evaluated

From: Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles

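| Scoring method | Description |
|---|---|
| Cosine distance of term frequency-inverse document frequency | $\frac{\sum_{j \in M} g_i(j)\, d_i(j)}{\sqrt{\sum_{j \in M} g_i(j)^2}\,\sqrt{\sum_{j \in M} d_i(j)^2}}$ |
| Cosine distance of P-values | $\frac{\sum_{i \in M} g_p(i)\, d_p(i)}{\sqrt{\sum_{i \in M} g_p(i)^2}\,\sqrt{\sum_{i \in M} d_p(i)^2}}$ |
| Cosine distance of term fractions | $\frac{\sum_{i \in M} g_f(i)\, d_f(i)}{\sqrt{\sum_{i \in M} g_f(i)^2}\,\sqrt{\sum_{i \in M} d_f(i)^2}}$ |
| Sum of the log of combined P-values | $\sum_{i \in M} \log\left(g_p(i) + d_p(i) - g_p(i)\, d_p(i)\right)$ |
| Sum of the differences of log P-values | $\sum_{i \in M} \log\frac{g_p(i)}{d_p(i)} = \sum_{i \in M} \left(\log g_p(i) - \log d_p(i)\right)$ |
| L2 of log-p of overlapping terms only | $\sqrt{\sum_{i \in G \cap D} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of term fractions of overlapping terms only | $\sqrt{\sum_{i \in G \cap D} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of log of P-values | $\sqrt{\sum_{i \in M} \left(\log\frac{g_p(i)}{d_p(i)}\right)^2} = \sqrt{\sum_{i \in M} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of P-values | $\sqrt{\sum_{i \in M} \left(g_p(i) - d_p(i)\right)^2}$ |
| L2 of term fractions | $\sqrt{\sum_{i \in M} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of term frequency | $\sqrt{\sum_{i \in M} \left(g(i) - d(i)\right)^2}$ |
| Term coverage | $\lvert G \cup D \rvert$ |
| Term overlap | $\lvert G \cap D \rvert$ |
| Number of gene MeSH terms | $\lvert G \rvert$ |
| Number of disease MeSH terms | $\lvert D \rvert$ |
| Gene ID | Entrez Gene ID of the gene |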
1. $M$ refers to the set of all MeSH terms; $G$ and $D$ refer to the MeSH terms in the gene and disease profiles, respectively. $g(i)$, $g_f(i)$, $g_p(i)$ and $g_i(i)$ denote the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency (tf-idf) of MeSH term $i$ in the gene profile. $d(i)$, $d_f(i)$, $d_p(i)$ and $d_i(i)$ denote the same quantities for MeSH term $i$ in the disease profile.
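The scoring functions in Table 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each profile is a hypothetical dictionary mapping a MeSH term to one value (frequency, term fraction, hypergeometric P-value or tf-idf weight); a term missing from a profile is assumed to have frequency, fraction and tf-idf 0 and P-value 1 (the table does not say how absent terms are handled); "L2" is read as the Euclidean norm including the square root; and all P-values are assumed to be strictly positive so the logarithms are defined. The `cosine` function returns the value of the formula as written (larger means more alike). All function and variable names are hypothetical, and the toy values are made up.

```python
import math
from typing import Callable, Dict, Iterable

# Hypothetical profile representation: MeSH term -> value, where the value is a
# frequency, term fraction, hypergeometric P-value or tf-idf weight.
Profile = Dict[str, float]


def cosine(g: Profile, d: Profile, terms: Iterable[str], default: float = 0.0) -> float:
    """Sum of g(i)*d(i) divided by the product of the two Euclidean norms."""
    terms = list(terms)
    dot = sum(g.get(t, default) * d.get(t, default) for t in terms)
    norm_g = math.sqrt(sum(g.get(t, default) ** 2 for t in terms))
    norm_d = math.sqrt(sum(d.get(t, default) ** 2 for t in terms))
    return dot / (norm_g * norm_d) if norm_g and norm_d else 0.0


def l2(g: Profile, d: Profile, terms: Iterable[str], default: float = 0.0,
       transform: Callable[[float], float] = lambda x: x) -> float:
    """Euclidean distance between transform(g(i)) and transform(d(i)) over terms."""
    return math.sqrt(sum(
        (transform(g.get(t, default)) - transform(d.get(t, default))) ** 2
        for t in terms))


def sum_log_combined_p(gp: Profile, dp: Profile, terms: Iterable[str]) -> float:
    """Sum of log(g_p + d_p - g_p*d_p), i.e. log(1 - (1 - g_p)(1 - d_p))."""
    total = 0.0
    for t in terms:
        a, b = gp.get(t, 1.0), dp.get(t, 1.0)  # missing P-value assumed to be 1
        total += math.log(a + b - a * b)
    return total


def sum_log_p_diff(gp: Profile, dp: Profile, terms: Iterable[str]) -> float:
    """Sum of log(g_p / d_p) = log g_p - log d_p; missing P-value assumed to be 1."""
    return sum(math.log(gp.get(t, 1.0)) - math.log(dp.get(t, 1.0)) for t in terms)


def coverage(g: Profile, d: Profile) -> int:
    return len(set(g) | set(d))  # |G ∪ D|


def overlap(g: Profile, d: Profile) -> int:
    return len(set(g) & set(d))  # |G ∩ D|


# Toy example with made-up values (not data from the paper).
M = {"Neoplasms", "Apoptosis", "Insulin", "Obesity"}
gene_f = {"Neoplasms": 0.6, "Apoptosis": 0.4}
disease_f = {"Neoplasms": 0.5, "Obesity": 0.5}
gene_p = {"Neoplasms": 1e-4, "Apoptosis": 1e-3}
disease_p = {"Neoplasms": 1e-2, "Obesity": 1e-5}

scores = {
    "cos_fraction": cosine(gene_f, disease_f, M),
    "cos_p": cosine(gene_p, disease_p, M, default=1.0),
    "log_combined_p": sum_log_combined_p(gene_p, disease_p, M),
    "log_p_diff": sum_log_p_diff(gene_p, disease_p, M),
    "l2_logp_overlap": l2(gene_p, disease_p, set(gene_p) & set(disease_p),
                          default=1.0, transform=math.log),
    "l2_fraction_overlap": l2(gene_f, disease_f, set(gene_f) & set(disease_f)),
    "l2_fraction": l2(gene_f, disease_f, M),
    "coverage": coverage(gene_p, disease_p),
    "overlap": overlap(gene_p, disease_p),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With a default P-value of 1, terms absent from both profiles contribute nothing to the log-based sums or to the L2 scores, so summing over $G \cup D$ would give the same result as summing over $M$. That shortcut does not hold for the cosine of P-values, where each such term adds 1 to the numerator and to each squared norm, which is why the sketch passes the full term universe explicitly.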