Table 4 Performance using GeneRIF as the gene-literature data source sets

From: Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles

| Scoring method | Novel MEDLINE validation AUC (02/2007-01/2009) | Novel MEDLINE validation AUC (02/2007-04/2010) | Pre-existing CTD validation AUC (11/2008) | Novel CTD validation AUC (11/2008-04/2010) | Pre-existing MEDLINE validation AUC (02/2007) | Mean AUC | Rank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cosine distance of term frequency-inverse document frequency | 0.90 | 0.89 | 0.93 | 0.91 | 0.98 | 0.92 | 2 |
| Cosine distance of P-values | 0.56 | 0.57 | 0.60 | 0.56 | 0.53 | 0.56 | 15 |
| Cosine distance of term fractions | 0.86 | 0.84 | 0.91 | 0.88 | 0.96 | 0.89 | 4 |
| Sum of the log of combined P-values | 0.86 | 0.85 | 0.92 | 0.90 | 0.94 | 0.90 | 3 |
| Sum of the differences of log P-values | 0.91 | 0.91 | 0.77 | 0.83 | 0.93 | 0.87 | 6 |
| L2 of log-p of overlapping terms only | 0.94 | 0.93 | 0.91 | 0.92 | 0.98 | 0.94 | 1 |
| L2 of term fractions of overlapping terms only | 0.56 | 0.55 | 0.55 | 0.56 | 0.51 | 0.55 | 16 |
| L2 of log of P-values | 0.90 | 0.90 | 0.76 | 0.83 | 0.93 | 0.86 | 9 |
| L2 of P-values | 0.90 | 0.90 | 0.76 | 0.81 | 0.92 | 0.86 | 11 |
| L2 of term fractions | 0.86 | 0.85 | 0.89 | 0.88 | 0.94 | 0.88 | 5 |
| L2 of term frequency | 0.90 | 0.90 | 0.76 | 0.83 | 0.93 | 0.86 | 10 |
| Term coverage | 0.91 | 0.90 | 0.77 | 0.83 | 0.93 | 0.87 | 7 |
| Term overlap | 0.82 | 0.82 | 0.86 | 0.86 | 0.87 | 0.85 | 12 |
| Number of gene MeSH terms | 0.74 | 0.73 | 0.80 | 0.80 | 0.81 | 0.78 | 13 |
| Number of disease MeSH terms | 0.90 | 0.90 | 0.77 | 0.83 | 0.93 | 0.87 | 8 |
| Gene ID | 0.64 | 0.64 | 0.69 | 0.69 | 0.66 | 0.66 | 14 |
  1. AUC values for the described scoring methods were compared across the validation sets. CTD, Comparative Toxicogenomics Database.
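
The last two columns of the table can be read as a summary of the five validation columns. The following is a minimal sketch of one plausible derivation. It assumes that Mean AUC is the arithmetic mean of the five AUC values and that Rank orders methods by descending mean; the source does not state either, and recomputing from the rounded table values may differ from the published mean in the last digit. The function name `mean_auc_ranking` and the three-row subset are illustrative only, so the ranks it returns are relative to that subset, not the full table.

```python
from statistics import mean

# Rounded AUC values copied from three rows of the table
# (five validation sets each). This subset is illustrative only.
rows = {
    "L2 of log-p of overlapping terms only": [0.94, 0.93, 0.91, 0.92, 0.98],
    "Cosine distance of TF-IDF": [0.90, 0.89, 0.93, 0.91, 0.98],
    "Gene ID": [0.64, 0.64, 0.69, 0.69, 0.66],
}


def mean_auc_ranking(auc_by_method):
    """Return (rank, method, mean AUC) tuples, best mean first.

    Assumption: the mean is an unweighted arithmetic mean over all
    validation sets, and rank 1 is the highest mean.
    """
    means = {name: mean(values) for name, values in auc_by_method.items()}
    ordered = sorted(means, key=means.get, reverse=True)
    return [
        (rank, name, round(means[name], 2))
        for rank, name in enumerate(ordered, start=1)
    ]


ranking = mean_auc_ranking(rows)
for rank, name, mean_auc in ranking:
    print(rank, name, mean_auc)
```

For these three rows the recomputed means match the published Mean AUC column (0.94, 0.92, and 0.66).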