# Table 3 Explanation of the scoring functions evaluated

| Scoring method | Description |
| --- | --- |
| Cosine distance of term frequency-inverse document frequency | $\frac{\sum_{j \in M} g_i(j)\, d_i(j)}{\sqrt{\sum_{j \in M} g_i(j)^2}\,\sqrt{\sum_{j \in M} d_i(j)^2}}$ |
| Cosine distance of P-values | $\frac{\sum_{i \in M} g_p(i)\, d_p(i)}{\sqrt{\sum_{i \in M} g_p(i)^2}\,\sqrt{\sum_{i \in M} d_p(i)^2}}$ |
| Cosine distance of term fractions | $\frac{\sum_{i \in M} g_f(i)\, d_f(i)}{\sqrt{\sum_{i \in M} g_f(i)^2}\,\sqrt{\sum_{i \in M} d_f(i)^2}}$ |
| Sum of the log of combined P-values | $\sum_{i \in M} \log\left(g_p(i) + d_p(i) - g_p(i)\, d_p(i)\right)$ |
| Sum of the differences of log P-values | $\sum_{i \in M} \left\lvert \log\frac{g_p(i)}{d_p(i)} \right\rvert = \sum_{i \in M} \left\lvert \log g_p(i) - \log d_p(i) \right\rvert$ |
| L2 of log-p of overlapping terms only | $\sqrt{\sum_{i \in (G \cap D)} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of term fractions of overlapping terms only | $\sqrt{\sum_{i \in (G \cap D)} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of log of P-values | $\sqrt{\sum_{i \in M} \left(\log\frac{g_p(i)}{d_p(i)}\right)^2} = \sqrt{\sum_{i \in M} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of P-values | $\sqrt{\sum_{i \in M} \left(g_p(i) - d_p(i)\right)^2}$ |
| L2 of term fractions | $\sqrt{\sum_{i \in M} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of term frequency | $\sqrt{\sum_{i \in M} \left(g(i) - d(i)\right)^2}$ |
| Term coverage | $\lvert G \cup D \rvert$ |
| Term overlap | $\lvert G \cap D \rvert$ |
| Number of gene MeSH terms | $\lvert G \rvert$ |
| Number of disease MeSH terms | $\lvert D \rvert$ |
| Gene ID | Entrez Gene ID of the gene |
1. $M$ refers to the set of all MeSH terms; $G$ and $D$ refer to the MeSH terms of the gene and disease profiles, respectively. $g(i)$, $g_f(i)$, $g_p(i)$ and $g_i(i)$ refer to the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency, respectively, of MeSH term $i$ in the gene profile. $d(i)$, $d_f(i)$, $d_p(i)$ and $d_i(i)$ refer to the same quantities for MeSH term $i$ in the disease profile.
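
To make the formulas concrete, the following is a minimal Python sketch of several of the scoring functions in the table. It is an illustration under stated assumptions, not the authors' implementation. Each profile is assumed to be a sparse dictionary mapping a MeSH term to its value. The function names, the `default` parameter, and the treatment of terms missing from a profile (0 for frequencies and fractions, 1 for P-values) are hypothetical choices that the table does not specify. With those defaults, terms absent from both profiles contribute nothing to the sums shown here, so iterating over the union of the two profiles stands in for iterating over all of $M$.

```python
import math


def cosine(g, d):
    """Cosine score of two sparse profiles (dict: MeSH term -> value).

    Assumption: a term missing from a profile has value 0.
    """
    dot = sum(v * d[t] for t, v in g.items() if t in d)
    norm_g = math.sqrt(sum(v * v for v in g.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_g == 0.0 or norm_d == 0.0:
        return 0.0  # assumption: an empty profile scores 0
    return dot / (norm_g * norm_d)


def l2(g, d, default=0.0, overlap_only=False, log=False):
    """L2 distance between two profiles.

    overlap_only=True restricts the sum to terms in both profiles.
    log=True compares log values (inputs must be strictly positive).
    `default` is the assumed value of a term missing from one profile.
    """
    terms = (g.keys() & d.keys()) if overlap_only else (g.keys() | d.keys())
    total = 0.0
    for t in terms:
        a, b = g.get(t, default), d.get(t, default)
        if log:
            a, b = math.log(a), math.log(b)
        total += (a - b) ** 2
    return math.sqrt(total)


def sum_log_combined_p(g_p, d_p, default=1.0):
    """Sum over terms of log(g_p + d_p - g_p * d_p)."""
    total = 0.0
    for t in g_p.keys() | d_p.keys():
        a, b = g_p.get(t, default), d_p.get(t, default)
        total += math.log(a + b - a * b)
    return total


def sum_abs_log_p_diff(g_p, d_p, default=1.0):
    """Sum over terms of |log(g_p) - log(d_p)|."""
    total = 0.0
    for t in g_p.keys() | d_p.keys():
        a, b = g_p.get(t, default), d_p.get(t, default)
        total += abs(math.log(a) - math.log(b))
    return total


def term_overlap(g, d):
    """Number of MeSH terms present in both profiles."""
    return len(g.keys() & d.keys())


# Toy profiles with made-up term names and values, for illustration only.
gene_f = {"TermA": 0.5, "TermB": 0.5}
disease_f = {"TermB": 0.5, "TermC": 0.5}
gene_p = {"TermA": 0.01, "TermB": 0.1}
disease_p = {"TermB": 0.1, "TermC": 0.001}

print("cosine of term fractions:", cosine(gene_f, disease_f))
print("L2 of term fractions:", l2(gene_f, disease_f))
print("L2 of log P-values:", l2(gene_p, disease_p, default=1.0, log=True))
print("sum of log combined P:", sum_log_combined_p(gene_p, disease_p))
print("sum of |log P diff|:", sum_abs_log_p_diff(gene_p, disease_p))
print("term overlap:", term_overlap(gene_f, disease_f))
```

The log-based scores require strictly positive P-values; a P-value of exactly 0 would need clamping to a small floor before calling these functions, which is again a choice the table leaves open.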