
Table 13 Datasets used in the analysis with details on size and relevant contents

From: Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles

| Dataset | February 2007 | January 2009 | April 2010 |
|---|---|---|---|
| Entrez Gene (including gene2pubmed and GeneRIF) | | | |
| Total genes | 2,460,748 | 4,710,910 | 5,999,558 |
| Human genes | 38,604 | 40,183 | 45,423 |
| MEDLINE® | Baseline 2007 (Nov 2006) | Baseline 2009 (Nov 2008) | Baseline 2010 (Nov 2009) |
| Total articles | 16,120,073 | 17,764,232 | 18,502,915 |
| gene2pubmed (linking Entrez Gene and MEDLINE®) | | | |
| Total links | 3,081,413 | 12,960,489 | 5,979,167 |
| Total human gene links | 272,123 | 445,650 | 527,821 |
  1. Although the number of human genes has not increased much over the years, the number of non-human links has increased substantially since 2007, while the number of human gene links has increased at a more moderate rate. Previously, MEDLINE®/PubMed® links from genomic sequences were propagated to all related genes. This practice was discontinued in March 2009, which (at the time) reduced the number of links by 60% and explains the drop in the total number of links between the 2009 and 2010 datasets.
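
The trends described in the footnote can be checked directly from the gene2pubmed rows of the table. The following is a minimal sketch, not part of the original analysis: the dictionary keys and the `pct_change` helper are illustrative names, and the numbers are copied from the table. Note that the change between the January 2009 and April 2010 snapshots need not match the 60% figure, which the footnote ties to the time the propagation practice was discontinued.

```python
# Values copied from the gene2pubmed rows of the table.
# Keys are shorthand labels for the three dataset snapshots (illustrative).
total_links = {"2007": 3_081_413, "2009": 12_960_489, "2010": 5_979_167}
human_links = {"2007": 272_123, "2009": 445_650, "2010": 527_821}


def pct_change(old, new):
    """Percentage change from old to new (hypothetical helper)."""
    return 100.0 * (new - old) / old


# Non-human links are not listed in the table; derive them by subtraction.
non_human_links = {k: total_links[k] - human_links[k] for k in total_links}

for start, end in [("2007", "2009"), ("2009", "2010")]:
    print(
        f"{start} -> {end}: "
        f"total {pct_change(total_links[start], total_links[end]):+.1f}%, "
        f"human {pct_change(human_links[start], human_links[end]):+.1f}%, "
        f"non-human {pct_change(non_human_links[start], non_human_links[end]):+.1f}%"
    )
```

Deriving the non-human counts by subtraction makes it visible that the rise and subsequent fall in total links is driven almost entirely by non-human genes, while the human gene links grow in both intervals.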