
Table 13 Datasets used in the analysis with details on size and relevant contents

From: Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles

Dataset                     February 2007              January 2009               April 2010
Entrez Gene (including gene2pubmed and GeneRIF)
   Total genes              2,460,748                  4,710,910                  5,999,558
   Human genes              38,604                     40,183                     45,423
MEDLINE®                    Baseline 2007 (Nov 2006)   Baseline 2009 (Nov 2008)   Baseline 2010 (Nov 2009)
   Total articles           16,120,073                 17,764,232                 18,502,915
gene2pubmed (linking Entrez Gene and MEDLINE®)
   Total links              3,081,413                  12,960,489                 5,979,167
   Total human gene links   272,123                    445,650                    527,821

1. Although the number of human genes grew only modestly over this period, the number of non-human gene links increased substantially after 2007, whereas human gene links increased more moderately. Previously, MEDLINE®/PubMed® links from genomic sequences were propagated to all related genes. This practice was discontinued in March 2009, which at the time caused a 60% decrease in links and explains the drop in total links between the January 2009 and April 2010 releases.
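The trends described in the footnote can be checked against the table with simple arithmetic. The following is a minimal Python sketch that uses only the gene2pubmed counts transcribed from the table; the variable names and the `pct_change` helper are hypothetical, for illustration only, and are not part of the original analysis. It derives non-human links as total links minus human gene links. Computed this way, total links fall by roughly 54% between the January 2009 and April 2010 releases. The 60% figure in the footnote refers to the decrease at the time the practice changed, so it presumably need not match this snapshot-to-snapshot difference exactly.

```python
# gene2pubmed counts transcribed from Table 13 (hypothetical variable names).
RELEASES = ["February 2007", "January 2009", "April 2010"]
TOTAL_LINKS = [3_081_413, 12_960_489, 5_979_167]
HUMAN_LINKS = [272_123, 445_650, 527_821]

# Non-human links = total links minus human gene links, per release.
NON_HUMAN_LINKS = [t - h for t, h in zip(TOTAL_LINKS, HUMAN_LINKS)]


def pct_change(old: int, new: int) -> float:
    """Hypothetical helper: percentage change from old to new (negative = decrease)."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Per-release breakdown of the link counts.
    for release, total, human, other in zip(RELEASES, TOTAL_LINKS, HUMAN_LINKS, NON_HUMAN_LINKS):
        print(f"{release:<14} total={total:>11,} human={human:>8,} non-human={other:>11,}")
    # Growth and decline figures discussed in the footnote.
    print(f"Human links, 2007 -> 2010:     {pct_change(HUMAN_LINKS[0], HUMAN_LINKS[2]):+.1f}%")
    print(f"Non-human links, 2007 -> 2009: {pct_change(NON_HUMAN_LINKS[0], NON_HUMAN_LINKS[1]):+.1f}%")
    print(f"Total links, 2009 -> 2010:     {pct_change(TOTAL_LINKS[1], TOTAL_LINKS[2]):+.1f}%")
```

Running the script prints the per-release breakdown. It shows that non-human links roughly quadrupled between 2007 and 2009, while human gene links grew by about 94% over the full 2007 to 2010 span.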