Table 1 Characteristics of curated breast cancer datasets

From: CoINcIDE: A framework for discovery of patient subtypes across multiple datasets

| No. | Dataset ID | Batch ID | GEO platform ID | Commercial platform name | Samples (n) | Genes (n) |
|---|---|---|---|---|---|---|
| 1 | 12093 | | GPL96 | Affymetrix Human Genome U133A | 136 | 11,723 |
| 2 | 1379 | | | Arcturus 22 k human oligonucleotide | 60 | 11,723 |
| 3 | 16391 | | | Affymetrix Human Genome U133 Plus 2.0 | 48 | 15,199 |
| 4 | 16446 | | | Affymetrix Human Genome U133A | 114 | 16,326 |
| 5 | 17705 | JBI | | Affymetrix Human Genome U133A | 103 | 10,565 |
| 6 | 17705 | MDACC | | Affymetrix Human Genome U133A | 195 | 11,026 |
| 7 | 19615 | | | Affymetrix Human Genome U133 Plus 2.0 | 115 | 16,652 |
| 8 | 20181 | | | Affymetrix Human Genome U133A | 53 | 10,171 |
| 9 | 20194 | | | Affymetrix Human Genome U133A | 261 | 11,748 |
| 10 | 2034 | | | Affymetrix Human Genome U133A | 286 | 11,020 |
| 11 | 22226 | | | Agilent-012391 Whole Human Genome Oligo G4112A | 127 | 18,841 |
| 12 | 22358 | | | AFFY Human Phase3 v1.0 - C02 | 121 | 17,253 |
| 13 | 25055 | MDACC_M | | Affymetrix Human Genome U133A | 221 | 11,459 |
| 14 | 25065 | MDACC | | Affymetrix Human Genome U133A | 71 | 11,158 |
| 15 | 25065 | USO | | Affymetrix Human Genome U133A | 54 | 10,822 |
| 16 | 32646 | | | Affymetrix Human Genome U133 Plus 2.0 | 115 | 18,260 |
| 17 | 9893 | | | MLRG Human 21 K V12.0 | 155 | 13,154 |

  1. All breast cancer datasets are from the Gene Expression Omnibus (GEO). Dataset IDs are GEO series accession numbers, listed without the "GSE" prefix. A batch here is not a lab replicate batch, but a group of microarrays from a larger study that were run on a different platform or collected at a different site. Batch labels were inferred from GEO sample file names.
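
To make the relationship between dataset IDs and batches concrete, the following is a minimal sketch in Python, not part of the original article. It copies a few rows from Table 1, prepends the GSE prefix to each dataset ID, and sums sample counts across batches that share a GEO series. The names `ROWS`, `gse_accession`, and `samples_per_series` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# (dataset ID, batch ID, samples) copied from a few rows of Table 1.
# An empty batch ID means the series was not split into batches.
ROWS = [
    ("12093", "", 136),
    ("17705", "JBI", 103),
    ("17705", "MDACC", 195),
    ("25065", "MDACC", 71),
    ("25065", "USO", 54),
]


def gse_accession(dataset_id: str) -> str:
    """Prefix a numeric dataset ID with 'GSE' to form a GEO series accession."""
    return f"GSE{dataset_id}"


def samples_per_series(rows):
    """Sum sample counts over all batches that share a GEO series."""
    totals = defaultdict(int)
    for dataset_id, _batch_id, n_samples in rows:
        totals[gse_accession(dataset_id)] += n_samples
    return dict(totals)


totals = samples_per_series(ROWS)
print(totals)
```

Under these assumptions, the two batches of series 17705 (JBI and MDACC) appear as separate rows in the table but map to the single accession GSE17705.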