Table 1 Characteristics of curated breast cancer datasets

From: CoINcIDE: A framework for discovery of patient subtypes across multiple datasets

| Dataset | ID | Batch ID | GEO platform ID | Commercial platform name | Samples (n) | Genes (n) |
|---|---|---|---|---|---|---|
| 1 | 12093 | | GPL96 | Affymetrix Human Genome U133A | 136 | 11,723 |
| 2 | 1379 | | | Arcturus 22 k human oligonucleotide | 60 | 11,723 |
| 3 | 16391 | | | Affymetrix Human Genome U133 Plus 2.0 | 48 | 15,199 |
| 4 | 16446 | | | Affymetrix Human Genome U133A | 114 | 16,326 |
| 5 | 17705 | JBI | | Affymetrix Human Genome U133A | 103 | 10,565 |
| 6 | 17705 | MDACC | | Affymetrix Human Genome U133A | 195 | 11,026 |
| 7 | 19615 | | | Affymetrix Human Genome U133 Plus 2.0 | 115 | 16,652 |
| 8 | 20181 | | | Affymetrix Human Genome U133A | 53 | 10,171 |
| 9 | 20194 | | | Affymetrix Human Genome U133A | 261 | 11,748 |
| 10 | 2034 | | | Affymetrix Human Genome U133A | 286 | 11,020 |
| 11 | 22226 | | | Agilent-012391 Whole Human Genome Oligo G4112A | 127 | 18,841 |
| 12 | 22358 | | | AFFY Human Phase3 v1.0 - C02 | 121 | 17,253 |
| 13 | 25055 | MDACC_M | | Affymetrix Human Genome U133A | 221 | 11,459 |
| 14 | 25065 | MDACC | | Affymetrix Human Genome U133A | 71 | 11,158 |
| 15 | 25065 | USO | | Affymetrix Human Genome U133A | 54 | 10,822 |
| 16 | 32646 | | | Affymetrix Human Genome U133 Plus 2.0 | 115 | 18,260 |
| 17 | 9893 | | | MLRG Human 21 K V12.0 | 155 | 13,154 |

  1. All breast cancer datasets are from the Gene Expression Omnibus (GEO). A ‘batch’ here does not refer to a lab replicate batch, but to a group of microarrays from a larger study that were run on a different platform or collected at a different site. These batch labels were inferred from GEO sample file names. GSE = GEO series ID prefix.