
Table 2 Overview of all datasets used in the analysis

From: Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease

| Study | Dataset | Sample size | Data type | Attribute |
|---|---|---|---|---|
| Simulation | Simulated cells^a | 5000 cells | scRNA-seq | Cell type |
| | Simulated patients^a | 600 patients | RNA-seq | Disease status |
| Glioblastoma | Patel et al., 2014 [33] | 532 cells (5 patients) | | |
| | TCGA GBM [34] | 111 patients | Microarray | GBM subtype |
| Alzheimer’s disease | AIBS | 47,396 cells (11 patients) | | Brain cell types |
| | Grubman et al., 2019 [36] | 13,214 cells (12 patients) | (10x Genomics) | AD and normal brain cell types |
| | Mathys et al., 2019 [15] | 5288 cells^b (48 patients) | (10x Genomics) | AD and normal brain cell types |
| | MSBB [35] | 682 samples (221 patients) | RNA-seq | AD diagnosis |
| Multiple myeloma | MMRF [47] | 647 patients | RNA-seq | PFS |
| | Chen et al., 2021 [46] | 22,968 cells (4 patients) | (10x Genomics) | Subtype cluster (Subtype 1-5) |
| | Ledergor et al., 2019 [45] | 13,440 cells (35 patients) | | Malignancy (NHIP, MGUS, SMM, MM) |
| | Zhan et al., 2006 [44] | 559 patients | Microarray | OS |
^a The simulated patients were generated from the simulated cells by combining known proportions of cell types. “None” denotes the lack of labels for the cells/samples in a given dataset.

^b Cells were down-sampled from the total number of cells because some cell types were over-represented.

Abbreviations: The Cancer Genome Atlas (TCGA), Glioblastoma Multiforme (GBM), Allen Institute for Brain Science (AIBS), Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB), Multiple Myeloma Research Foundation (MMRF), Indiana University School of Medicine (IUSM), Alzheimer’s disease (AD), progression-free survival (PFS), overall survival (OS), normal hip (NHIP), monoclonal gammopathy of undetermined significance (MGUS), smoldering multiple myeloma (SMM), multiple myeloma (MM), RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq), and single nuclei RNA-seq (snRNA-seq).
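
The two footnoted procedures (building simulated patients from simulated cells at known cell-type proportions, and down-sampling over-represented cell types) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `downsample_per_type` and `simulate_patient`, the per-type cap, the choice to sum expression values, and sampling with replacement are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def downsample_per_type(labels, max_per_type, seed=0):
    """Return sorted cell indices, keeping at most max_per_type cells per label.
    Hypothetical helper; the cap is an assumed parameter."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)
    kept = []
    for idxs in by_type.values():
        if len(idxs) > max_per_type:
            idxs = rng.sample(idxs, max_per_type)  # without replacement
        kept.extend(idxs)
    return sorted(kept)

def simulate_patient(cells, labels, proportions, n_cells=100, seed=0):
    """Build one pseudo-bulk profile by summing cells drawn at the given
    cell-type proportions (assumed aggregation: a plain sum over cells)."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for expr, label in zip(cells, labels):
        by_type[label].append(expr)
    n_genes = len(cells[0])
    bulk = [0.0] * n_genes
    for label, frac in proportions.items():
        # Rounding means the total drawn can differ slightly from n_cells.
        for _ in range(round(frac * n_cells)):
            expr = rng.choice(by_type[label])  # with replacement
            for g in range(n_genes):
                bulk[g] += expr[g]
    return bulk

# Toy data: three identical type-A cells and one type-B cell, two genes.
cells = [[1, 0], [1, 0], [1, 0], [0, 2]]
labels = ["A", "A", "A", "B"]

kept = downsample_per_type(labels, max_per_type=2)
patient = simulate_patient(cells, labels, {"A": 0.75, "B": 0.25}, n_cells=4)
print(len(kept), patient)  # 3 [3.0, 2.0]
```

Because the proportions are chosen by the caller, the true composition of each simulated patient is known, so it can be compared against whatever a downstream model infers.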