Integrative analysis of spatial and single-cell transcriptome data from human pancreatic cancer reveals an intermediate cancer cell population associated with poor prognosis

Background: Recent studies using single-cell transcriptomic analysis have reported several distinct clusters of neoplastic epithelial cells and cancer-associated fibroblasts in the pancreatic cancer tumor microenvironment. However, their molecular characteristics and biological significance have not been clearly elucidated because of intra- and inter-tumoral heterogeneity.

Methods: We performed single-cell RNA sequencing using enriched non-immune cell populations from 17 pancreatic tumor tissues (16 pancreatic cancer and one high-grade dysplasia) and generated paired spatial transcriptomic data from seven patient samples.

Results: We identified five distinct functional subclusters of pancreatic cancer cells and six distinct cancer-associated fibroblast subclusters. We profiled their characteristics in depth and found that these subclusters deconvolute most of the features reported in bulk transcriptome analyses of pancreatic cancer. Among these subclusters, we identified a novel cancer cell subcluster, Ep_VGLL1, which shows characteristics intermediate between the extremes of the basal-like and classical dichotomy yet carries prognostic value. The molecular features of Ep_VGLL1 suggest that it is a transitional state between the basal-like and classical subtypes, an interpretation supported by the spatial transcriptomic data.

Conclusions: This integrative analysis not only provides a comprehensive landscape of pancreatic cancer cell and fibroblast populations, but also offers new insight into the dynamic states of pancreatic cancer cells and reveals potential therapeutic targets.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-024-01287-7.

Fig. S2. Identification of the epithelial subpopulations in pancreatic cancer. A, Expression of selected marker genes in the total epithelial population. B, Violin plots showing the subcluster-specific scores calculated with established marker gene sets for the epithelial population. C, Expression of the marker genes depicted as a heatmap. Columns in the heatmap consist of epithelial cells grouped by subcluster annotation. D, The subcluster-specific scores projected on each sample comprising the total epithelial population. Only cells in the top 10% for each score are shown. E, (left) Epithelial cells from a publicly available dataset (Genome Med 2020;12:80) were merged with the epithelial cell population in our dataset and projected as a UMAP. (right) Prediction probabilities of the epithelial cells from the public dataset are projected on the UMAP. The prediction probabilities were generated from a logistic regression model, which was trained and tested with epithelial cells from this study.
Fig. S3. Identification of the malignant populations in pancreatic cancer epithelial cells. A, UMAP projection of KRAS mutation status in epithelial cells. Mutation status at Gly12 of KRAS is inferred from the KRAS transcript sequences obtained from each cell. B, KRAS mutation status grouped by subcluster annotation and presented as a bar plot. C, IGV view showing the KRAS mutation status at Gly12 (chr12:25,245,245,351). The upper five panels represent BAM files from the epithelial cells of each sample. The lower five panels show the BAM files from the Ep_FXYD2 cluster of each sample.
Fig. S4. Deconvolution of the proliferating epithelial subpopulation. A, Scatter plot showing the distinct cycling populations in the epithelial population. The cell cycle state scores were calculated based on phase-specific gene expression. B, An ROC curve showing results from the annotation transfer process of the cluster identities. Cluster identities are trained and tested in non-cycling populations and projected to the cycling populations by a logistic regression method. C, Subcluster scores across the subcluster assignment in the Ep_CDK1 population. D, Number of epithelial cells in each epithelial subcluster except Ep_CDK1. E, Number of Ep_CDK1 cells assigned to each epithelial subcluster.
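The annotation transfer described in panel B (train on non-cycling cells, project onto cycling cells) can be illustrated with a minimal sketch. This is not the authors' pipeline: the features, regularization, and library they used are not stated in the legend, so the example below assumes a from-scratch one-vs-rest logistic regression on two hypothetical per-cell features, with hypothetical labels `cluster_A` and `cluster_B` and made-up toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one binary classifier


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_binary(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.5, epochs: int = 500) -> Model:
    """Fit one binary logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_one_vs_rest(X: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Dict[str, Model]:
    """Train one classifier per cluster label (one-vs-rest)."""
    return {c: fit_binary(X, [int(l == c) for l in labels])
            for c in sorted(set(labels))}


def predict_proba(models: Dict[str, Model],
                  x: Sequence[float]) -> Dict[str, float]:
    """Per-cluster probabilities, normalized to sum to 1."""
    raw = {c: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
           for c, (w, b) in models.items()}
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}


def transfer_labels(models: Dict[str, Model],
                    X: Sequence[Sequence[float]]) -> List[str]:
    """Assign each cell the cluster with the highest probability."""
    return [max(p, key=p.get) for p in (predict_proba(models, x) for x in X)]


# Toy data (hypothetical): non-cycling cells with known cluster identity.
non_cycling_X = [[2.0, 0.0], [1.8, 0.2], [2.2, 0.1],
                 [0.0, 2.0], [0.1, 1.9], [0.2, 2.1]]
non_cycling_labels = ["cluster_A"] * 3 + ["cluster_B"] * 3

# Cycling cells whose identity is to be inferred.
cycling_X = [[1.9, 0.3], [0.1, 1.7]]

models = fit_one_vs_rest(non_cycling_X, non_cycling_labels)
transferred = transfer_labels(models, cycling_X)
```

In a real analysis the training set would be split so that the classifier can be evaluated (for example with an ROC curve) before it is applied to the cycling population.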
Fig. S5. Identification of the fibroblast subpopulations in pancreatic cancer. A, Expression of selected marker genes in the total fibroblast-stellate cell population. B, Violin plots showing the subcluster-specific scores calculated with established marker gene sets for the fibroblast population. C, Expression of the marker genes depicted as a heatmap. Columns in the heatmap consist of fibroblast cells grouped by subcluster annotation. D, The subcluster-specific scores projected on each sample comprising the total fibroblast-stellate cell population. Only cells in the top 10% for each score are shown.
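The top-10% display filter mentioned in panel D can be sketched as a simple rank-based mask. The exact percentile definition used for the figure is not given, so this example assumes that the highest-scoring 10% of cells (rounded up, ties broken by input order) are kept; the function name and the toy scores are hypothetical.

```python
from typing import List, Sequence


def top_percent_mask(scores: Sequence[float], percent: int = 10) -> List[bool]:
    """Return True for cells whose score ranks in the top `percent` percent."""
    n = len(scores)
    if n == 0:
        return []
    # Integer ceiling of n * percent / 100, keeping at least one cell.
    k = max(1, -(-n * percent // 100))
    # Indices sorted by descending score; sorted() is stable, so ties
    # keep their original order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(n)]


# Toy scores for ten cells (hypothetical values).
toy_scores = [0.10, 0.90, 0.30, 0.50, 0.20, 0.80, 0.40, 0.70, 0.60, 0.05]
toy_mask = top_percent_mask(toy_scores)
```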
Fig. S6. Integration of the fibroblast atlas identifies a fibroblast progenitor population in pancreatic cancer. A, UMAP showing the various CAF subtype scores calculated based on the previously reported gene signatures. B, Heatmap representation of CAF scores in fibroblast subclusters. Scores were calculated in each cell based on the average expression of the marker gene sets. The average scores of the cells in each subcluster are standardized and represented as a heatmap. C, UMAP representation of the mouse perturbation fibroblast atlas and subcluster scores. Subcluster scores are calculated in the atlas based on the extended marker gene sets of the fibroblast-stellate population. D, Average subcluster scores across subcluster annotation. E, UMAP representation of the human perturbation fibroblast atlas and calculated subcluster scores. Subcluster scores were calculated in the atlas based on the marker gene sets of the fibroblast population. F, Scatter plot showing the correlation of PI16 expression and Fb_VIT score in the human fibroblast perturbation atlas. The P value for the coefficient was determined by a simple linear regression model. G, Expression of PI16 and CD34 in the fibroblast-stellate population of pancreatic cancer. H, UMAP plot depicting the RNA velocity streamlines. The UMAP projection was obtained from the fibroblast-stellate cell data after regressing out the genes related to proliferation. I, Average expression of major DEG (differentially expressed genes) of reactive and deserted TME depicted on the fibroblast UMAP (top) and displayed as a scatter plot (bottom).
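The scoring scheme in panel B (per-cell average expression of a marker gene set, averaged per subcluster, then standardized) can be written out as a short sketch. The legend does not say along which axis the standardization was done or whether any background correction was applied; the example assumes a plain mean per cell and a z-score across subclusters for a single signature. Gene names, cluster names, and expression values are placeholders.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def cell_score(expr: Dict[str, float], markers: Sequence[str]) -> float:
    """Average expression of the marker genes detected in one cell."""
    present = [g for g in markers if g in expr]
    return mean(expr[g] for g in present) if present else 0.0


def standardized_cluster_scores(cells: Sequence[Dict[str, float]],
                                labels: Sequence[str],
                                markers: Sequence[str]) -> Dict[str, float]:
    """Average the per-cell scores per subcluster, then z-score across subclusters."""
    per_cluster: Dict[str, List[float]] = {}
    for expr, label in zip(cells, labels):
        per_cluster.setdefault(label, []).append(cell_score(expr, markers))
    averages = {c: mean(s) for c, s in per_cluster.items()}
    mu = mean(list(averages.values()))
    sd = pstdev(list(averages.values()))
    # Guard against a zero standard deviation (all subclusters identical).
    return {c: (a - mu) / sd if sd > 0 else 0.0 for c, a in averages.items()}


# Toy input (hypothetical): two cells, two placeholder genes.
toy_cells = [{"geneA": 2.0, "geneB": 4.0}, {"geneA": 0.0, "geneB": 2.0}]
toy_labels = ["cluster_1", "cluster_2"]
toy_z = standardized_cluster_scores(toy_cells, toy_labels, ["geneA", "geneB"])
```

With several signatures, the same function would be applied once per marker gene set to fill one row (or column) of the heatmap.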

Fig. S7. Identification of Fb_VIT populations. RNA in situ hybridization images of human PDAC tissue. Blue, red, and green indicate CD34, PI16, and PDGFRA, respectively. Scale bars (lower left corner of each image) indicate 20 µm in the 3rd row and 10 µm in the remaining rows (1st, 2nd, and 4th).

Fig. S10. Identification of Fb_COL9A1 populations. A, RNA in situ hybridization images of human IPMN tissue. Blue, red, and green indicate TSLP, COCH, and PDGFRA, respectively. Scale bars (lower left corner of each image) indicate 10 µm in the top row and 20 µm in the second and third rows. B, Immunohistochemistry (IHC) results. Representative images of anti-COL9A1 (brown) staining in three patients with IPMN (top) and three patients with PDAC (bottom) are shown. Black arrows indicate COL9A1-positive long spindle/stellate cells, morphologically consistent with fibroblasts. Scale bars indicate 200 µm.
Fig. S14. Correlation between the TF clusters and epithelial subclusters. A, Heatmap showing the inferred TF activities across the cells in each epithelial subcluster. B, Correlation matrix of TF activities in pancreatic cancer epithelial cells. Defined TF clusters are shown in green boxes. C, Average activities of each TF cluster across the epithelial subclusters. D, TF activity-based PCA plot annotated with epithelial cluster identity.

Fig. S18. Marker gene expression in spatial transcriptome data. Marker genes for epithelial and fibroblast subclusters were compared between cluster-high spots and cluster-low spots. For each epithelial and fibroblast subcluster, cluster-high spots were defined as the spots with estimated abundances over 3, and the rest of the spatial spots were regarded as cluster-low spots. The current version of Visium transcriptomic analysis does not cover human genes as comprehensively as scRNA-seq analysis; therefore, only the marker genes established in the scRNA-seq data that were also available in the Visium data were evaluated. Subclusters with fewer than 500 cluster-high spots were excluded.
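The spot classification described in this legend can be sketched as follows. The two thresholds (abundance over 3, at least 500 cluster-high spots) come from the legend; the function name, the input layout (one list of estimated abundances per subcluster), and the toy values are assumptions made for illustration only.

```python
from typing import Dict, List

ABUNDANCE_CUTOFF = 3.0  # "estimated abundances over 3" (from the legend)
MIN_HIGH_SPOTS = 500    # subclusters with fewer cluster-high spots are excluded


def split_spots(abundance: Dict[str, List[float]],
                cutoff: float = ABUNDANCE_CUTOFF,
                min_high: int = MIN_HIGH_SPOTS) -> Dict[str, List[bool]]:
    """Return, for each retained subcluster, a cluster-high mask over spots.

    True marks a cluster-high spot, False a cluster-low spot.
    """
    masks: Dict[str, List[bool]] = {}
    for subcluster, values in abundance.items():
        mask = [v > cutoff for v in values]
        if sum(mask) >= min_high:  # drop subclusters with too few high spots
            masks[subcluster] = mask
    return masks


# Toy input (hypothetical abundances for four spots); min_high is lowered
# so that the tiny example retains one subcluster and drops the other.
toy_abundance = {
    "subcluster_1": [0.5, 3.2, 4.0, 3.0],
    "subcluster_2": [0.1, 0.2, 3.5, 1.0],
}
toy_masks = split_spots(toy_abundance, min_high=2)
```

Marker gene expression would then be compared between the True and False spots of each retained subcluster.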
Fig. S3 (continued). D, Heatmap representation of the CopyKAT result. Gray and black boxes in the columns indicate chromosome numbers and locations. Orange boxes in rows indicate aneuploid cells. Green boxes indicate diploid cells inferred from CopyKAT. E, UMAP representation of the CopyKAT result. F, CopyKAT-inferred ploidies of epithelial cells in each subcluster. G, Bar plots showing (left) the batch (patient) composition of the Ep_FXYD2 population, (middle-left) KRAS mutation status of epithelial cells from each patient, (middle-right) KRAS mutation composition in Ep_FXYD2 cells from each patient, and (right) CopyKAT prediction results for epithelial cells from each patient. H, Expression of acinar cell, ADM cell, and ductal cell markers in pancreatic epithelial cells.
