Techniques for analysing spatial transcriptomic data have proliferated over recent years. They serve a variety of purposes, including preparing, or ‘pre-processing’ data for analysis (e.g. aligning microscopy images of labelled mRNAs), conducting biological analyses (e.g. spatial differential gene expression), combining, or ‘integrating’ with scRNA-seq data, or inference of cell-cell interactions (Fig. 3). Below, we review pre-processing steps and provide guidance on some common analyses, with recent examples of each.
Pre-processing spatial transcriptomic data
One of the first analytical tasks to perform is to prepare spatial transcriptomic data for analysis, known as pre-processing. This usually serves to convert raw imaging or sequencing data to a matrix of transcript counts per gene by spatial capture areas, which we will refer to as gene-spot matrices (in multi-omic methods, genes may be accompanied by protein counts), with specific methods required for data generated by different techniques (Fig. 3).
Typical ISH- and ISS-based methods will read out a sequence or gene-specific barcode over multiple hybridization rounds, so the images are uninformative on their own. There are several steps required to convert these images to a gene-spot matrix. First, images are filtered to remove background and noise. Second, images from different hybridization rounds are aligned so that the same pixel location or spot in each one represents the same transcript. Third, signals at each spot are combined into a barcode or sequence that can be used to match spots to genes, with signals that do not match any gene rejected. Finally, an optional step is segmentation, discussed below. Tools for pre-processing ISH or ISS-based transcriptomics are often targeted to a particular technique, e.g. MERFISH or seqFISH, but the recent package starfish provides a well-documented and more general-purpose pipeline [17, 72]. Like 10X Genomics’s pre-processing pipelines for data generated via their single cell and spatial transcriptomics platforms, we expect that imaging-based platforms such as MERSCOPE, Esper, and Xenium will soon receive dedicated pre-processing pipelines.
For Visium, the only commercially available array-based method, 10X Genomics has published a pre-processing pipeline—Space Ranger—that performs pre-processing with minimal user input. To convert sequencing data to spatial transcriptomic data, it accepts raw sequences of captured mRNAs and microscopy images of the profiled tissue, performs alignment of reads to the genome, matches read barcodes to spatial locations in the array, and counts the number of gene transcripts at each spatial location to produce a gene-spot matrix.
A final pre-processing step in some methods is segmentation. The aim of segmentation is to reconstruct single-cell transcriptomes from spatial data with subcellular resolution. For example, segmentation could be used with imaging-based data from methods such as MERSCOPE, Esper, or Xenium to reconstruct single-cell transcriptomes by inferring from transcript species and clustering which areas of the image likely encompassed one cell. Similar approaches can be used with subcellular array-based data, e.g. STOmics, which generates sub-micron resolution data. Thus, segmentation transforms a gene-spot matrix into an inferred gene-cell matrix. There are numerous published segmentation methods using different approaches such as manual segmentation, prior information from nuclear staining [73], deep neural networks [74], gene expression signatures from true single-cell references generated by scRNA-seq [73, 75], and some workflows such as spot-based spatial cell type analysis by multidimensional mRNA density estimation (SSAM) avoid segmentation entirely [17, 76]. Perhaps one reason for the diversity of approaches is that cell segmentation is a complex and computationally expensive process, especially when many cell types are present [73]. Where segmentation is required, we recommend approaches that can leverage prior information from nuclear staining and scRNA-seq references, such as Baysor [73]. As with other pre-processing steps, we anticipate that commercial platforms such as MERSCOPE, Esper, and Xenium will ship with built-in segmentation pipelines.
One final step in pre-processing the data is to apply statistical transformation to the gene-spot matrix to account for differences in mRNA capture rate across the tissue. This is an important step for data generated via all techniques but especially those with lower or variable capture rates such as array-based methods. In scRNA-seq, the process of accounting for differences in mRNA capture between dissociated cells is called normalization, and the same terminology is applied in spatial transcriptomics. The most common normalization procedure is to divide each cell in a gene-spot matrix by the spot total, so that every spot has the same number of counted mRNAs in the processed matrix. This approach is used by general-purpose analysis packages such as Scanpy, Giotto, and Seurat. However, this approach assumes that all regions of the tissue have the same underlying mRNA abundance, or ‘library size’, an assumption that may not be true for tissues with regions of dense nuclei juxtaposed with regions of sparse nuclei and thus lower mRNA abundance. Seurat offers the method sctransform, which normalizes not by total library size of each cell’s transcriptome but based on one group of genes at a time, with each group selected so that all the genes have similar abundances. Thus, the approach allows variation in total library size rather than enforcing it as a constant metric. This is likely to be favourable for tissues with underlying variations in mRNA abundance driven by differential cell density. Also, in contrast to standard normalization, spatial and morphological expression (SME) from the stLearn python package presents a novel method for normalization which smooths library sizes in spots or segmented cells based on the library size of nearby spots within a radius d and their morphological similarity inferred by deep learning of features from histological images while preserving larger-scale variations in mRNA abundance across the tissue [77]. Finally, one report suggests that unnormalized (‘raw’) data are also informative and preserve information about cell densities [78], but their suitability for downstream analyses is not assessed so we recommend either library size normalization or specialized techniques such as sctransform and SME depending on the library size variation across the sample.
Overall, several steps are required to convert raw image or sequencing data to processed, interpretable spatial transcriptomic data. The steps required vary between technologies, but there are tools to handle each of them, often published as a complete pre-processing pipeline such as in starfish for imaging-based data or Space Ranger for Visium data, as well as other method-specific pipelines that we anticipate will be released on instrument computers as in MERSCOPE. The final step mentioned above, normalization, is often handled by separate pipelines for downstream analysis, a broad term encompassing all analysis techniques that aim to generate or test biological hypotheses with the data. Below, we will review some common pipelines for downstream analysis and examples of analyses specific to spatial data.
Generalized toolkits for downstream analysis
There are a range of different downstream analyses for spatial data, with different aims and different inputs. Spatial data may comprise raw gene-spot matrices, normalized matrices, or accessory data such as inferred cell types and tissue domains to histological images taken before transcriptomic profiling. To provide a unified format for these data, and to simplify and standardise spatial analysis, utility packages such as Giotto, STUtility, Seurat, scanpy, stLearn, and squidpy have been developed [77, 79,80,81,82,83]. Among their shared aims are, first, to provide a structure for spatial data matrices and for associated accessory data generated through downstream analysis. The latter might include dimensionality reductions such as UMAP, unbiased clustering results, annotations, and imputation, mapping or deconvolution results. A second aim is to provide functions to complete all those processes. Third, they provide functions for data visualization, combining spatial transcriptomic data with overlays such as microscopy data [81, 83]. Finally, they provide standardized workflows for quality control (e.g. filtering poorly expressed genes), pre-processing, and specialized analysis techniques for spatial data.
For a researcher selecting an analysis package, they most obviously differ in terms of capabilities, size of user communities, and the uptake of their data formats in the larger bioinformatics community. Currently, Seurat (in R and cited > 4700 times at the time of writing) and scanpy (in python and cited > 1600 times at the time of writing) benefit from extensive documentation generated over years, from large user communities, and from many packages that recognize or even operate directly on their formats (SeuratObject and anndata, respectively). Conversely, Giotto (in R) and stLearn (in python) benefit from workflows developed specially for spatial transcriptomics and a greater variety of built-in tools for spatial downstream analyses. These include spatially variable gene identification, deconvolution, and cell-cell interaction inference, all outlined in the following section. Finally, STUtility and squidpy provide extended spatial analysis functions for Seurat and scanpy, respectively. STUtility focusses on analysis of multiple spatial transcriptomic datasets and contains features for annotating tissue regions, alignment of parallel 2-dimensional spatial datasets, and visualization of resulting 3-dimensional datasets. Squidpy likewise extends Scanpy and is from the same authors but provides a depth of functionality akin to Giotto with specialized data structures, tools for performing spatial statistics, inferring intercellular interactions, and visualizing data. Finally, STUtility, squidpy, and stLearn provide functions for analysing auxiliary image data. This can be challenging as the images are large and require significant memory to work with. STUtility works with H&E images generated alongside Visium data and transforms and aligns low resolution versions of these images before applying the transformations to higher resolution images; it uses these data for alignment of sequential sections and regional annotation. In contrast, squidpy provides a format for image data, ImageContainer, with lazy loading rather than reduced resolution to conserve memory, as well as a suite of analyses to make use of these data. Finally, stLearn provides functions for incorporating image data with gene expression data such as in the normalization step (discussed in the previous section). Overall, we recommend Seurat and Scanpy for lay biomedical researchers due to their extensive documentation and large user communities and recommend Giotto, stLearn, STUtility, and squidpy for researchers seeking more specialized spatial transcriptomics analysis pipelines.
Identification of spatial features
An initial goal for many scRNA-seq analyses is to define the cell types. An analogous step in spatial transcriptomics analysis might be the identification of spatial features such as anatomical and microanatomical structures. To identify structures, existing algorithms can group transcriptomically similar spots or cells in an unbiased way to reveal spatial patterns of gene expression. However, newer methods have been developed to specifically leverage spatial data and identify features like tissue domains. Examples include BayesSpace, a Bayesian statistical tool that uses a clustering approach, albeit with prior spatial information, to group transcriptomically similar, proximally located spots. Additionally, it can perform resolution enhancement by reassigning gene counts from whole spots in array-based data, e.g. Visium, to finer sub-spots with the use of spatial prior information from nearby spots [84]. Similarly, XFuse combines auxiliary histology (usually H&E) images with gene expression data via deep data fusion of spatial features to infer gene expression between spots in an array [85]. Other algorithms for tissue domain identification include stLearn [77], which uses SME normalization with inference from histology images via deep learning (see pre-processing methods above) to infer clusters and then subclusters where there is spatial segregation, and HMRF (hidden Markov random field) [86], which assigns spots or cells to tissue domains as a function of their gene expression, and the domain in which neighbouring cells reside, which is included in the Giotto analysis package. Overall, the choice of method will depend on the data available; for array-based methods with low resolution, BayesSpace might be favourable for resolution enhancement, but if histological images are available, then stLearn will be powerful for its ability to integrate them in its analysis of spatial transcriptomic data.
Alternatively, rather than identifying heterogeneity among cells and spots across the sample, one might search directly for genes that show biased, non-random spatial expression patterns. This can quickly elucidate anatomical features if known marker genes are detected. Numerous methods for detecting genes that vary spatially have emerged over the past few years, with some implemented in popular tools such as Seurat as the function FindSpatiallyVariableGenes which estimates spatial autocorrelation with Moran’s l over binned groups of spots rather than over individual spots for improved speed; in Giotto as BinSpect-k means or BinSpect-rank, both of which use (separate) techniques to binarize expression data and examine the correlation of a gene’s expression in one spot with that in neighbouring spots to estimate a p value; or as standalone analysis tools such as trendsceek, SPARK, and SpatialDE in python [81, 86,87,88,89]. Notably, Giotto’s methods for spatially variable gene selection provide improvements in speed over some older methods such as SpatialDE, trendsceek, and SPARK, a key concern given the continuing trend of larger datasets in spatial transcriptomics [79]. sepal is a more recent method which takes a novel approach, simulating the time taken for observed transcripts of a single species to diffuse across the sample to a random distribution, with this metric inferring the degree of spatial structure underlying the species’ distribution [90]. Binning approaches may be used to improve speed but this could result in loss of spatial detail depending on the size of the bins used.
Deconvolution
An important consideration for data generated by array methods with non-cellular resolution is that more than one cell type can contribute to each spot. This is particularly important for Visium, since its spatial resolution of 55 μm means each spot may capture many cells, the exact number depending on the tissue. The process of identifying and quantifying the relative contribution from each cell type in a capture spot is known as deconvolution. Numerous tools have been developed to perform this task, usually from an scRNA-seq reference. An early example was NMFreg, an approach that decomposed spot transcriptomes into contributions from cell types by non-negative matrix factorization, developed for deconvolution of Slide-seq (V1) data [56]. SPOTlight also utilizes non-negative matrix factorization to estimate cell type proportions [91]. This was followed by robust cell type decomposition (RCTD), which used a different statistical model to explain gene counts in each spot as a mixture of cell type contributions, unobserved platform—e.g. single-cell or spatial transcriptomics—effects, and spot-specific mRNA sampling effects, allowing it confidently assign cell types to spots much more frequently than in NMFreg (86.9% of spots vs 24.8%) [92]. Because of the similar data structures between Slide-seqV2 and Visium data, these methods are also applicable to the latter. Alternatives include stereoscope, cell2location, Tangram, and destVI [93,94,95,96]. stereoscope, like RCTD, models the composition of each spot’s transcriptome as a mixture of transcripts from different cells with additional platform-specific effects. Cell2location and destVI are both contained within the scVI analysis framework and use deep learning approaches to achieve relatively high speed, as does Tangram. destVI is unique in that compared to the other tools discussed, which deconvolute spatial data from a reference of discrete cell types, it maps continuous cell types. In effect, this allows it to map not only a known reference cell type but also variation within that cell type. Tangram also incorporates imaging data such as H&E staining during its deep learning process to first segment cells in the image and to use this as the basis for the number of cells inferred through deconvolution. Finally, utility packages such as Seurat and Giotto provide deconvolution methods [79, 81]. Giotto’s methods, PAGE and RANK, perform comparably in accuracy to RCTD [79]. Thus, to deconvolute spatial transcriptomic data not of single-cell resolution requires access to a ‘ground truth’ scRNA-seq dataset. When selecting a deconvolution technique, we suggest that users consider the run time as this step can require significant computing time and power. Recent benchmarking studies will also help users select an algorithm [97].
Imputation and mapping single cells
Some computational methods aim to combine spatial transcriptomic and scRNA-seq, not for reasons of deconvolution, but to infer where genes, while not detected, may actually have been expressed. This allows users to ‘fill in the blanks’ where spatial transcriptomic data’s targeted nature or, for some approaches, low sensitivity means a gene is not detected, a task known as imputation. Conversely, some methods take the opposite approach, using spatial datasets to infer spatial mappings for scRNA-seq-derived single-cell transcriptomes, for example in Visium where single-cell measurements cannot be made.
An early integration approach was Seurat (prior to becoming a general-purpose analysis package), which maps single-cell transcriptomes to spatial coordinates. Seurat maps cells by expression of ‘landmark genes’, inferring probabilities that a cell could have originated from a tissue location whose landmark genes match its own [98, 99]. Since then, multiple other computational approaches have been devised. More recent mapping approaches include SpaOTsc, which relies on an optimal transport model to map single-cell transcriptomes to spatial data and also includes functions for inferring ligand-receptor interaction [100]. Imputing approaches include deep learning-based gimVI which learns an alignment between scRNA-seq and spatial transcriptome data, included in the python-implemented scRNA-seq analysis package scVI; Tangram, which uses a mapping step to inform the imputation process; and spatial gene enhancement (SpaGE), which aligns spatial and scRNA-seq data by domain adaptation to inform imputation [96, 101, 102]. gimVI and Tangram are deep learning-based methods, so the choice of method may depend on whether GPU computing resources are available to researchers (they can be performed on CPUs, albeit with greater computing time). Mapping of single-cell transcriptomes will likely be useful for non-single-cell data such as unsegmented imaging-based data or data from any sequencing-based method. Conversely, imputation will be useful for inference of unmeasured genes in targeted spatial transcriptomic data.
Cell-cell interaction inference
A common goal in analysing transcriptomic data is to infer intercellular interactions based upon expression of ligands and receptors [9]. Typically, a cell-cell interaction inference tool combines a database of genes encoding proteins proven to participate in intercellular interactions, with an algorithm to infer the probability of an interaction from the gene expression data. There are now several examples of these tools for scRNA-seq data including CellPhoneDB v.2.0, iCellNet, CellChat, and SingleCellSignalR [103,104,105,106]. While effective for single-cell and some spatial data, these tools cannot leverage spatial data when it is available. Recently, several techniques have been developed for this purpose including SpaOTsc, cell2cell, MISTy, and CellPhoneDB v.3.0 and one implemented in the general-purpose spatial transcriptomics analysis package Giotto [28, 79, 100, 107,108,109]. Some of these techniques, such as SpaOTsc, can also infer differential gene expression with proximity to a signal-sending cell. More recent methods include graph neural network-based NCEM, or node-centric expression model, which takes as input segmented data from imaging-based spatial transcriptomics or proteomics and can be used to infer which cells are signal senders or receivers, as well as to infer domains in the tissue [110], and spatial variance component analysis or SVCA, which uses a Gaussian process-based framework to decompose gene expression variation across spots into intrinsic effects, environmental effects, and intercellular signalling effects [111]. We favour these recent tools for multipurpose analyses and in SVCA for its ability to model intrinsic gene expression perturbations, although users might also find built-in databases useful as in one author’s CellPhoneDB v.3.0.
A range of techniques are required to infer biological processes from spatial transcriptomics: technique-specific methods to prepare data, to identify regions corresponding to single cells and tissue regions from transcriptomes alone, to infer genes that may not have been profiled by spatial transcriptomics but instead by another modality, and to identify spatially-mediated biology such as cell-cell interaction. Here, we have reviewed examples of tools for common spatial transcriptomics analyses; while we have commented on where these tools might be useful, we anticipate that many more, possibly more advanced, tools will be published beyond the time of writing.