Interoperability with SingleCellExperiment
Givanna Putri
Source: vignettes/interoperability_with_sce.Rmd
Introduction
This vignette demonstrates how to integrate SuperCellCyto results with cytometry data stored in SingleCellExperiment (SCE) objects, and how to analyse SuperCellCyto output using Bioconductor packages that take SCE objects as input.
We use a subsampled Levine_32dim dataset stored in an SCE object to illustrate how to create supercells and conduct downstream analyses.
library(SuperCellCyto)
library(qs2)
library(scran)
library(BiocSingular)
library(scater)
library(bluster)
library(data.table)
Preparing SCE object
We first load the subsampled Levine_32dim data, which is stored as a qs2 file, using the qs_read function.
sce <- qs_read(
system.file(
"extdata",
"Levine_32dim_sce_sub.qs2",
package = "SuperCellCyto"
)
)
sce
#> class: SingleCellExperiment
#> dim: 39 1500
#> metadata(0):
#> assays(1): counts
#> rownames(39): Time Cell_length ... file_number event_number
#> rowData names(0):
#> colnames(1500): cell_1 cell_2 ... cell_1499 cell_1500
#> colData names(3): population sample cell_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
The data is stored in the counts assay. We will subset it to include only the markers needed for downstream analysis, apply an arcsinh transformation with a cofactor of 5, and store the transformed data in the logcounts assay.
markers <- c(
"CD45RA", "CD133", "CD19", "CD22", "CD11b", "CD4", "CD8",
"CD34", "Flt3", "CD20", "CXCR4", "CD235ab", "CD45", "CD123", "CD321",
"CD14", "CD33", "CD47", "CD11c", "CD7", "CD15", "CD16", "CD44", "CD38",
"CD13", "CD3", "CD61", "CD117",
"CD49d", "HLA-DR", "CD64", "CD41"
)
# keep only the relevant markers
sce <- sce[markers, ]
# arcsinh-transform the counts (cofactor 5); exprs() stores the result in logcounts
exprs(sce) <- asinh(counts(sce) / 5)
sce
#> class: SingleCellExperiment
#> dim: 32 1500
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(32): CD45RA CD133 ... CD64 CD41
#> rowData names(0):
#> colnames(1500): cell_1 cell_2 ... cell_1499 cell_1500
#> colData names(3): population sample cell_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
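To make the effect of the transformation concrete, here is a minimal sketch on made-up intensities (not values from Levine_32dim). It assumes the same cofactor of 5 used above and shows that arcsinh is defined at zero, behaves like a shifted log for large values, and can be inverted with sinh.

```r
# Toy raw intensities (made-up values for illustration only)
raw <- c(0, 1, 5, 50, 500, 5000)
cofactor <- 5

# Same transformation as applied to the counts assay above
transformed <- asinh(raw / cofactor)

# asinh(x) = log(x + sqrt(x^2 + 1)), so for large x it approaches log(2 * x)
approx_log <- log(2 * raw / cofactor)

# Compare the two side by side; they converge as raw grows
round(data.frame(raw, transformed, approx_log), 3)

# The transformation is invertible, which plain log is not at zero
recovered <- sinh(transformed) * cofactor
```

Unlike a log transformation, arcsinh handles zero and negative intensities, which is one reason it is commonly used for cytometry data.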
Run SuperCellCyto
SuperCellCyto requires input data in data.table format. We therefore need to extract the arcsinh-transformed data into a data.table object and add the sample information and cell IDs.
Note that an SCE object typically stores cells as columns and features as rows. SuperCellCyto, conversely, requires cells as rows and features as columns, a format common for cytometry data, where there are usually far more cells than features. Hence, we transpose the extracted data when creating the data.table object.
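The change of orientation can be sketched on a toy matrix first. The marker names, cell names, and values below are made up for illustration; only t() and data.table() are the same calls used on the real data.

```r
library(data.table)

# Toy SCE-style matrix: 3 markers (rows) x 4 cells (columns); values are made up
mat <- matrix(
  seq_len(12) / 10,
  nrow = 3,
  dimnames = list(c("CD4", "CD8", "CD19"), paste0("cell_", 1:4))
)

# Transpose so that each row is a cell and each column is a marker
dt_toy <- data.table(t(mat))

# data.table drops row names, so keep the cell IDs as an explicit column
dt_toy$cell_id <- colnames(mat)

dim(mat)     # markers x cells
dim(dt_toy)  # cells x (markers + cell_id)
dt_toy
```

Because data.table does not keep row names, the cell IDs have to be carried in their own column, which is why the real code adds cell_id after transposing.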
dt <- data.table(t(exprs(sce)))
dt$sample <- colData(sce)$sample
dt$cell_id <- colnames(sce)
supercells <- runSuperCellCyto(
dt = dt,
markers = markers,
sample_colname = "sample",
cell_id_colname = "cell_id",
gam = 5
)
head(supercells$supercell_expression_matrix)
#> CD45RA CD133 CD19 CD22 CD11b CD4 CD8
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: 0.7567786 0.2094107 0.07620974 0.009254529 0.3511375 0.78500942 0.8687925975
#> 2: 1.1482748 0.6268151 0.40876592 0.091766210 0.2766514 0.09603202 1.3321355589
#> 3: 0.1700757 0.1339944 0.07228738 0.101802043 0.1040339 0.18761509 0.1132434207
#> 4: 0.8968968 0.2921571 2.22313786 0.450342888 0.4193913 0.23917011 0.1332567931
#> 5: 0.1476499 0.1904578 0.11589638 0.106288009 0.2489399 0.52420384 0.4813924176
#> 6: 0.3801373 0.2772991 0.68028804 0.237641653 0.2007602 0.04097948 0.0001923242
#> CD34 Flt3 CD20 CXCR4 CD235ab CD45 CD123
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: 0.11526045 0.5488327 0.041534679 1.2172962 0.6303494 5.788989 0.070171026
#> 2: 0.12708141 0.0793773 0.053358629 0.3920845 0.5637494 5.785661 0.084488379
#> 3: 3.23152402 0.1749861 0.008712508 0.5421237 0.1212147 3.416994 0.066197760
#> 4: 0.30587619 0.5787963 0.136006539 1.3302008 0.3211722 4.369223 0.698444542
#> 5: 0.05301752 0.3468969 0.092965825 0.2609136 0.5636011 5.684187 -0.003512615
#> 6: 2.13054119 0.1427986 0.005927587 0.5651623 0.1386637 3.051677 0.143230069
#> CD321 CD14 CD33 CD47 CD11c CD7 CD15
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: 1.500236 0.04154518 0.06746186 3.221072 0.2123768 2.28170702 0.2050279
#> 2: 1.656612 0.08216914 0.12675445 2.043989 0.1105668 2.43975076 0.2104722
#> 3: 2.803258 0.02104697 0.17107591 2.657387 0.3190474 0.03814356 0.2649500
#> 4: 2.654545 0.01727268 0.24770161 4.576991 0.4200163 0.09815344 0.3564437
#> 5: 1.466286 0.08933425 0.02555677 2.352106 0.1591512 2.55868425 0.1185639
#> 6: 2.040685 0.09609030 0.35279157 3.578527 0.2416795 0.02281908 0.5117449
#> CD16 CD44 CD38 CD13 CD3 CD61 CD117
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: 0.03168278 2.925953 0.6649237 0.09905437 5.10948135 0.05358984 0.281380801
#> 2: 0.10139879 4.354774 1.1550810 0.22855225 0.58614888 0.16057716 0.003634178
#> 3: 0.04358262 2.895200 0.9998574 0.31835180 0.15497706 0.05665786 0.993750340
#> 4: 0.35695241 5.021503 6.2330245 0.64945662 0.17109374 0.18993344 0.082566733
#> 5: 0.08155028 2.924965 0.3143248 0.09534336 4.89098214 0.22921411 0.059164421
#> 6: 0.06652319 2.830727 3.7236044 0.71869068 0.09991223 0.28525846 0.111576001
#> CD49d HLA-DR CD64 CD41 sample SuperCellId
#> <num> <num> <num> <num> <char> <char>
#> 1: 0.3529696 0.1407416 0.09377686 0.07614376 H1 SuperCell_1_Sample_H1
#> 2: 0.8440716 0.2467888 0.18515576 0.28544083 H1 SuperCell_2_Sample_H1
#> 3: 1.0233080 1.6318626 0.25253230 0.04277276 H1 SuperCell_3_Sample_H1
#> 4: 1.5830615 0.0820442 0.03792944 0.16956875 H1 SuperCell_4_Sample_H1
#> 5: 0.3508159 0.1748238 0.11055986 0.04522353 H1 SuperCell_5_Sample_H1
#> 6: 1.6866826 3.0233062 0.13634514 0.40852578 H1 SuperCell_6_Sample_H1
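We can now embed the supercell ID in the colData of our SCE object. Note that the assignment below is positional: it assumes the rows of supercell_cell_map are in the same order as the columns of the SCE object.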
colData(sce)$supercell_id <- factor(supercells$supercell_cell_map$SuperCellID)
head(colData(sce))
#> DataFrame with 6 rows and 4 columns
#> population sample cell_id supercell_id
#> <factor> <factor> <character> <factor>
#> cell_1 Basophils H1 cell_1 SuperCell_22_Sample_H1
#> cell_2 Basophils H1 cell_2 SuperCell_50_Sample_H1
#> cell_3 Basophils H1 cell_3 SuperCell_28_Sample_H1
#> cell_4 Basophils H1 cell_4 SuperCell_141_Sample_H1
#> cell_5 Basophils H1 cell_5 SuperCell_141_Sample_H1
#> cell_6 Basophils H1 cell_6 SuperCell_99_Sample_H1
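If the order of the cell-to-supercell map cannot be taken for granted, one option is to align it explicitly by cell ID with match(). The sketch below uses a toy stand-in for the map with made-up values; the CellId column name is an assumption for illustration and should be checked against the actual supercell_cell_map.

```r
# Toy stand-ins (made-up values); the column names are assumptions
cell_ids <- c("cell_1", "cell_2", "cell_3", "cell_4")
cell_map <- data.frame(
  SuperCellID = c("SuperCell_2", "SuperCell_1", "SuperCell_1", "SuperCell_2"),
  CellId = c("cell_3", "cell_1", "cell_4", "cell_2")
)

# For each cell in cell_ids, find its row position in the map
idx <- match(cell_ids, cell_map$CellId)

# Fail early if any cell has no supercell assignment
stopifnot(!anyNA(idx))

# Reorder the supercell IDs so they line up with cell_ids
supercell_id <- factor(cell_map$SuperCellID[idx])
data.frame(cell_id = cell_ids, supercell_id)
```

With the real objects, cell_ids would correspond to colnames(sce) and the reordered vector could be assigned to the colData in the same way as above.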
Analyse Supercells as SCE object
As there are fewer supercells than cells in our SCE object, we store the supercell expression matrix in a separate SCE object. This allows us to use Bioconductor packages to analyse our supercells.
supercell_sce <- SingleCellExperiment(
list(logcounts = t(
supercells$supercell_expression_matrix[, markers, with = FALSE]
)),
colData = DataFrame(
SuperCellId = supercells$supercell_expression_matrix$SuperCellId,
sample = supercells$supercell_expression_matrix$sample
)
)
colnames(supercell_sce) <- colData(supercell_sce)$SuperCellId
supercell_sce
#> class: SingleCellExperiment
#> dim: 32 300
#> metadata(0):
#> assays(1): logcounts
#> rownames(32): CD45RA CD133 ... CD64 CD41
#> rowData names(0):
#> colnames(300): SuperCell_1_Sample_H1 SuperCell_2_Sample_H1 ...
#> SuperCell_149_Sample_H2 SuperCell_150_Sample_H2
#> colData names(2): SuperCellId sample
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
The code above transposes the supercell expression matrix, making supercells the columns and markers the rows, and stores it in the logcounts assay of our new SCE object. We also populate the colData with the SuperCellId and the sample name of each supercell.
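The dimensions printed above also offer a quick sanity check on the gam value. The sketch below only uses numbers shown in this vignette, and it assumes that gam acts as the approximate ratio of cells to supercells, which is how the underlying SuperCell method describes its graining level.

```r
# Values taken from the outputs shown in this vignette
n_cells <- 1500      # columns of the single-cell SCE object
gam <- 5             # value passed to runSuperCellCyto
n_supercells <- 300  # columns of supercell_sce

# Average number of cells captured per supercell
cells_per_supercell <- n_cells / n_supercells

# Expected supercell count if gam is the cell-to-supercell ratio (assumption)
expected_supercells <- n_cells / gam

cells_per_supercell
expected_supercells
```

A larger gam would therefore be expected to yield fewer, coarser supercells, and a smaller gam more, finer ones.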
With the supercell expression matrix now in SCE format, we can perform downstream analyses such as clustering and drawing UMAP plots using Bioconductor packages.
set.seed(42)
supercell_sce <- fixedPCA(
supercell_sce,
rank = 10,
subset.row = NULL,
BSPARAM = RandomParam()
)
supercell_sce <- runUMAP(supercell_sce, dimred = "PCA")
clusters <- clusterCells(
supercell_sce, use.dimred = "PCA",
BLUSPARAM = SNNGraphParam(cluster.fun = "leiden")
)
colLabels(supercell_sce) <- clusters
plotReducedDim(supercell_sce, dimred = "UMAP", colour_by = "label")
Any function that operates on an SCE object should work. For example, we can use plotExpression in the scater package to create violin plots of the markers against clusters.
Note, the y-axis says “logcounts”, but the data is actually arcsinh transformed, not log transformed.
plotExpression(
supercell_sce, c("CD4", "CD8", "CD19", "CD34", "CD11b"),
x = "label", colour_by = "sample"
)
Transfer information from supercell SCE object to single-cell SCE object
To transfer analysis results (e.g., clusters) from the supercell SCE object back to the single-cell SCE object, we need to do some data wrangling. It is vital to ensure that the order of the analysis results (e.g., clusters) aligns with the cell order in the SCE object.
Using the cluster information as an example, we first extract the colData of the two SCE objects into two separate data.table objects. We then use merge.data.table to match and merge them, using the supercell ID as the common identifier. Make sure you set the sort parameter to FALSE and set x to the colData of your single-cell SCE object. This ensures that the order of the resulting data.table aligns with the order of the colData of our single-cell SCE object.
cell_id_sce <- data.table(as.data.frame(colData(sce)))
supercell_cluster <- data.table(as.data.frame(colData(supercell_sce)))
cell_id_sce_with_clusters <- merge.data.table(
x = cell_id_sce,
y = supercell_cluster,
by.x = "supercell_id",
by.y = "SuperCellId",
sort = FALSE
)
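The effect of the sort parameter can be checked on a toy example. The tables below are made-up stand-ins for the two colData tables; the sketch shows that sort = FALSE keeps the rows in the order of x, while the default sort = TRUE reorders them by the join column.

```r
library(data.table)

# Toy single-cell and supercell metadata (made-up values)
cells <- data.table(
  cell_id = paste0("cell_", 1:5),
  supercell_id = c("SC_3", "SC_1", "SC_3", "SC_2", "SC_1")
)
supercell_meta <- data.table(
  SuperCellId = c("SC_1", "SC_2", "SC_3"),
  label = factor(c("2", "1", "2"))
)

# sort = FALSE keeps the rows in the order of x (the single-cell table)
merged <- merge.data.table(
  x = cells, y = supercell_meta,
  by.x = "supercell_id", by.y = "SuperCellId",
  sort = FALSE
)

# sort = TRUE (the default) reorders rows by the join column instead
merged_sorted <- merge.data.table(
  x = cells, y = supercell_meta,
  by.x = "supercell_id", by.y = "SuperCellId",
  sort = TRUE
)

# Check the alignment before copying anything back to the SCE object
order_kept <- identical(merged$cell_id, cells$cell_id)
order_kept_sorted <- identical(merged_sorted$cell_id, cells$cell_id)
stopifnot(order_kept)
merged
```

A similar check on the real merged table, comparing its cell_id column with colnames(sce), is a cheap safeguard before assigning labels positionally.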
Finally, we add the cluster assignment as a column in the colData of our single-cell SCE object.
colData(sce)$cluster <- cell_id_sce_with_clusters$label
We can then visualise the clusters on a UMAP plot.
# set a seed so the randomised PCA and UMAP are reproducible
set.seed(42)
sce <- fixedPCA(sce, rank = 10, subset.row = NULL, BSPARAM = RandomParam())
sce <- runUMAP(sce, dimred = "PCA")
plotReducedDim(sce, dimred = "UMAP", colour_by = "cluster")
Alternatively, we can draw violin plots to see the distribution of marker expression in each cluster. Note, the y-axis says “logcounts”, but the data is actually arcsinh transformed, not log transformed.
plotExpression(
sce, c("CD4", "CD8", "CD19", "CD34", "CD11b"),
x = "cluster", colour_by = "sample"
)
Session information
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] data.table_1.17.8 bluster_1.18.0
#> [3] scater_1.36.0 ggplot2_4.0.0
#> [5] BiocSingular_1.24.0 scran_1.36.0
#> [7] scuttle_1.18.0 SingleCellExperiment_1.30.1
#> [9] SummarizedExperiment_1.38.1 Biobase_2.68.0
#> [11] GenomicRanges_1.60.0 GenomeInfoDb_1.44.2
#> [13] IRanges_2.42.0 S4Vectors_0.46.0
#> [15] BiocGenerics_0.54.0 generics_0.1.4
#> [17] MatrixGenerics_1.20.0 matrixStats_1.5.0
#> [19] qs2_0.1.5 SuperCellCyto_0.99.2
#> [21] BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] gridExtra_2.3 rlang_1.1.6 magrittr_2.0.4
#> [4] compiler_4.5.1 systemfonts_1.2.3 vctrs_0.6.5
#> [7] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0
#> [10] XVector_0.48.0 labeling_0.4.3 SuperCell_1.0.1
#> [13] rmarkdown_2.29 UCSC.utils_1.4.0 ggbeeswarm_0.7.2
#> [16] ragg_1.5.0 xfun_0.53 cachem_1.1.0
#> [19] beachmat_2.24.0 jsonlite_2.0.0 DelayedArray_0.34.1
#> [22] BiocParallel_1.42.2 irlba_2.3.5.1 parallel_4.5.1
#> [25] cluster_2.1.8.1 R6_2.6.1 bslib_0.9.0
#> [28] RColorBrewer_1.1-3 limma_3.64.3 jquerylib_0.1.4
#> [31] Rcpp_1.1.0 bookdown_0.44 knitr_1.50
#> [34] FNN_1.1.4.1 Matrix_1.7-3 igraph_2.1.4
#> [37] tidyselect_1.2.1 abind_1.4-8 yaml_2.3.10
#> [40] viridis_0.6.5 stringfish_0.17.0 codetools_0.2-20
#> [43] lattice_0.22-7 tibble_3.3.0 plyr_1.8.9
#> [46] withr_3.0.2 S7_0.2.0 evaluate_1.0.5
#> [49] desc_1.4.3 RcppParallel_5.1.11-1 pillar_1.11.1
#> [52] BiocManager_1.30.26 scales_1.4.0 glue_1.8.0
#> [55] metapod_1.16.0 tools_4.5.1 BiocNeighbors_2.2.0
#> [58] ScaledMatrix_1.16.0 locfit_1.5-9.12 RANN_2.6.2
#> [61] fs_1.6.6 cowplot_1.2.0 grid_4.5.1
#> [64] edgeR_4.6.3 GenomeInfoDbData_1.2.14 beeswarm_0.4.0
#> [67] vipor_0.4.7 cli_3.6.5 rsvd_1.0.5
#> [70] textshaping_1.0.3 S4Arrays_1.8.1 viridisLite_0.4.2
#> [73] dplyr_1.1.4 uwot_0.2.3 gtable_0.3.6
#> [76] sass_0.4.10 digest_0.6.37 SparseArray_1.8.1
#> [79] ggrepel_0.9.6 dqrng_0.4.1 htmlwidgets_1.6.4
#> [82] farver_2.1.2 htmltools_0.5.8.1 pkgdown_2.1.3
#> [85] lifecycle_1.0.4 httr_1.4.7 statmod_1.5.0