How to create supercells
Givanna Putri
Introduction
This vignette describes the steps to reduce the size of vast high-dimensional cytometry data using SuperCellCyto, an R package based on the SuperCell R package by the David Gfeller lab at the University of Lausanne.
Please note that we’re still actively updating this vignette (and in fact the package itself), and that we welcome any feedback on how to improve them. There are myriad ways to use SuperCell. While we try to cover as many use cases as possible, we are bound to miss something. In that case, please reach out by creating an issue on the GitHub repository.
Installation
To install SuperCellCyto, we need to use the devtools
package from CRAN. You can install devtools by using the
install.packages("devtools") command.
Thereafter, you can install SuperCellCyto using
devtools::install_github("phipsonlab/SuperCellCyto").
SuperCellCyto requires the SuperCell R package
to be installed to run properly. If you use the
devtools::install_github command above to install
SuperCellCyto, SuperCell should, in theory, be installed automatically. If
it is not, you can install it manually using
devtools::install_github("GfellerLab/SuperCell").
Preparing your dataset
The function which creates supercells is called
runSuperCellCyto, and it operates on a
data.table object, an enhanced version of R's native
data.frame. We may add support for
SummarizedExperiment or flowFrame objects in
the future if there is enough demand for it.
If the raw data is stored in a csv file, we can import it into a
data.table object using the fread function
from the data.table package.
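As a minimal sketch of the csv import, the snippet below writes a tiny made-up csv file to a temporary location and reads it back with fread. The marker names and values are invented for illustration; in practice the path would point to your own exported data.

```r
library(data.table)

# Write a tiny csv to a temporary file so the example is self-contained.
# In practice, csv_path would point to your own exported cytometry data.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "CD3,CD19,Sample",
    "120.5,3.2,Sample_1",
    "98.1,450.7,Sample_1"
  ),
  csv_path
)

# fread returns a data.table directly, so no conversion step is needed
dat_csv <- fread(csv_path)
class(dat_csv)
```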
If the raw data is stored across multiple csv files or FCS files
(more common for cytometry), then we will need the help of the Spectre R package
to import them as a data.table object. Specifically, we need
to:
- Run the read.files function to read in the FCS or csv files.
- Run do.merge.files to merge the resulting data.table objects into one.
If you are unsure as to how these steps will work out, have a look at an example in this Spectre vignette.
Using the vignette above, if you have csv files, you can run the
steps in that vignette as they are, but only
after changing the InputDirectory variable. If you have FCS
files, you need to change the file.type parameter for the
read.files function to .fcs.
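If Spectre is not available and the data is stored as csv files only, a similar result can be sketched with data.table alone. This is an alternative to the Spectre workflow above, not a replacement for it: it does not read FCS files, and the directory, file names, and marker values below are made up for illustration.

```r
library(data.table)

# Create two small csv files in a fresh temporary directory
# (stand-ins for real per-sample exports)
input_dir <- tempfile("toy_csv_input_")
dir.create(input_dir)
fwrite(data.table(CD3 = c(1.2, 3.4), CD19 = c(5.6, 7.8)),
       file.path(input_dir, "Sample_1.csv"))
fwrite(data.table(CD3 = c(9.1, 2.3), CD19 = c(4.5, 6.7)),
       file.path(input_dir, "Sample_2.csv"))

# Read every csv file into a list of data.table objects, named after the files
csv_files <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)
dat_list <- lapply(csv_files, fread)
names(dat_list) <- tools::file_path_sans_ext(basename(csv_files))

# Stack them into one data.table, recording the originating file in a Sample column
dat_merged <- rbindlist(dat_list, idcol = "Sample")
dat_merged
```

Deriving the Sample column from the file name is a convenience that only works when each file holds exactly one sample.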
For this vignette, we will simulate some toy data using the
simCytoData function.
n_markers <- 15
n_samples <- 3
dat <- SuperCellCyto::simCytoData(nmarkers = n_markers, ncells = rep(10000, n_samples))
head(dat)
#> Marker_1 Marker_2 Marker_3 Marker_4 Marker_5 Marker_6 Marker_7 Marker_8
#> 1: 9.922964 18.77430 12.37885 8.606616 19.89643 19.67688 5.309088 5.653116
#> 2: 9.660632 21.21645 10.63614 9.084591 18.40098 18.06670 5.606506 5.504536
#> 3: 8.356894 18.48199 11.33545 8.693703 19.09837 19.96394 7.031845 7.939611
#> 4: 10.295952 19.09034 10.06999 9.420504 17.96347 17.58355 7.095184 5.274024
#> 5: 9.013735 20.48332 10.90067 8.938739 18.65707 15.36199 6.107704 6.172049
#> 6: 8.235404 17.96216 11.27431 10.265208 16.65202 19.32516 4.718395 6.223309
#> Marker_9 Marker_10 Marker_11 Marker_12 Marker_13 Marker_14 Marker_15
#> 1: 9.192645 5.637332 10.78582 17.75642 8.392398 12.77173 5.807003
#> 2: 8.661358 5.074771 12.28062 19.96957 8.673914 13.08128 4.815719
#> 3: 7.024692 6.132604 10.90555 18.22023 7.009536 14.05435 5.230946
#> 4: 7.498126 6.252804 10.26942 16.71395 7.307800 14.29799 4.062768
#> 5: 9.647637 4.944890 11.49003 16.60555 8.057238 15.34386 6.582094
#> 6: 9.875673 5.632924 10.94819 16.50517 8.445060 15.63214 5.950597
#> Sample Cell_Id
#> 1: Sample_1 Cell_1
#> 2: Sample_1 Cell_2
#> 3: Sample_1 Cell_3
#> 4: Sample_1 Cell_4
#> 5: Sample_1 Cell_5
#> 6: Sample_1 Cell_6
There are several things to note about our dataset. Let’s go through them one by one in each sub-section below.
The markers
The runSuperCellCyto function does not perform any data
transformation or scaling. Thus, we must ensure that our dataset has
already been appropriately transformed using either the arc-sinh
transformation or linear binning (using FlowJo). This tutorial explains
the data transformation process in great detail: (https://wiki.centenary.org.au/display/SPECTRE/Data+transformation).
Please have a read if you are unsure how to transform your data.
For our toy dataset, we will transform our data using the arc-sinh
transformation implementation provided by the base R asinh
function:
# Specify which columns are the markers to transform
marker_cols <- paste0("Marker_", seq_len(n_markers))
# The co-factor for arc-sinh
cofactor <- 5
# Do the transformation
dat_asinh <- asinh(dat[, marker_cols, with = FALSE] / cofactor)
# Rename the new columns
marker_cols_asinh <- paste0(marker_cols, "_asinh")
names(dat_asinh) <- marker_cols_asinh
# Add them to our previously loaded data
dat <- cbind(dat, dat_asinh)
head(dat[, marker_cols_asinh, with = FALSE])
#> Marker_1_asinh Marker_2_asinh Marker_3_asinh Marker_4_asinh Marker_5_asinh
#> 1: 1.436724 2.033476 1.638194 1.311582 2.089677
#> 2: 1.412863 2.152090 1.499128 1.358628 2.014081
#> 3: 1.286217 2.018321 1.557077 1.320298 2.050023
#> 4: 1.469797 2.049616 1.449878 1.390569 1.990879
#> 5: 1.351774 2.117894 1.521409 1.344475 2.027425
#> 6: 1.273675 1.990808 1.552131 1.467108 1.918055
#> Marker_6_asinh Marker_7_asinh Marker_8_asinh Marker_9_asinh Marker_10_asinh
#> 1: 2.078919 0.9244168 0.9707903 1.369000 0.9686972
#> 2: 1.996400 0.9646000 0.9509570 1.317068 0.8919084
#> 3: 2.092962 1.1416784 1.2425647 1.140849 1.0328413
#> 4: 1.970301 1.1489973 0.9196004 1.194555 1.0479433
#> 5: 1.841095 1.0296905 1.0378167 1.411668 0.8735583
#> 6: 2.061447 0.8409828 1.0442539 1.432460 0.9681120
#> Marker_11_asinh Marker_12_asinh Marker_13_asinh Marker_14_asinh
#> 1: 1.511791 1.979715 1.289858 1.667228
#> 2: 1.630811 2.093235 1.318323 1.689563
#> 3: 1.521816 2.004558 1.139090 1.756879
#> 4: 1.467477 1.921610 1.173250 1.773087
#> 5: 1.569483 1.915378 1.255035 1.839972
#> 6: 1.525364 1.909574 1.295236 1.857686
#> Marker_15_asinh
#> 1: 0.9910259
#> 2: 0.8550708
#> 3: 0.9136600
#> 4: 0.7424410
#> 5: 1.0884223
#> 6: 1.0096323
Breaking down the steps, we:
- Identify the columns denoting the markers.
- Set the co-factor to 5.
- Do the transformation and store the result in the dat_asinh variable.
- Set the dat_asinh column names to reflect that the values in each column (marker) have undergone an arc-sinh transformation.
- Combine dat and dat_asinh using cbind.
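The same steps can also be written so that the transformed columns are added to the data.table by reference, which avoids the intermediate object and the cbind call. The following is a sketch on a small made-up table (the names prefixed with toy are not part of the vignette's data), using the same co-factor of 5.

```r
library(data.table)

# Toy stand-in for raw marker intensities (values are arbitrary)
toy <- data.table(Marker_1 = c(0, 5, 50), Marker_2 = c(10, 100, 1000))
toy_markers <- c("Marker_1", "Marker_2")
toy_cofactor <- 5

# Names for the new, transformed columns
toy_markers_asinh <- paste0(toy_markers, "_asinh")

# Add the transformed columns by reference, leaving the raw columns untouched
toy[, (toy_markers_asinh) := lapply(.SD, function(x) asinh(x / toy_cofactor)),
    .SDcols = toy_markers]
toy
```

Either approach gives the same values; the by-reference form simply avoids copying the whole table, which can matter for large cytometry datasets.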
Cell id column
To create supercells, we must provide a column which uniquely identifies
each cell, akin to the Cell_Id column in the toy data we
generated above:
head(dat$Cell_Id, n = 10)
#> [1] "Cell_1" "Cell_2" "Cell_3" "Cell_4" "Cell_5" "Cell_6" "Cell_7"
#> [8] "Cell_8" "Cell_9" "Cell_10"
The purpose of cell id is to allow SuperCell to uniquely identify each cell in the dataset. This ID will come in super handy later when/if we need to work out which cells belong to which supercells.
Generally, we will need to create this ID ourselves. Most datasets
won’t come with this ID already embedded. A simple cell id can be
made by concatenating the word Cell with the row number,
something like the following:
dat$Cell_id_dummy <- paste0("Cell_", seq_len(nrow(dat)))
head(dat$Cell_id_dummy, n = 10)
#> [1] "Cell_1" "Cell_2" "Cell_3" "Cell_4" "Cell_5" "Cell_6" "Cell_7"
#> [8] "Cell_8" "Cell_9" "Cell_10"
Here, we store the cell id in a column called
Cell_id_dummy. It has values such as
Cell_1, Cell_2, all the way until Cell_x where
x is the number of cells in the dataset.
Note, we can name the cell id column however we like, e.g.,
id or cell_identity. Importantly, we need to make sure we
note the column name as we will need to pass it to the
runSuperCellCyto function later.
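When the data comes from several files, row-number IDs are only unique if they are created after merging. One way to make the IDs robust either way is to include the sample name in them, as sketched below on made-up data. The column name cell_identity is just an example.

```r
library(data.table)

# Toy data: two samples with three cells each
toy <- data.table(
  Sample = rep(c("Sample_1", "Sample_2"), each = 3),
  Marker_1 = c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6)
)

# Number the cells within each sample and prepend the sample name,
# so an ID stays unique even after the samples are merged
toy[, cell_identity := paste0(Sample, "_Cell_", seq_len(.N)), by = Sample]

# Confirm there are no duplicated IDs before creating supercells
stopifnot(anyDuplicated(toy$cell_identity) == 0)
toy$cell_identity
```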
Sample column
You will notice that in the toy data above, we have a column called
Sample. By default, this column refers to the biological
sample the cells come from. In the toy data above, we have 3 samples,
Sample_1, Sample_2, Sample_3:
unique(dat$Sample)
#> [1] "Sample_1" "Sample_2" "Sample_3"
and we have 10,000 cells per sample:
table(dat$Sample)
#>
#> Sample_1 Sample_2 Sample_3
#> 10000 10000 10000
To create supercells, it is necessary to have this
Sample column in our dataset. We can name the column
however we like, e.g., Samp or Cell_Samp. However, we must make sure we
note the column name as we will need to pass it to the
runSuperCellCyto function. More on this in the Creating
supercells section.
But what if we only have 1 biological sample in our dataset? It does
not matter. We still need to have this column in our dataset, and pass
the column name to the runSuperCellCyto function. The only
difference is that this column will only have 1 unique value.
Why do we need to do this? To ensure that each supercell only
contains cells from exactly 1 sample. This is because, in general, it
does not make sense to mix cells from different biological samples in
one supercell. Additionally (though less importantly), the
runSuperCellCyto function can process all the samples in
parallel if you set its BPPARAM parameter to a
BiocParallelParam object that leverages parallel processing.
More on this in the Running
runSuperCellCyto in parallel section below.
However, if you want each supercell to contain cells from different
biological samples, then you need to create a new Sample
column containing exactly 1 unique value, and
pass the column name to runSuperCellCyto function.
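Both the single-sample case and the pooled case come down to adding a column that holds one constant value. A minimal sketch on made-up data follows; the column name Pooled_Sample and its value are arbitrary choices, not names required by the package.

```r
library(data.table)

# Toy data with two biological samples
toy <- data.table(
  Sample = rep(c("Sample_1", "Sample_2"), each = 2),
  Marker_1 = c(1.1, 2.2, 3.3, 4.4)
)

# Add a column with a single constant value. Passing this column name as
# sample_colname would let supercells span the original samples.
toy[, Pooled_Sample := "All_samples"]

# The new column holds exactly one unique value
unique(toy$Pooled_Sample)
```

The original Sample column is kept, so the per-sample information is not lost.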
You may wonder whether it is possible to use
SuperCellCyto to reduce the number of cells captured in
each cluster (or cell type) so we can make a UMAP/tSNE plot that is not
as crowded. Commonly in cytometry, we use stratified sampling to
subsample our clusters before drawing a UMAP/tSNE plot to avoid
overcrowding it.
The short answer is, yes you can. See the Using runSuperCellCyto for stratified summarising section for more information.
Creating supercells
Now that we have imported our data, let’s create some supercells.
First, let’s store the markers, sample, and cell id column names in variables:
markers_col <- paste0("Marker_", seq_len(n_markers), "_asinh")
sample_col <- "Sample"
cell_id_col <- "Cell_id_dummy"
Then pass all of that, together with the dataset, into the
runSuperCellCyto function to create supercells:
supercells <- runSuperCellCyto(
dt = dat,
markers = markers_col,
sample_colname = sample_col,
cell_id_colname = cell_id_col
)
#> Warning in SCimplify(X = mt, genes.use = rownames(mt), do.scale = FALSE, : colnames(X) is Null,
#> Gene expression matrix X is expected to have cellIDs as colnames!
#> CellIDs will be created automatically in a form 'cell_i'
#> Warning in SCimplify(X = mt, genes.use = rownames(mt), do.scale = FALSE, : colnames(X) is Null,
#> Gene expression matrix X is expected to have cellIDs as colnames!
#> CellIDs will be created automatically in a form 'cell_i'
#> Warning in SCimplify(X = mt, genes.use = rownames(mt), do.scale = FALSE, : colnames(X) is Null,
#> Gene expression matrix X is expected to have cellIDs as colnames!
#> CellIDs will be created automatically in a form 'cell_i'
Now let’s dig deeper into the object it created:
class(supercells)
#> [1] "list"
It is a list containing 3 elements:
names(supercells)
#> [1] "supercell_expression_matrix" "supercell_cell_map"
#> [3] "supercell_object"
Supercell object
The supercell_object contains the metadata used to
create the supercells. It is a list, and each element contains the
metadata used to create the supercells for a sample. This will come in
handy if we need to debug the supercells later down the line.
Supercell expression matrix
The supercell_expression_matrix contains the marker
expression of each supercell. These are calculated by taking the average
of the marker expression of all the cells contained within a
supercell.
head(supercells$supercell_expression_matrix)
#> Marker_1_asinh Marker_2_asinh Marker_3_asinh Marker_4_asinh Marker_5_asinh
#> 1: 1.469085 2.072055 1.567400 1.377552 2.027787
#> 2: 1.263866 2.047135 1.581732 1.418801 2.034719
#> 3: 1.274929 2.047144 1.573172 1.421295 2.026800
#> 4: 1.202416 2.066525 1.689606 1.356282 2.027473
#> 5: 1.286750 2.048629 1.601158 1.296497 2.019125
#> 6: 1.326727 2.056175 1.582990 1.324014 2.031440
#> Marker_6_asinh Marker_7_asinh Marker_8_asinh Marker_9_asinh Marker_10_asinh
#> 1: 2.041505 0.9584494 1.2089978 1.186689 0.8284389
#> 2: 2.034187 1.1049543 1.0451931 1.375561 1.0387258
#> 3: 2.027067 1.2336162 0.9864861 1.383786 1.0083728
#> 4: 2.030019 1.0997869 0.9987753 1.312165 1.0554807
#> 5: 2.031943 1.0839941 1.1662184 1.260295 0.9705766
#> 6: 2.018531 1.0085722 1.0633266 1.272978 1.1422201
#> Marker_11_asinh Marker_12_asinh Marker_13_asinh Marker_14_asinh
#> 1: 1.615054 1.980432 1.241044 1.796577
#> 2: 1.558310 1.967110 1.323674 1.779298
#> 3: 1.633094 1.970961 1.418573 1.827996
#> 4: 1.543092 1.994770 1.272785 1.812418
#> 5: 1.564709 1.947356 1.335419 1.799435
#> 6: 1.597838 1.966346 1.304000 1.804822
#> Marker_15_asinh Sample SuperCellId
#> 1: 1.0170310 Sample_1 SuperCell_1_Sample_Sample_1
#> 2: 0.7927207 Sample_1 SuperCell_2_Sample_Sample_1
#> 3: 0.9798060 Sample_1 SuperCell_3_Sample_Sample_1
#> 4: 0.8796199 Sample_1 SuperCell_4_Sample_Sample_1
#> 5: 1.1062859 Sample_1 SuperCell_5_Sample_Sample_1
#> 6: 0.8164972 Sample_1 SuperCell_6_Sample_Sample_1
Therein, we will have the following columns:
names(supercells$supercell_expression_matrix)
#> [1] "Marker_1_asinh" "Marker_2_asinh" "Marker_3_asinh" "Marker_4_asinh"
#> [5] "Marker_5_asinh" "Marker_6_asinh" "Marker_7_asinh" "Marker_8_asinh"
#> [9] "Marker_9_asinh" "Marker_10_asinh" "Marker_11_asinh" "Marker_12_asinh"
#> [13] "Marker_13_asinh" "Marker_14_asinh" "Marker_15_asinh" "Sample"
#> [17] "SuperCellId"
- All the markers we previously specified in the markers_col variable.
- A column (Sample in this case) denoting which sample a supercell belongs to (note the column name is the same as what is stored in the sample_col variable).
- The SuperCellId column denoting the unique ID of the supercell.
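To make the averaging concrete, here is a small sketch that computes a supercell expression value by hand. The cell-to-supercell assignment and the marker values are invented for illustration; this is not how the package computes it internally, only what the result represents.

```r
library(data.table)

# Toy cell-level data with a hypothetical cell-to-supercell assignment
toy_cells <- data.table(
  Cell_Id = paste0("Cell_", 1:4),
  SuperCellId = c("SuperCell_1", "SuperCell_1", "SuperCell_2", "SuperCell_2"),
  Marker_1_asinh = c(1.0, 2.0, 3.0, 5.0)
)

# Average the marker expression over the cells in each supercell
toy_expr <- toy_cells[, .(Marker_1_asinh = mean(Marker_1_asinh)), by = SuperCellId]
toy_expr
```

Here SuperCell_1 gets the mean of 1.0 and 2.0, and SuperCell_2 gets the mean of 3.0 and 5.0.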
SuperCellId
Let’s have a look at SuperCellId:
head(unique(supercells$supercell_expression_matrix$SuperCellId))
#> [1] "SuperCell_1_Sample_Sample_1" "SuperCell_2_Sample_Sample_1"
#> [3] "SuperCell_3_Sample_Sample_1" "SuperCell_4_Sample_Sample_1"
#> [5] "SuperCell_5_Sample_Sample_1" "SuperCell_6_Sample_Sample_1"
Let’s break down one of them,
SuperCell_1_Sample_Sample_1. SuperCell_1 is a
numbering (1 to however many supercells there are in a sample) used to
uniquely identify each supercell in a sample. Notably, you may encounter
this (SuperCell_1, SuperCell_2) being repeated
across different samples, e.g.,
supercell_ids <- unique(supercells$supercell_expression_matrix$SuperCellId)
supercell_ids[grep("SuperCell_1_", supercell_ids)]
#> [1] "SuperCell_1_Sample_Sample_1" "SuperCell_1_Sample_Sample_2"
#> [3] "SuperCell_1_Sample_Sample_3"
While the IDs of these 3 supercells are all prefixed with
SuperCell_1, this does not make them equal to one another!
SuperCell_1_Sample_Sample_1 will only contain cells from
Sample_1 while SuperCell_1_Sample_Sample_2
will only contain cells from Sample_2.
By now, you may have noticed that we appended the sample name into each supercell id. This aids in differentiating the supercells in different samples.
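If we ever need the two parts of an ID separately, they can be pulled apart with a regular expression. The sketch below assumes the ID format shown in the output above (SuperCell_, a number, _Sample_, then the sample name); that format is inferred from the printed IDs and may change between package versions.

```r
# Example IDs following the format printed above
ids <- c("SuperCell_1_Sample_Sample_1", "SuperCell_12_Sample_Sample_2")

# Capture the per-sample supercell number and the sample name
pattern <- "^SuperCell_([0-9]+)_Sample_(.*)$"
supercell_number <- as.integer(sub(pattern, "\\1", ids))
sample_name <- sub(pattern, "\\2", ids)

data.frame(id = ids, supercell_number, sample_name)
```

Where possible, prefer the Sample column of the expression matrix over parsing the ID, as it does not depend on the ID format.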
Supercell cell map
supercell_cell_map maps each cell in our dataset to the
supercell it belongs to.
head(supercells$supercell_cell_map)
#> SuperCellID Sample
#> 1: SuperCell_285_Sample_Sample_1 Sample_1
#> 2: SuperCell_61_Sample_Sample_1 Sample_1
#> 3: SuperCell_170_Sample_Sample_1 Sample_1
#> 4: SuperCell_217_Sample_Sample_1 Sample_1
#> 5: SuperCell_308_Sample_Sample_1 Sample_1
#> 6: SuperCell_129_Sample_Sample_1 Sample_1
This map is very useful if we later need to expand the supercells out. Additionally, this is also the reason why we need to have a column in the dataset which uniquely identifies each cell.
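As a sketch of what expanding the supercells out can look like, the snippet below transfers a supercell-level annotation back to individual cells by joining it with a cell map. Both tables are made up, and the cluster labels are hypothetical. The column names follow the printed output in this vignette, where the map uses SuperCellID and the expression matrix uses SuperCellId, so the join names both columns explicitly.

```r
library(data.table)

# Toy cell map: which supercell each cell belongs to
toy_map <- data.table(
  SuperCellID = c(
    "SuperCell_1_Sample_Sample_1",
    "SuperCell_1_Sample_Sample_1",
    "SuperCell_2_Sample_Sample_1"
  ),
  CellId = c("Cell_1", "Cell_2", "Cell_3"),
  Sample = "Sample_1"
)

# Hypothetical supercell-level annotation, e.g. from clustering the supercells
toy_annot <- data.table(
  SuperCellId = c("SuperCell_1_Sample_Sample_1", "SuperCell_2_Sample_Sample_1"),
  cluster = c("cluster_A", "cluster_B")
)

# Join on the supercell ID so that every cell inherits its supercell's label
cell_annot <- merge(toy_map, toy_annot, by.x = "SuperCellID", by.y = "SuperCellId")
cell_annot
```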
Running runSuperCellCyto in parallel
By default, runSuperCellCyto will process each sample
one after the other. As each sample is processed independently of the
others, we can process all of them in parallel.
To do this, we need to create a BiocParallelParam object
that leverages parallel processing. Additionally, we will also set the
number of tasks to the number of samples, and set the
load_balancing parameter to TRUE so that workers
supercelling large samples are not also assigned small samples (these will
instead be given to workers that are supercelling smaller samples).
Notably, we should not set more workers than the total number of cores we have in the computer, as doing so will render the computer useless for anything else (and it might exhaust the RAM). To find out the total number of cores we have in the computer, we can use parallel’s detectCores.
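One cautious way to pick the number of workers is sketched below. It leaves one core free, never asks for more workers than there are samples, and falls back to a single worker when the core count cannot be determined. The variable names are illustrative, and the rule itself is a suggestion rather than a requirement of the package.

```r
library(parallel)

# Number of samples to process (3 in the toy data used in this vignette)
n_samples <- 3

# detectCores() can return NA on some platforms, so fall back to 1 core
available_cores <- detectCores()
if (is.na(available_cores)) available_cores <- 1L

# Leave one core free, never exceed the number of samples,
# and always use at least 1 worker
n_workers <- max(1L, min(n_samples, available_cores - 1L))
n_workers
```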
library(parallel)
library(BiocParallel)
n_cores <- detectCores()
supercell_par <- runSuperCellCyto(
  dt = dat,
  markers = markers_col,
  sample_colname = sample_col,
  cell_id_colname = cell_id_col,
  BPPARAM = MulticoreParam(
    workers = n_cores - 1,
    tasks = n_samples
  ),
  load_balancing = TRUE
)
Controlling the supercells’ granularity
This is described in the runSuperCellCyto function’s
documentation, but let’s briefly go through it here.
The runSuperCellCyto function is equipped with various
parameters which can be customised to alter the composition of the
supercells. The one most likely to be used is the
gam parameter.
The gam parameter controls how many supercells to
generate, and indirectly, how many cells are captured within each
supercell. It is defined by the formula
gam=n_cells/n_supercells where n_cells denotes
the number of cells and n_supercells denotes the number of
supercells.
In general, the larger the gam parameter is set to, the fewer
supercells we will get. Say for instance we have 10,000 cells. If
gam is set to 10, we will end up with about 1,000
supercells, whereas if gam is set to 50, we will end up
with about 200 supercells.
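The arithmetic above can be checked directly by rearranging the formula to n_supercells = n_cells / gam. The variable names below are illustrative only.

```r
# Number of cells and two candidate gam values from the example above
n_cells <- 10000
gam_candidates <- c(10, 50)

# Rearranging gam = n_cells / n_supercells gives the expected supercell count
n_supercells <- round(n_cells / gam_candidates)
n_supercells
#> [1] 1000  200
```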
You may have noticed, after reading the sections above, that
runSuperCellCyto is run on each sample independently of the
others, and that we can only set 1 value as the gam
parameter. Indeed, for now, the same gam value will be used
across all samples, and depending on how many cells we have in each
sample, we will end up with a different number of supercells for each
sample. For instance, say we have 10,000 cells for sample 1, and 100,000
cells for sample 2. If gam is set to 10, for sample 1, we
will get 1,000 supercells (10,000/10) while for sample 2, we will get
10,000 supercells (100,000/10).
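Turning the formula around also tells us which gam value each sample would need to reach a common target number of supercells. This is a sketch using the cell counts from the example above; the target of 1,000 supercells is an arbitrary choice for illustration.

```r
# Cell counts per sample from the example above
cells_per_sample <- c(Sample_1 = 10000, Sample_2 = 100000)

# With a shared gam of 10, the supercell counts differ between samples
cells_per_sample / 10

# For a common target (an arbitrary 1,000 supercells per sample here),
# each sample needs its own gam value: gam = n_cells / n_supercells
target_supercells <- 1000
gam_per_sample <- cells_per_sample / target_supercells
gam_per_sample
```

Sample 2 needs a gam ten times larger than sample 1 because it has ten times as many cells.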
In the future, we may add the ability to specify a different
gam value for each sample. For now, if we want to do
this, we will need to break down our data into multiple
data.table objects, each containing data from 1 sample, and
run the runSuperCellCyto function on each of them with a
different gam parameter value. Something like the
following:
n_markers <- 10
dat <- simCytoData(nmarkers = n_markers)
markers_col <- paste0("Marker_", seq_len(n_markers))
sample_col <- "Sample"
cell_id_col <- "Cell_Id"
samples <- unique(dat[[sample_col]])
gam_values <- c(10, 20, 10)
supercells_diff_gam <- lapply(seq_along(samples), function(i) {
  sample <- samples[i]
  gam <- gam_values[i]
  # Keep only the cells belonging to the current sample
  dat_samp <- dat[dat[[sample_col]] == sample, ]
  supercell_samp <- runSuperCellCyto(
    dt = dat_samp,
    markers = markers_col,
    sample_colname = sample_col,
    cell_id_colname = cell_id_col,
    gam = gam
  )
  return(supercell_samp)
})
Subsequently, to extract and combine the
supercell_expression_matrix and
supercell_cell_map, we will need to use
rbind:
supercell_expression_matrix <- do.call(
"rbind", lapply(supercells_diff_gam, function(x) x[["supercell_expression_matrix"]])
)
supercell_cell_map <- do.call(
"rbind", lapply(supercells_diff_gam, function(x) x[["supercell_cell_map"]])
)
rbind(head(supercell_expression_matrix, n = 3), tail(supercell_expression_matrix, n = 3))
#> Marker_1 Marker_2 Marker_3 Marker_4 Marker_5 Marker_6 Marker_7 Marker_8
#> 1: 4.688545 12.96876 13.56623 12.56293 17.266424 16.03250 6.786881 18.118704
#> 2: 5.494283 12.34399 13.92597 13.89454 15.735730 15.24414 8.698235 15.753595
#> 3: 6.100729 12.42840 14.81436 13.88304 16.577030 16.12862 7.456026 16.464623
#> 4: 17.734554 17.88825 16.16063 16.21216 8.169350 13.78371 13.510645 8.174193
#> 5: 17.309954 15.25007 15.96462 17.61992 6.750592 15.52790 11.322850 8.925017
#> 6: 16.193427 16.24684 17.28793 16.24450 6.350078 16.41798 11.073143 8.655675
#> Marker_9 Marker_10 Sample SuperCellId
#> 1: 16.00083 14.682485 Sample_1 SuperCell_1_Sample_Sample_1
#> 2: 16.12070 14.505695 Sample_1 SuperCell_2_Sample_Sample_1
#> 3: 15.55512 15.843026 Sample_1 SuperCell_3_Sample_Sample_1
#> 4: 13.73575 8.234455 Sample_2 SuperCell_498_Sample_Sample_2
#> 5: 15.40221 8.963820 Sample_2 SuperCell_499_Sample_Sample_2
#> 6: 16.04450 8.768232 Sample_2 SuperCell_500_Sample_Sample_2
rbind(head(supercell_cell_map, n = 3), tail(supercell_cell_map, n = 3))
#> SuperCellID CellId Sample
#> 1: SuperCell_857_Sample_Sample_1 Cell_1 Sample_1
#> 2: SuperCell_249_Sample_Sample_1 Cell_2 Sample_1
#> 3: SuperCell_465_Sample_Sample_1 Cell_3 Sample_1
#> 4: SuperCell_2_Sample_Sample_2 Cell_19998 Sample_2
#> 5: SuperCell_30_Sample_Sample_2 Cell_19999 Sample_2
#> 6: SuperCell_257_Sample_Sample_2 Cell_20000 Sample_2
Using runSuperCellCyto for stratified summarising
As previously mentioned, we can use runSuperCellCyto to
perform stratified summarising, i.e., to summarise (well, meaningfully
sub-sample) each cluster or cell type. To do this, we need to change the
sample column such that it denotes the cell type or the cluster a cell
belongs to.
As an example, let’s first cluster some toy data with k-means:
set.seed(42)
# Simulate some data
dat <- simCytoData()
markers_col <- paste0("Marker_", seq_len(10))
cell_id_col <- "Cell_Id"
# Run kmeans
clust <- kmeans(
x = dat[, markers_col, with = FALSE],
centers = 5
)
clust_col <- "kmeans_clusters"
dat[[clust_col]] <- paste0("cluster_", clust$cluster)
To perform stratified summarising, we supply the cluster column
(kmeans_clusters in the example above) as
runSuperCellCyto’s sample_colname
parameter.
supercells <- runSuperCellCyto(
dt = dat,
markers = markers_col,
sample_colname = clust_col,
cell_id_colname = cell_id_col
)
Now, if we look at the supercell_expression_matrix, each
row (each supercell) will be labelled with the cluster it belongs to, and
not the biological sample it came from:
# Inspect the top 3 and bottom 3 of the expression matrix and some columns.
rbind(
head(supercells$supercell_expression_matrix, n = 3),
tail(supercells$supercell_expression_matrix, n = 3)
)[, c("kmeans_clusters", "SuperCellId", "Marker_10")]
#> kmeans_clusters SuperCellId Marker_10
#> 1: cluster_4 SuperCell_1_Sample_cluster_4 14.64662
#> 2: cluster_4 SuperCell_2_Sample_cluster_4 14.66858
#> 3: cluster_4 SuperCell_3_Sample_cluster_4 14.41837
#> 4: cluster_5 SuperCell_498_Sample_cluster_5 16.99003
#> 5: cluster_5 SuperCell_499_Sample_cluster_5 17.09864
#> 6: cluster_5 SuperCell_500_Sample_cluster_5 15.85447
If we look at the number of supercells created and check how many
cells there were in each cluster, we will find that, for each cluster,
we get approximately n_cells/20 supercells, where 20 is the
gam parameter value we used for
runSuperCellCyto (this is the default).
# Compute how many cells per cluster, and divide by 20, the gam value.
table(dat$kmeans_clusters) / 20
#>
#> cluster_1 cluster_2 cluster_3 cluster_4 cluster_5
#> 120.25 130.30 119.75 129.70 500.00
table(supercells$supercell_expression_matrix$kmeans_clusters)
#>
#> cluster_1 cluster_2 cluster_3 cluster_4 cluster_5
#> 120 130 120 130 500
Session information
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] parallel stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BiocParallel_1.36.0 SuperCellCyto_0.1.0 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.6-1.1 jsonlite_1.8.8 compiler_4.3.2
#> [4] BiocManager_1.30.22 Rcpp_1.0.12 stringr_1.5.1
#> [7] SuperCell_1.0 jquerylib_0.1.4 systemfonts_1.0.5
#> [10] textshaping_0.3.7 yaml_2.3.8 fastmap_1.1.1
#> [13] lattice_0.21-9 plyr_1.8.9 R6_2.5.1
#> [16] igraph_1.6.0 knitr_1.45 bookdown_0.37
#> [19] desc_1.4.3 bslib_0.6.1 rlang_1.1.3
#> [22] cachem_1.0.8 stringi_1.8.3 RANN_2.6.1
#> [25] xfun_0.41 fs_1.6.3 sass_0.4.8
#> [28] memoise_2.0.1 cli_3.6.2 pkgdown_2.0.7
#> [31] magrittr_2.0.3 digest_0.6.34 grid_4.3.2
#> [34] lifecycle_1.0.4 vctrs_0.6.5 evaluate_0.23
#> [37] glue_1.7.0 data.table_1.14.10 codetools_0.2-19
#> [40] ragg_1.2.7 rmarkdown_2.25 purrr_1.0.2
#> [43] pkgconfig_2.0.3 tools_4.3.2 htmltools_0.5.7