This function will convert the coordinates into a numeric vector for genes and clusters.
Usage
get_vectors(
x,
cluster_info,
sample_names,
bin_type,
bin_param,
test_genes,
w_x,
w_y,
use_cm = FALSE,
n_cores = 1
)
Arguments
- x
a SingleCellExperiment or SpatialExperiment or SpatialFeatureExperiment object
- cluster_info
A dataframe/matrix containing the centroid coordinates, cluster label and sample for each cell.The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample).
- sample_names
a vector of strings giving the sample names
- bin_type
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon".
- bin_param
A numeric vector indicating the size of the bin. If the
bin_type
is "square" or "rectangle", this will be a vector of length two giving the numbers of rectangular quadrats in the x and y directions. If thebin_type
is "hexagonal", this will be a number giving the side length of hexagons. Positive numbers only.- test_genes
A vector of strings giving the name of the genes you want to create gene vector. This will be used as column names for one of the result matrix
gene_mt
.- w_x
A numeric vector of length two specifying the x coordinate limits of enclosing box.
- w_y
A numeric vector of length two specifying the y coordinate limits of enclosing box.
- use_cm
A boolean value that specifies whether to create spatial vectors for genes using the count matrix and cell coordinates instead of the transcript coordinates when both types of information are available. The default setting is FALSE.
- n_cores
A positive number specifying number of cores used for parallelizing permutation testing. Default is one core (sequential processing).
Value
a list of two matrices with the following components
gene_mt
contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene.
cluster_mt
contains the number of cells in a specific cluster in each grid. Each row refers to a grid, and each column refers to a cluster.
The row order of gene_mt
matches the row order of cluster_mt
.
Details
This function can be used to generate input for lasso_markers
by specifying all the parameters.
Suppose the input data contains \(n\) genes, \(c\) clusters, and \(k\) samples, we want to use \(a \times a\) square bin to convert the coordinates of genes and clusters into 1d vectors.
If \(k=1\), the returned list will contain one matrix for gene vectors
(gene_mt
) of dimension \(a^2 \times n\) and one matrix for
cluster vectors (cluster_mt
) of dimension \(a^2 \times c\).
If \(k>1\), gene and cluster vectors are constructed for each sample
separately and concat together. There will be additional k columns on the
returned cluster_mt
, which is the one-hot encoding of the
sample information.
Moreover, this function can vectorise genes and clusters separately based
on the input. If x
is NULL, this function will
return vectorised clusters based on cluster_info
.
If cluster_info
is NULL, this function will return vectorised genes
based on x
.
Examples
library(SpatialExperiment)
set.seed(100)
# simulate coordinates for clusters
df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5),
y = rnorm(n=100, mean=20, sd=5), cluster="A")
df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5),
y = rnorm(n=100, mean=100, sd=5), cluster="B")
clusters = rbind(df_clA, df_clB)
clusters$sample="sample1"
# simulate coordinates for genes
trans_info = data.frame(rbind(cbind(x = rnorm(n=10, mean=20,sd=5),
y = rnorm(n=10, mean=20, sd=5),
feature_name="gene_A1"),
cbind(x = rnorm(n=10, mean=20, sd=5),
y = rnorm(n=10, mean=20, sd=5),
feature_name="gene_A2"),
cbind(x = rnorm(n=10, mean=100, sd=5),
y = rnorm(n=10, mean=100, sd=5),
feature_name="gene_B1"),
cbind(x = rnorm(n=10, mean=100, sd=5),
y = rnorm(n=10, mean=100, sd=5),
feature_name="gene_B2")))
trans_info$x=as.numeric(trans_info$x)
trans_info$y=as.numeric(trans_info$y)
trans_info$cell = sample(c("cell1","cell2","cell2"),replace=TRUE,
size=nrow(trans_info))
trans_mol <- BumpyMatrix::splitAsBumpyMatrix(
trans_info[, c("x", "y")],
row = trans_info$feature_name, col = trans_info$cell )
spe<- SpatialExperiment(
assays = list(molecules = trans_mol),sample_id ="sample1" )
w_x = c(min(floor(min(trans_info$x)),
floor(min(clusters$x))),
max(ceiling(max(trans_info$x)),
ceiling(max(clusters$x))))
w_y = c(min(floor(min(trans_info$y)),
floor(min(clusters$y))),
max(ceiling(max(trans_info$y)),
ceiling(max(clusters$y))))
vecs_lst = get_vectors(x=spe,sample_names=c("sample1"),
cluster_info = clusters,
bin_type = "square",
bin_param = c(5,5),
test_genes =c("gene_A1","gene_A2","gene_B1","gene_B2"),
w_x = w_x, w_y=w_y)