This function will convert the coordinates into a numeric vector for genes and clusters.
Usage
get_vectors(
trans_lst,
cluster_info,
cm_lst = NULL,
bin_type,
bin_param,
all_genes,
w_x,
w_y,
n_cores = 1
)
Arguments
- trans_lst
A list of lists. Each nested list refers to one sample and must contain at least one matrix with transcript coordinates. Optional parameter.
- cluster_info
A data frame or matrix containing the centroid coordinates, cluster label and sample for each cell. The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample).
- cm_lst
A list of named matrices containing the count matrix for each sample. The names must match the sample column in cluster_info. If this input is provided, cluster_info must be specified and contain an additional column "cell_id" to link cell location and count matrix. Default is NULL.
- bin_type
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon".
- bin_param
A numeric vector indicating the size of the bin. If bin_type is "square" or "rectangle", this will be a vector of length two giving the numbers of rectangular quadrats in the x and y directions. If bin_type is "hexagon", this will be a number giving the side length of hexagons. Positive numbers only.
- all_genes
A vector of strings giving the names of the genes you want to test. This will be used as the column names of the result matrix gene_mt.
- w_x
A numeric vector of length two specifying the x coordinate limits of the enclosing box.
- w_y
A numeric vector of length two specifying the y coordinate limits of the enclosing box.
- n_cores
A positive number specifying the number of cores used for parallelizing permutation testing. Default is one core (sequential processing).
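The column and naming requirements above can be checked before calling get_vectors. The following is a minimal sketch in base R, assuming only the requirements stated in this argument list; check_inputs is a hypothetical helper and is not part of the package.

```r
# Hypothetical pre-flight check (not part of the package): verifies the
# column and naming requirements described in the Arguments section.
check_inputs <- function(cluster_info, cm_lst = NULL) {
  required <- c("x", "y", "cluster", "sample")
  # "cell_id" is only required when count matrices are supplied.
  if (!is.null(cm_lst)) required <- c(required, "cell_id")
  missing_cols <- setdiff(required, colnames(cluster_info))
  if (length(missing_cols) > 0) {
    stop("cluster_info is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  if (!is.null(cm_lst)) {
    # Every sample in cluster_info needs a count matrix of the same name.
    samples <- unique(as.character(cluster_info[, "sample"]))
    unknown <- setdiff(samples, names(cm_lst))
    if (length(unknown) > 0) {
      stop("no count matrix in cm_lst for sample(s): ",
           paste(unknown, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Toy inputs made up for illustration.
toy_cm <- matrix(1:4, nrow = 2,
                 dimnames = list(c("gene_A", "gene_B"), c("cell_1", "cell_2")))
toy_clusters <- data.frame(x = c(1, 20), y = c(23, 1), cluster = "A",
                           sample = "rep1", cell_id = colnames(toy_cm))
check_inputs(toy_clusters, cm_lst = list(rep1 = toy_cm))
```

Running such a check first gives a more direct error message than a failure inside the vectorization step.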
Value
A list of two matrices with the following components:
gene_mt
contains the transcript count in each grid. Each row refers to a grid, and each column refers to a gene.
cluster_mt
contains the number of cells in a specific cluster in each grid. Each row refers to a grid, and each column refers to a cluster.
The row order of gene_mt matches the row order of cluster_mt.
Details
This function can be used to generate input for lasso_markers by specifying all the parameters.
Suppose the input data contains \(n\) genes, \(c\) clusters, and \(k\) samples, and we want to use \(a \times a\) square bins to convert the coordinates of genes and clusters into 1D vectors.
If \(k=1\), the returned list will contain one matrix for gene vectors (gene_mt) of dimension \(a^2 \times n\) and one matrix for cluster vectors (cluster_mt) of dimension \(a^2 \times c\).
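To illustrate where the \(a^2\) rows come from, here is a minimal base R sketch that bins one gene's transcript coordinates into an \(a \times a\) grid and flattens the counts into a vector. bin_counts is a hypothetical helper written for this illustration; it is not the package's implementation, and the order in which bins are flattened here is an arbitrary choice that may differ from the order get_vectors uses.

```r
# Hypothetical helper (not part of the package): count points per square bin.
bin_counts <- function(x, y, w_x, w_y, a) {
  # Split each axis of the enclosing box into `a` equal-width intervals.
  x_breaks <- seq(w_x[1], w_x[2], length.out = a + 1)
  y_breaks <- seq(w_y[1], w_y[2], length.out = a + 1)
  x_bin <- cut(x, breaks = x_breaks, include.lowest = TRUE, labels = FALSE)
  y_bin <- cut(y, breaks = y_breaks, include.lowest = TRUE, labels = FALSE)
  # Tabulate on fixed factor levels so that empty bins are kept as zeros.
  counts <- table(factor(x_bin, levels = seq_len(a)),
                  factor(y_bin, levels = seq_len(a)))
  # Flatten the a x a table into a vector of length a^2.
  as.vector(counts)
}

# Gene "A" coordinates from the Examples section, 2 x 2 bins over [0, 25]^2.
vec_a <- bin_counts(x = c(1, 2, 20, 21, 22, 23, 24),
                    y = c(23, 24, 1, 2, 3, 4, 5),
                    w_x = c(0, 25), w_y = c(0, 25), a = 2)
length(vec_a)  # 4, i.e. a^2
sum(vec_a)     # 7, every transcript falls in exactly one bin
```

Repeating this for each of the \(n\) genes and binding the resulting vectors as columns would give a matrix with \(a^2\) rows and \(n\) columns, matching the dimension stated for gene_mt.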
If \(k>1\), gene and cluster vectors are constructed for each sample separately and then concatenated. The returned cluster_mt will contain \(k\) additional columns, which are the one-hot encoding of the sample information.
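The multi-sample layout can be sketched in base R as follows. This assumes the per-sample matrices are stacked row-wise, which is one reading of the description above rather than something confirmed here; the matrices, their values, and the name cluster_mt_sketch are made up for illustration.

```r
# Hypothetical per-sample cluster matrices: 4 bins (rows) x 2 clusters (columns).
rep1 <- matrix(c(1, 0, 2, 0,  0, 3, 0, 1), ncol = 2,
               dimnames = list(NULL, c("c1", "c2")))
rep2 <- matrix(c(0, 2, 1, 1,  4, 0, 0, 0), ncol = 2,
               dimnames = list(NULL, c("c1", "c2")))
per_sample <- list(rep1 = rep1, rep2 = rep2)

# Stack the samples row-wise (assumption made for this sketch).
stacked <- do.call(rbind, per_sample)

# One indicator column per sample: 1 for rows from that sample, 0 otherwise.
sample_id <- rep(names(per_sample),
                 times = vapply(per_sample, nrow, integer(1)))
one_hot <- sapply(names(per_sample), function(s) as.integer(sample_id == s))

cluster_mt_sketch <- cbind(stacked, one_hot)
dim(cluster_mt_sketch)  # 8 rows (2 samples x 4 bins), 4 columns (c + k = 2 + 2)
```

Each row carries exactly one 1 across the sample columns, which is what lets a downstream model account for sample membership.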
Moreover, this function can vectorise genes and clusters separately based on the input. If trans_lst is NULL, this function will return vectorised clusters based on cluster_info. If cluster_info is NULL, this function will return vectorised genes based on trans_lst.
Examples
# simulate coordinates for genes
trans = as.data.frame(rbind(cbind(x = c(1,2,20,21,22,23,24),
                                  y = c(23, 24, 1,2,3,4,5),
                                  feature_name="A"),
                            cbind(x = c(1,20),
                                  y = c(15, 10),
                                  feature_name="B"),
                            cbind(x = c(1,2,20,21,22,23,24),
                                  y = c(23, 24, 1,2,3,4,5),
                                  feature_name="C")))
# cbind() with a string column yields character values, so convert back
trans$x = as.numeric(trans$x)
trans$y = as.numeric(trans$y)
# simulate coordinates for clusters
clusters = data.frame(x = c(3, 5,11,21,2,23,19),
                      y = c(20, 24, 1,2,3,4,5), cluster="cluster_1")
clusters$sample="rep1"
# vectorise transcripts and clusters on a 2 x 2 square grid
vecs_lst_gene = get_vectors(trans_lst= list("rep1"= trans),
                            cluster_info = clusters,
                            bin_type = "square",
                            bin_param = c(2,2),
                            all_genes = c("A","B","C"),
                            w_x = c(0,25), w_y=c(0,25))
# generate gene vector from count matrix
cm <- data.frame(rbind("gene_A"=c(0,0,2,0,0,0,2),
                       "gene_B"=c(5,3,3,13,0,1,14),
                       "gene_C"=c(5,0,1,5,1,0,7),
                       "gene_D"=c(0,1,1,2,0,0,2)))
colnames(cm)= paste("cell_", 1:7, sep="")
# simulate coordinates for clusters
clusters = data.frame(x = c(1, 2,20,21,22,23,24),
                      y = c(23, 24, 1,2,3,4,5), cluster="A")
clusters$sample="rep1"
# cell_id links each cell location to a column of the count matrix
clusters$cell_id= colnames(cm)
vecs_lst = get_vectors(trans_lst= NULL, cluster_info = clusters,
                       cm_lst=list(rep1=cm),
                       bin_type = "square",
                       bin_param = c(2,2),
                       all_genes = row.names(cm),
                       w_x = c(0,25), w_y=c(0,25))