This function converts transcript and cell coordinates into numeric vectors for genes and clusters.

Usage

get_vectors(
  data_lst,
  cluster_info,
  cm_lst = NULL,
  bin_type,
  bin_param,
  all_genes,
  w_x,
  w_y,
  n_cores = 1
)

Arguments

data_lst

A list of lists. Each nested list corresponds to one sample and must contain at least one matrix with transcript coordinates. Optional parameter.

cluster_info

A data frame or matrix containing the centroid coordinates, cluster label and sample for each cell. The column names must include "x" (x coordinate), "y" (y coordinate), "cluster" (cluster label) and "sample" (sample).

cm_lst

A named list of matrices containing the count matrix for each sample. The names must match the sample column in cluster_info. If this input is provided, cluster_info must be specified and must contain an additional column "cell_id" to link cell locations to the count matrix. Default is NULL.

bin_type

A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon".

bin_param

A numeric vector indicating the size of the bin. If bin_type is "square" or "rectangle", this is a vector of length two giving the number of rectangular quadrats in the x and y directions. If bin_type is "hexagon", this is a single number giving the side length of the hexagons. Positive numbers only.

all_genes

A vector of strings giving the names of the genes you want to test. These will be used as the column names of the result matrix gene_mt.

w_x

A numeric vector of length two specifying the x coordinate limits of the enclosing box.

w_y

A numeric vector of length two specifying the y coordinate limits of the enclosing box.

n_cores

A positive number specifying the number of cores used for parallelizing permutation testing. Default is one core (sequential processing).
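
The requirements that link cm_lst to cluster_info can be checked before calling the function. The sketch below is a minimal base R illustration: `check_cm_inputs` is a hypothetical helper that is not part of the package, and it only tests the documented conditions (the required column names and the match between list names and the sample column).

```r
# Hypothetical helper, not part of the package: checks the documented
# requirements that link cm_lst to cluster_info.
check_cm_inputs <- function(cm_lst, cluster_info) {
  if (is.null(cluster_info)) {
    stop("cluster_info must be specified when cm_lst is provided")
  }
  # cluster_info may be a matrix, so convert before using `$`.
  cluster_info <- as.data.frame(cluster_info)

  required <- c("x", "y", "cluster", "sample", "cell_id")
  missing_cols <- setdiff(required, colnames(cluster_info))
  if (length(missing_cols) > 0) {
    stop("cluster_info is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }

  # Every count matrix name must appear in the sample column.
  unknown <- setdiff(names(cm_lst), unique(cluster_info$sample))
  if (length(unknown) > 0) {
    stop("cm_lst names not found in cluster_info$sample: ",
         paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy inputs that satisfy the requirements.
cm <- matrix(1:6, nrow = 2,
             dimnames = list(c("gene_A", "gene_B"), paste0("cell_", 1:3)))
clusters <- data.frame(x = c(1, 2, 20), y = c(23, 24, 1), cluster = "A",
                       sample = "rep1", cell_id = colnames(cm))
ok <- check_cm_inputs(list(rep1 = cm), clusters)
```

A check of this kind fails early with a readable message instead of surfacing a mismatch later during vectorization.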

Value

A list of two matrices with the following components:

gene_mt

contains the transcript count in each bin. Each row refers to a bin, and each column refers to a gene.

cluster_mt

contains the number of cells of a specific cluster in each bin. Each row refers to a bin, and each column refers to a cluster.

The row order of gene_mt matches the row order of cluster_mt.

Details

This function can be used to generate input for lasso_markers by specifying all the parameters.

Suppose the input data contains \(n\) genes, \(c\) clusters, and \(k\) samples, and we want to use \(a \times a\) square bins to convert the coordinates of genes and clusters into 1d vectors.

If \(k=1\), the returned list will contain one matrix for gene vectors (gene_mt) of dimension \(a^2 \times n\) and one matrix for cluster vectors (cluster_mt) of dimension \(a^2 \times c\).
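
The \(a^2 \times n\) shape can be reproduced with a few lines of base R. This is only an illustrative sketch of square binning for \(k=1\) and \(a=2\), not the package's implementation; in particular, the order in which bins are flattened into rows is an assumption made here.

```r
# Illustrative only: base R binning, not the package implementation.
# Transcript coordinates for n = 2 genes inside the window [0, 25] x [0, 25].
trans <- data.frame(
  x = c(1, 2, 20, 21, 22, 23, 24, 1, 20),
  y = c(23, 24, 1, 2, 3, 4, 5, 15, 10),
  feature_name = c(rep("A", 7), rep("B", 2))
)
w_x <- c(0, 25)
w_y <- c(0, 25)
a <- 2  # a x a square bins

# Assign each point to a bin index along x and along y.
x_bin <- cut(trans$x, breaks = seq(w_x[1], w_x[2], length.out = a + 1),
             include.lowest = TRUE, labels = FALSE)
y_bin <- cut(trans$y, breaks = seq(w_y[1], w_y[2], length.out = a + 1),
             include.lowest = TRUE, labels = FALSE)

# Flatten the 2D bin index into a single index in 1..a^2.
# The row-major ordering used here is an assumption.
bin_id <- factor((y_bin - 1) * a + x_bin, levels = seq_len(a^2))
gene <- factor(trans$feature_name, levels = c("A", "B"))

# Rows are bins and columns are genes: an a^2 x n count matrix.
tab <- table(bin_id, gene)
gene_mt <- matrix(tab, nrow = a^2,
                  dimnames = list(levels(bin_id), levels(gene)))
dim(gene_mt)
```

Each column of `gene_mt` is the 1d vector for one gene, and its entries sum to the number of transcripts of that gene inside the window.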

If \(k>1\), gene and cluster vectors are constructed for each sample separately and then concatenated. The returned cluster_mt will contain \(k\) additional columns, which are the one-hot encoding of the sample information.
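
The multi-sample layout can be sketched in base R as follows. The per-sample matrices hold made-up toy counts, and the naming of the one-hot columns after the samples is an assumption; the sketch only shows the row-wise concatenation and the extra \(k\) columns for \(k=2\), \(a=2\), and \(c=2\).

```r
# Illustrative only: combining per-sample cluster vectors for k = 2 samples.
# Each matrix has a^2 = 4 rows (bins) and c = 2 columns (clusters).
# The counts are made-up toy values.
cluster_rep1 <- matrix(c(3, 0, 1, 2, 0, 4, 0, 1), nrow = 4,
                       dimnames = list(NULL, c("c1", "c2")))
cluster_rep2 <- matrix(c(0, 2, 2, 1, 1, 0, 3, 0), nrow = 4,
                       dimnames = list(NULL, c("c1", "c2")))
per_sample <- list(rep1 = cluster_rep1, rep2 = cluster_rep2)

# Concatenate the samples row-wise.
stacked <- do.call(rbind, per_sample)

# Record which sample every row came from, then one-hot encode it.
sample_id <- factor(
  rep(names(per_sample), times = vapply(per_sample, nrow, integer(1))),
  levels = names(per_sample)
)
one_hot <- model.matrix(~ sample_id - 1)
colnames(one_hot) <- levels(sample_id)  # column naming is an assumption

# Final layout: (k * a^2) rows and (c + k) columns.
cluster_mt <- cbind(stacked, one_hot)
dim(cluster_mt)
```

Every row has exactly one non-zero entry among the sample columns, which identifies the sample that the bin belongs to.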

Moreover, this function can vectorise genes and clusters separately based on the input. If data_lst is NULL, this function will return vectorised clusters based on cluster_info. If cluster_info is NULL, this function will return vectorised genes based on data_lst.

Examples

# simulate coordinates for genes
trans = as.data.frame(rbind(cbind(x = c(1,2,20,21,22,23,24),
                                 y = c(23, 24, 1,2,3,4,5),
                                 feature_name="A"),
                         cbind(x = c(1,20),
                               y = c(15, 10),
                               feature_name="B"),
                         cbind(x = c(1,2,20,21,22,23,24),
                               y = c(23, 24, 1,2,3,4,5),
                               feature_name="C")))
trans$x = as.numeric(trans$x)
trans$y = as.numeric(trans$y)
clusters = data.frame(x = c(3, 5,11,21,2,23,19),
                    y = c(20, 24, 1,2,3,4,5), cluster="cluster_1")
clusters$sample="rep1"
data=list(trans_info=trans)
vecs_lst_gene = get_vectors(data_lst= list("rep1"= data),
                            cluster_info = clusters,
                            bin_type = "square",
                            bin_param = c(2,2),
                            all_genes = c("A","B","C"),
                            w_x = c(0,25), w_y=c(0,25))


# generate gene vector from count matrix
cm <- data.frame(rbind("gene_A"=c(0,0,2,0,0,0,2),
                     "gene_B"=c(5,3,3,13,0,1,14),
                     "gene_C"=c(5,0,1,5,1,0,7),
                     "gene_D"=c(0,1,1,2,0,0,2)))
colnames(cm)= paste("cell_", 1:7, sep="")
# simulate coordinates for clusters
clusters = data.frame(x = c(1, 2,20,21,22,23,24),
            y = c(23, 24, 1,2,3,4,5), cluster="A")
clusters$sample="rep1"
clusters$cell_id= colnames(cm)
vecs_lst = get_vectors(data_lst= NULL, cluster_info = clusters,
                        cm_lst=list(rep1=cm),
                        bin_type = "square",
                        bin_param = c(2,2),
                        all_genes = row.names(cm),
                        w_x = c(0,25), w_y=c(0,25))