This function runs a permutation framework to compute a p-value for the correlation between each vectorised gene and each vectorised cluster for one sample.
Usage
compute_permp(
data,
cluster_info,
perm.size,
bin_type,
bin_param,
all_genes,
correlation_method = "pearson",
n.cores = 1,
correction_method = "BH",
w_x,
w_y
)
Arguments
- data
A list of matrices containing the coordinates of transcripts.
- cluster_info
A data frame or matrix containing the centroid coordinates and cluster label for each cell. The column names should include "x" (x coordinate), "y" (y coordinate), and "cluster" (cluster label).
- perm.size
A positive integer specifying the number of permutations.
- bin_type
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon".
- bin_param
A numeric vector indicating the size of the bin. If the bin_type is "square" or "rectangle", this is a vector of length two giving the numbers of rectangular quadrats in the x and y directions. If the bin_type is "hexagon", this is a number giving the side length of the hexagons. Positive numbers only.
- all_genes
A vector of strings giving the names of the genes to test for correlation.
- correlation_method
A parameter passed to cor indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; can be abbreviated.
- n.cores
A positive integer specifying the number of cores used for parallelizing the permutation testing. Default is one core (sequential processing).
- correction_method
A character string passed to p.adjust specifying the method used to correct for multiple testing. Default is "BH".
- w_x
A numeric vector of length two specifying the x coordinate limits of the enclosing box.
- w_y
A numeric vector of length two specifying the y coordinate limits of the enclosing box.
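The square and rectangle binning described for bin_param, w_x and w_y can be pictured with a minimal sketch. It assumes the enclosing box is split into bin_param[1] by bin_param[2] equal quadrats and that a gene (or cluster) is vectorised as its count of points per quadrat. The helper vectorise_square() is hypothetical and not part of this package, whose own implementation may differ; hexagonal bins are not covered.

```r
# Hypothetical helper (not part of the package): count points in an
# nx-by-ny grid of equal quadrats spanning the box w_x by w_y and
# return the counts as one vector with nx * ny entries.
vectorise_square <- function(x, y, bin_param, w_x, w_y) {
  x_breaks <- seq(w_x[1], w_x[2], length.out = bin_param[1] + 1)
  y_breaks <- seq(w_y[1], w_y[2], length.out = bin_param[2] + 1)
  # rightmost.closed = TRUE keeps points on the upper box edge in the last
  # bin; points outside the box get index 0 or n + 1 and are dropped below
  ix <- findInterval(x, x_breaks, rightmost.closed = TRUE)
  iy <- findInterval(y, y_breaks, rightmost.closed = TRUE)
  counts <- table(factor(ix, levels = seq_len(bin_param[1])),
                  factor(iy, levels = seq_len(bin_param[2])))
  as.vector(counts)  # column-major order: the x bin index varies fastest
}

# Four points in a 10 x 10 box split into 2 x 2 quadrats
v <- vectorise_square(x = c(1, 6, 6, 9), y = c(1, 1, 6, 9),
                      bin_param = c(2, 2), w_x = c(0, 10), w_y = c(0, 10))
print(v)
# [1] 1 1 0 2
```

Correlating two such vectors (one per gene, one per cluster) gives the kind of gene/cluster statistic this function tests, under the assumptions above.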
Value
A named list with the following components:
obs.stat
A matrix containing the observed statistic (correlation) for every gene and every cluster. Each row refers to a gene, and each column refers to a cluster.
perm.arrays
A three-dimensional array. The first two dimensions hold the correlations between the genes and the permuted clusters; the third dimension indexes the permutation runs.
perm.pval
A matrix containing the raw permutation p-values. Each row refers to a gene, and each column refers to a cluster.
perm.pval.adj
A matrix containing the adjusted permutation p-values. Each row refers to a gene, and each column refers to a cluster.
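A minimal sketch of how the four components could relate, built on made-up toy values rather than real output. It assumes perm.arrays is indexed [gene, cluster, permutation] in the same order as obs.stat, that perm.pval is the share of permuted correlations exceeding the observed one, and that the correction is applied across all gene/cluster pairs at once; these are illustrative assumptions. p.adjust is the base R function named under correction_method.

```r
set.seed(42)
# Toy stand-ins for the returned components (hypothetical values):
# 3 genes x 2 clusters, 100 permutation runs.
n_genes <- 3; n_clusters <- 2; perm.size <- 100
obs.stat <- matrix(c(0.9, 0.1, -0.2, 0.0, 0.8, 0.3), nrow = n_genes,
                   dimnames = list(paste0("gene_", 1:3), c("A", "B")))
perm.arrays <- array(rnorm(n_genes * n_clusters * perm.size, sd = 0.3),
                     dim = c(n_genes, n_clusters, perm.size))

# Compare every permuted correlation with the observed one for its cell
exceed <- sweep(perm.arrays, c(1, 2), obs.stat, FUN = ">")
# Average over the third (permutation) dimension -> genes x clusters
perm.pval <- rowMeans(exceed, dims = 2)
dimnames(perm.pval) <- dimnames(obs.stat)

# Multiple-testing correction across all gene/cluster pairs (assumed)
perm.pval.adj <- matrix(p.adjust(perm.pval, method = "BH"),
                        nrow = n_genes, dimnames = dimnames(obs.stat))
print(round(perm.pval.adj, 3))
```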
Details
To get a permutation p-value for the correlation between a gene
and a cluster, this function randomly permutes the cluster label of
each cell and calculates the correlation between the genes and the
permuted clusters. This process is repeated perm.size times, and
the permutation p-value is calculated as the proportion of permuted
correlations that are larger than the observed correlation.
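The procedure described above can be sketched for a single gene and two clusters. The sketch assumes each cell has already been assigned to a bin, that a cluster is vectorised as its number of cells per bin, and that permutation shuffles the cluster labels across cells. The toy data and the helpers cluster_vector() and perm_p_single() are hypothetical and do not reproduce the package's internals.

```r
set.seed(100)
n_bins <- 4
# Toy data (hypothetical): the bin index of each cell, its cluster label,
# and one gene's transcript count in each bin.
cell_bin <- c(rep(1, 40), rep(4, 40), sample(1:4, 20, replace = TRUE))
cell_label <- c(rep("A", 40), rep("B", 40),
                sample(c("A", "B"), 20, replace = TRUE))
gene_vec <- c(95, 3, 2, 4)

# Vectorise one cluster: how many of its cells fall in each bin.
cluster_vector <- function(bins, labels, cl, n_bins) {
  tabulate(bins[labels == cl], nbins = n_bins)
}

# Permutation p-value for one gene and one cluster: shuffle the labels,
# recompute the correlation, and report the share of permuted
# correlations larger than the observed one.
perm_p_single <- function(gene_vec, bins, labels, cl, n_bins,
                          perm.size = 100, method = "pearson") {
  obs <- cor(gene_vec, cluster_vector(bins, labels, cl, n_bins),
             method = method)
  perm <- replicate(perm.size, {
    shuffled <- sample(labels)  # random permutation of cluster labels
    cor(gene_vec, cluster_vector(bins, shuffled, cl, n_bins),
        method = method)
  })
  mean(perm > obs)
}

p_A <- perm_p_single(gene_vec, cell_bin, cell_label, "A", n_bins)
p_B <- perm_p_single(gene_vec, cell_bin, cell_label, "B", n_bins)
print(c(A = p_A, B = p_B))
```

Here the gene is concentrated in the bin that cluster A occupies, so p_A is expected to be small and p_B large. Note that this plain proportion can be exactly 0; a common variant adds one to both numerator and denominator, but whether this package does so is not stated here.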
Examples
set.seed(100)
# simulate coordinates for clusters
df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5),
y = rnorm(n=100, mean=20, sd=5), cluster="A")
df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5),
y = rnorm(n=100, mean=100, sd=5), cluster="B")
clusters = rbind(df_clA, df_clB)
clusters$sample="rep1"
# simulate coordinates for genes
trans_info = data.frame(rbind(cbind(x = rnorm(n=100, mean=20, sd=5),
y = rnorm(n=100, mean=20, sd=5),
feature_name="gene_A1"),
cbind(x = rnorm(n=100, mean=20, sd=5),
y = rnorm(n=100, mean=20, sd=5),
feature_name="gene_A2"),
cbind(x = rnorm(n=100, mean=100, sd=5),
y = rnorm(n=100, mean=100, sd=5),
feature_name="gene_B1"),
cbind(x = rnorm(n=100, mean=100, sd=5),
y = rnorm(n=100, mean=100, sd=5),
feature_name="gene_B2")))
trans_info$x=as.numeric(trans_info$x)
trans_info$y=as.numeric(trans_info$y)
w_x = c(min(floor(min(trans_info$x)),
floor(min(clusters$x))),
max(ceiling(max(trans_info$x)),
ceiling(max(clusters$x))))
w_y = c(min(floor(min(trans_info$y)),
floor(min(clusters$y))),
max(ceiling(max(trans_info$y)),
ceiling(max(clusters$y))))
perm_res_lst = compute_permp(data=list("sample1"=trans_info),
cluster_info=clusters,
perm.size=100,
bin_type="square",
bin_param=c(2,2),
all_genes=unique(trans_info$feature_name),
correlation_method = "pearson",
n.cores=2,
correction_method="BH",
w_x=w_x ,
w_y=w_y)
#> Correlation Method = pearson
#> Running 100 permutation with 2 cores in parallel
perm_pvalue = perm_res_lst$perm.pval.adj
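A minimal sketch of listing gene/cluster pairs below a chosen cutoff from an adjusted p-value matrix such as perm_pvalue. The matrix values here are invented stand-ins, the 0.05 cutoff is arbitrary, and the assumption that row names are gene names and column names are cluster labels is not confirmed above.

```r
# Stand-in for perm_res_lst$perm.pval.adj (hypothetical values):
# rows are genes, columns are clusters.
perm_pvalue <- matrix(c(0.01, 0.02, 1, 1, 1, 1, 0.01, 0.03), nrow = 4,
                      dimnames = list(c("gene_A1", "gene_A2",
                                        "gene_B1", "gene_B2"),
                                      c("A", "B")))

# Row/column positions of every entry below the cutoff
hits <- which(perm_pvalue < 0.05, arr.ind = TRUE)
sig_pairs <- data.frame(gene = rownames(perm_pvalue)[hits[, "row"]],
                        cluster = colnames(perm_pvalue)[hits[, "col"]],
                        p_adj = perm_pvalue[hits])
print(sig_pairs)
```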