Predicts the number of cells tagged with each clone barcode for a given projection amount. This function is useful for planning single-cell experiments: it estimates how many cells will carry each clone barcode and lets you anticipate clone barcode abundance (i.e., how many clone barcodes are expected to be represented by a substantial number of cells).

project_clones(
  count_data,
  count_column,
  grouping_col = NA,
  project_amnt = c(10000),
  confidence_threshold = 1
)

Arguments

count_data

A data.table containing clone barcode count data.

count_column

A character string specifying the column storing clone barcode counts.

grouping_col

A character string, default NA. If provided, the projection is repeated for each group defined by this column.

project_amnt

A numeric vector giving the number(s) of cells to project to.

confidence_threshold

A numeric value giving the confidence level for clone barcode detection in cells. It must be greater than 0 and at most 1. It expresses how confident you are that clone barcodes will be detected for the cells in a future scRNA-seq experiment: values close to 0 indicate very low confidence, and 1 indicates complete confidence. For example, if you are 70% confident that all cells will have their clone barcodes detected, set `confidence_threshold` to 0.7.

Value

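A modified data.table with projected cell counts. As the example below shows, the returned table keeps the input columns and adds a proportion column (`read_count_proportion` in the example) and a projected-count column named after the projection amount and confidence threshold (`projected_100_confidence_1` in the example).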

Details

How the projection is done: for each clone barcode, we compute its proportion of the total count in `count_column`, then multiply that proportion by the number of cells to project to (specified by the `project_amnt` parameter). If `grouping_col` is specified, the proportion is calculated with respect to the total count within each group.
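The calculation described above can be sketched by hand with data.table. This is a minimal illustration under stated assumptions, not the package's actual code: the names `counts`, `project_to`, `proportion`, and `projected` are hypothetical, and the confidence threshold is left out.

```r
library(data.table)

# Toy counts for two groups (hypothetical data for illustration)
counts <- data.table(
  sample_name = c("test1", "test1", "test1", "test2", "test2", "test2"),
  read_count = c(1, 5, 20, 10, 15, 12)
)

# Hypothetical stand-in for a single project_amnt value
project_to <- 100

# Proportion of each row's count relative to the total count of its group
counts[, proportion := read_count / sum(read_count), by = sample_name]

# Projected number of cells = proportion x number of cells to project to
counts[, projected := proportion * project_to]

print(counts)
```

Because the proportions are computed within each group, the projected values in each group sum to `project_to`.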

The `project_amnt` parameter sets the number of cells to project to. It accepts a numeric vector of one or more values. For example, to project to both 10,000 and 20,000 cells, pass `project_amnt = c(10000, 20000)`.
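To illustrate what projecting to several amounts means, the sketch below repeats the proportion-times-amount step once per value. It is a hand-rolled approximation, not the function's implementation: the `amounts` vector, the `proportion` column, and the `projected_<amount>` column labels are assumptions made for this example and may differ from what `project_clones` produces.

```r
library(data.table)

# Toy counts for a single sample (hypothetical data for illustration)
counts <- data.table(
  clone_barcodes = c("ACGT", "ATGC", "TCGT"),
  read_count = c(10, 15, 12)
)

# Proportion of each clone barcode among all counts
counts[, proportion := read_count / sum(read_count)]

# Same shape as a project_amnt vector
amounts <- c(10000, 20000)

for (amnt in amounts) {
  # Hypothetical column label; format() avoids scientific notation in the name
  col_name <- paste0("projected_", format(amnt, scientific = FALSE))
  set(counts, j = col_name, value = counts$proportion * amnt)
}

print(counts)
```

Each requested amount yields its own column, and since the proportions are fixed, doubling the amount doubles every projected value.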

Examples

library(data.table)

toy_clone_counts <- data.table(
    sample_name = c(rep("test1", 3), rep("test2", 3)),
    read_count = c(1, 5, 20, 10, 15, 12),
    clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT")
)

res <- project_clones(
    count_data = toy_clone_counts,
    project_amnt = c(100),
    count_column = "read_count",
    grouping_col = "sample_name"
)

res
#>    sample_name read_count clone_barcodes read_count_proportion
#> 1:       test1          1           ACGT            0.03846154
#> 2:       test1          5           CATG            0.19230769
#> 3:       test1         20           CATG            0.76923077
#> 4:       test2         10           ACGT            0.27027027
#> 5:       test2         15           ATGC            0.40540541
#> 6:       test2         12           TCGT            0.32432432
#>    projected_100_confidence_1
#> 1:                   3.846154
#> 2:                  19.230769
#> 3:                  76.923077
#> 4:                  27.027027
#> 5:                  40.540541
#> 6:                  32.432432