Filter the count data to retain only the top `top_threshold` barcodes by count (per group when `grouping_col` is supplied), and add cumulative-sum and rank columns.

get_top_barcodes_and_cum_sum(
  count_data,
  top_threshold,
  count_column,
  grouping_col = NA
)

Arguments

count_data

A data.table where each row corresponds to a clone barcode.

top_threshold

A numeric value giving the number of top-ranked clone barcodes to retain (per group when `grouping_col` is supplied).

count_column

A character string naming the column that stores the count of each clone barcode. Barcodes are ranked by this column in decreasing order.

grouping_col

A character string naming a grouping column, or NA (the default). If provided, the filtering, ranking, and cumulative sum are computed separately for each subset of `count_data` defined by this column, i.e. a group-by operation.
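The per-group behaviour can be illustrated with a small base R sketch: rows are ranked by count independently within each group, and only the top `n` rows of each group are kept. The helper `top_n_per_group()` is hypothetical and is not the package's implementation; breaking ties by row order (`ties.method = "first"`) is an assumption made here.

```r
# Hypothetical helper illustrating grouped top-n filtering (not the package's code).
top_n_per_group <- function(df, n, count_col, group_col) {
  # Rank counts in decreasing order separately within each group;
  # ties are broken by row order (an assumption, see above).
  within_rank <- ave(
    -df[[count_col]],
    df[[group_col]],
    FUN = function(v) rank(v, ties.method = "first")
  )
  keep <- within_rank <= n
  out <- df[keep, , drop = FALSE]
  out$barcode_rank <- within_rank[keep]
  # Sort by group, then by rank, for readable output.
  out[order(out[[group_col]], out$barcode_rank), , drop = FALSE]
}

toy <- data.frame(
  sample_name = c(rep("test1", 3), rep("test2", 3)),
  read_count = c(1, 5, 20, 10, 15, 12),
  clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT"),
  stringsAsFactors = FALSE
)

res <- top_n_per_group(toy, 2, "read_count", "sample_name")
```

With `n = 2`, this keeps the counts 20 and 5 for test1 and 15 and 12 for test2, the same rows selected in the example below.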

Value

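A data.table containing only the top clone barcodes, sorted by decreasing count (within each group, if `grouping_col` is supplied), with two added columns: a cumulative-sum column named after `count_column` (e.g. `cum_sum_read_count`), holding the running total of counts, and `barcode_rank`, the rank of each barcode by count.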
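A minimal data.table sketch of how the returned columns could be derived, assuming the function sorts by count in decreasing order within each group, computes a running total, and numbers the rows before filtering. `top_barcodes_sketch()` is a hypothetical name; its tie handling and group ordering are assumptions and may differ from the real function.

```r
library(data.table)

# Hypothetical sketch of the documented behaviour; not the package's code.
top_barcodes_sketch <- function(count_data, top_threshold, count_column,
                                grouping_col = NA) {
  # Work on a copy so that := does not modify the caller's table.
  dt <- copy(as.data.table(count_data))
  cum_col <- paste0("cum_sum_", count_column)

  if (is.na(grouping_col)) {
    # No grouping: rank over the whole table, highest count first.
    setorderv(dt, count_column, order = -1L)
    dt[, (cum_col) := cumsum(get(count_column))]
    dt[, barcode_rank := seq_len(.N)]
  } else {
    # Grouped: groups ascending, counts descending within each group.
    setorderv(dt, c(grouping_col, count_column), order = c(1L, -1L))
    dt[, (cum_col) := cumsum(get(count_column)), by = grouping_col]
    dt[, barcode_rank := seq_len(.N), by = grouping_col]
  }

  # The running total of the first k rows does not depend on later rows,
  # so filtering after the cumulative sum gives the same values.
  dt[barcode_rank <= top_threshold]
}

toy_clone_counts <- data.table(
  sample_name = c(rep("test1", 3), rep("test2", 3)),
  read_count = c(1, 5, 20, 10, 15, 12),
  clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT")
)

grouped <- top_barcodes_sketch(toy_clone_counts, 2, "read_count", "sample_name")
ungrouped <- top_barcodes_sketch(toy_clone_counts, 2, "read_count")
```

Computing the cumulative sum before filtering is a design choice of this sketch: it lets the rank and running total be added in one sorted pass, and the subset at the end does not change either column.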

Examples

library(data.table)

toy_clone_counts <- data.table(
  sample_name = c(rep("test1", 3), rep("test2", 3)),
  read_count = c(1, 5, 20, 10, 15, 12),
  clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT")
)

get_top_barcodes_and_cum_sum(
  count_data = toy_clone_counts,
  top_threshold = 2,
  count_column = "read_count",
  grouping_col = "sample_name"
)
#>    sample_name read_count clone_barcodes cum_sum_read_count barcode_rank
#> 1:       test1         20           CATG                 20            1
#> 2:       test1          5           CATG                 25            2
#> 3:       test2         15           ATGC                 15            1
#> 4:       test2         12           TCGT                 27            2