get_top_barcodes_and_cum_sum.Rd
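Filters the count data to retain only the top barcodes ranked by count, where the number of barcodes kept is set by `top_threshold`. The result also reports the cumulative sum of counts and the rank of each retained barcode.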
get_top_barcodes_and_cum_sum(
count_data,
top_threshold,
count_column,
grouping_col = NA
)
count_data: A data.table where each row corresponds to a clone barcode.
top_threshold: A numeric value indicating how many of the top clone barcodes to retain.
count_column: A character string naming the column that stores the count of the clone barcodes.
grouping_col: A character string, default NA. If provided, the function repeats the calculation for each subset of `count_data` defined by this column, effectively performing a group-by operation.
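A filtered data.table with only the top clone barcodes. As the example output shows, it also carries a cumulative-sum column (`cum_sum_read_count` in the example) and a `barcode_rank` column.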
library(data.table)
toy_clone_counts <- data.table(
sample_name = c(rep("test1", 3), rep("test2", 3)),
read_count = c(1, 5, 20, 10, 15, 12),
clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT")
)
get_top_barcodes_and_cum_sum(
count_data = toy_clone_counts,
top_threshold = 2,
count_column = "read_count",
grouping_col = "sample_name"
)
#>    sample_name read_count clone_barcodes cum_sum_read_count barcode_rank
#> 1:       test1         20           CATG                 20            1
#> 2:       test1          5           CATG                 25            2
#> 3:       test2         15           ATGC                 15            1
#> 4:       test2         12           TCGT                 27            2
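
The per-group ranking and cumulative sum shown in the output above can be approximated with plain data.table operations. The sketch below is not the package's implementation: `top_n_with_cum_sum` is a hypothetical helper, and it assumes that rows are ranked by descending count within each group, that ties keep their original order, and that the cumulative sum is computed before filtering. Under those assumptions it reproduces the four rows printed above.

```r
library(data.table)

# Hypothetical helper (an illustration, not the package source): keep the
# top_n rows per group by descending count, with a running total and a rank.
top_n_with_cum_sum <- function(dt, top_n, count_column, grouping_col = NA) {
  out <- copy(dt)  # work on a copy so the caller's table is not modified
  cum_col <- paste0("cum_sum_", count_column)  # assumed naming pattern

  if (is.na(grouping_col)) {
    # No grouping: rank across the whole table
    setorderv(out, count_column, order = -1L)
    out[, (cum_col) := cumsum(get(count_column))]
    out[, barcode_rank := seq_len(.N)]
  } else {
    # Grouping: sort by group, then by descending count within each group
    setorderv(out, c(grouping_col, count_column), order = c(1L, -1L))
    out[, (cum_col) := cumsum(get(count_column)), by = grouping_col]
    out[, barcode_rank := seq_len(.N), by = grouping_col]
  }

  # Keep only the top_n ranked rows (per group when grouping is used)
  out[barcode_rank <= top_n]
}

toy_clone_counts <- data.table(
  sample_name = c(rep("test1", 3), rep("test2", 3)),
  read_count = c(1, 5, 20, 10, 15, 12),
  clone_barcodes = c("ACGT", "CATG", "CATG", "ACGT", "ATGC", "TCGT")
)

result <- top_n_with_cum_sum(
  toy_clone_counts,
  top_n = 2,
  count_column = "read_count",
  grouping_col = "sample_name"
)
print(result)
```

Passing `grouping_col = NA` to this sketch ranks the whole table at once, which is the assumed meaning of the default in the function documented here.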