Count how many reads are mapped to each clone/cell pair and generate a table that resembles a cell-by-clone matrix.

generate_cell_clone_barcode_matrix(
  cell_clone_bcode_dt,
  cell_bcode_col = "CellBarcode",
  clone_bcode_col = "CloneBarcode",
  umi_col = "UMI",
  umi_clone_consensus_threshold = 0.7
)

Arguments

cell_bcode_col

Name of the column in `cell_clone_bcode_dt` that specifies the cell barcode for each read.

clone_bcode_col

Name of the column in `cell_clone_bcode_dt` that specifies the clone barcode for each read.

umi_col

Name of the column in `cell_clone_bcode_dt` that specifies the UMI barcode for each read.

umi_clone_consensus_threshold

A numeric value between 0 and 1 specifying the proportion of reads a clone barcode must reach when collapsing UMIs to compute the cell-by-clone matrix. See Details for more information.

cell_clone_bcode_dt

A data.table object representing the reads. Each row describes one read and holds its cell barcode, UMI, and clone barcode. For data produced by NextClone, read the output with `fread` from the data.table package and pass the resulting object to this parameter.

Value

A data.table resembling a cell-by-clone matrix in long format: one row per cell/clone pair, with the count in the `n_reads` column.

Details

The cell-by-clone matrix construction first collapses reads that share the same cell barcode and UMI barcode. If the reads in such a group are mapped to several clone barcodes, they are collapsed into one read and assigned to the clone barcode that accounts for 70% or more of the group's reads (the default). This threshold can be changed with the `umi_clone_consensus_threshold` parameter, which is expressed as a proportion: the default value of 0.7 corresponds to 70%. In the example below, the reads of cell D with UMI D1 (two XX and three YY) do not reach the threshold for any clone barcode, so that group does not appear in the output.
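The collapsing step described above can be sketched with plain data.table operations. This is only an illustration under stated assumptions, not the package's actual implementation: the names `reads`, `counts`, `consensus`, and `result` are hypothetical, and the sketch assumes that a cell/UMI group with no clone barcode reaching the threshold is dropped, which is consistent with the example output for cell D.

```r
library(data.table)

# Toy reads, one row per read (cells C and D from the Examples section).
reads <- data.table(
  CellBarcode  = c(rep("C", 5), rep("D", 7)),
  UMI          = c(rep("C1", 3), "C2", "C1", rep("D1", 5), rep("D2", 2)),
  CloneBarcode = c(rep("CC", 4), "DD", rep("XX", 2), rep("YY", 3), rep("ZZ", 2))
)
threshold <- 0.7  # plays the role of umi_clone_consensus_threshold

# 1. Count reads for every (cell, UMI, clone) combination.
counts <- reads[, .(n = .N), by = .(CellBarcode, UMI, CloneBarcode)]

# 2. Proportion of each clone barcode within its (cell, UMI) group.
counts[, prop := n / sum(n), by = .(CellBarcode, UMI)]

# 3. Keep only clone barcodes that reach the threshold. With a threshold
#    above 0.5, at most one clone per (cell, UMI) group can qualify.
#    Assumption: groups without a qualifying clone are discarded.
consensus <- counts[prop >= threshold]

# 4. Each surviving (cell, UMI) group counts as one collapsed read.
result <- consensus[, .(n_reads = .N), by = .(CellBarcode, CloneBarcode)]
print(result)
```

Here cell C keeps two collapsed reads for clone CC (UMI C1 with 3 of 4 reads, and UMI C2), while cell D keeps one for ZZ, because UMI D1 has no clone barcode at or above 70%.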

Examples

library(data.table)

cell_clone_bcode_dt <- data.table(
  CellBarcode = c(
    rep("A", 2),
    rep("B", 4),
    rep("C", 5),
    rep("D", 7)
  ),
  CloneBarcode = c(
    rep("AA", 2),
    rep("AA", 3), "BB",
    rep("CC", 4), "DD",
    rep("XX", 2), rep("YY", 3), rep("ZZ", 2)
  ),
  UMI = c(
    rep("A1", 2),
    rep("B1", 2), "B2", "B3",
    rep("C1", 3), "C2", "C1",
    rep("D1", 5), rep("D2", 2)
  )
)
generate_cell_clone_barcode_matrix(cell_clone_bcode_dt)
#>    CellBarcode CloneBarcode n_reads
#> 1:           A           AA       1
#> 2:           B           AA       2
#> 3:           B           BB       1
#> 4:           C           CC       2
#> 5:           D           ZZ       1
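
The returned table is in long format. If a wide cell-by-clone layout is more convenient, one possible approach is data.table's `dcast`. The sketch below rebuilds the output shown above by hand so that it runs without the package; `long_dt` and `wide_dt` are hypothetical names used only for illustration.

```r
library(data.table)

# Long-format result, copied from the example output above.
long_dt <- data.table(
  CellBarcode  = c("A", "B", "B", "C", "D"),
  CloneBarcode = c("AA", "AA", "BB", "CC", "ZZ"),
  n_reads      = c(1L, 2L, 1L, 2L, 1L)
)

# One row per cell and one column per clone barcode;
# cell/clone pairs absent from the long table are filled with 0.
wide_dt <- dcast(
  long_dt,
  CellBarcode ~ CloneBarcode,
  value.var = "n_reads",
  fill = 0L
)
print(wide_dt)
```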