assign_and_embed_clones.Rd
This function assigns clone barcodes to cells and optionally embeds the assignment into a SingleCellExperiment object. It calculates a cell-by-clone matrix from a provided data.table where each row corresponds to a read associated with a cell, UMI, and clone barcode. See details for the assignment method.
assign_and_embed_clones(
cell_by_gene_mat,
cell_clone_reads_dt,
cell_bcode_col = "CellBarcode",
barcode_edit_dist_col = "BarcodeEditDist",
clone_bcode_col = "CloneBarcode",
umi_col = "UMI",
most_dominant_threshold = 0.5,
umi_clone_consensus_threshold = 0.7,
embed_to_mat = TRUE
)
cell_by_gene_mat: A SingleCellExperiment object containing the cell-by-gene matrix. It must have a 'Barcode' column in 'colData' that uniquely identifies each cell. The barcodes should match those in the `cell_clone_reads_dt` data table.
cell_clone_reads_dt: A data.table in which each row is one read, with its cell barcode, UMI, and clone barcode. For data produced by NextClone, read the output with `fread` from the data.table package and pass the resulting object to this argument.
cell_bcode_col: Name of the column in `cell_clone_reads_dt` that holds the cell barcode of each read.
barcode_edit_dist_col: Name of the column in `cell_clone_reads_dt` that holds the edit distance of each read's clone barcode.
clone_bcode_col: Name of the column in `cell_clone_reads_dt` that holds the clone barcode of each read.
umi_col: Name of the column in `cell_clone_reads_dt` that holds the UMI barcode of each read.
most_dominant_threshold: A numeric value between 0 and 1. When more than one clone barcode is detected in a cell, a clone barcode is assigned to that cell if it accounts for more than this proportion of the cell's reads.
umi_clone_consensus_threshold: A numeric value between 0 and 1. When reads sharing the same cell barcode and UMI map to different clone barcodes, the group is collapsed into one read assigned to the clone barcode that accounts for at least this proportion of the group's reads. See details for more information.
embed_to_mat: A logical value indicating whether to embed the clone barcode assignments directly into the `cell_by_gene_mat` object.
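Because cells are matched by barcode and the four read-level columns are looked up by name, a quick pre-flight check can catch mismatched inputs before the function is called. The helper `check_clone_inputs_sketch` below is hypothetical (not part of the package); it is a minimal sketch assuming the reads are in a data.frame or data.table and the cell barcodes are passed as a character vector, for example `colData(sce)$Barcode`.

```r
# Hypothetical helper (not part of the package): check the inputs before
# calling assign_and_embed_clones(). Returns the fraction of distinct cell
# barcodes in the reads table that are also present in `cell_barcodes`.
check_clone_inputs_sketch <- function(reads, cell_barcodes,
                                      cell_bcode_col = "CellBarcode",
                                      barcode_edit_dist_col = "BarcodeEditDist",
                                      clone_bcode_col = "CloneBarcode",
                                      umi_col = "UMI") {
  required <- c(cell_bcode_col, barcode_edit_dist_col, clone_bcode_col, umi_col)
  missing_cols <- setdiff(required, names(reads))
  if (length(missing_cols) > 0) {
    stop("reads table lacks column(s): ", paste(missing_cols, collapse = ", "))
  }
  # The 'Barcode' column must uniquely identify each cell
  if (anyDuplicated(cell_barcodes) > 0) {
    stop("cell barcodes must uniquely identify each cell")
  }
  mean(unique(reads[[cell_bcode_col]]) %in% cell_barcodes)
}

reads <- data.frame(
  CellBarcode = c("Cell_001", "Cell_002", "Cell_009"),
  CloneBarcode = c("AA", "BB1", "ZZ"),
  BarcodeEditDist = c(0L, 0L, 1L),
  UMI = 1:3,
  stringsAsFactors = FALSE
)
matched <- check_clone_inputs_sketch(reads, c("Cell_001", "Cell_002", "Cell_003"))
matched  # 2 of the 3 read cell barcodes are known cells
```

A low matched fraction usually points to a barcode format mismatch (for example, a suffix present in one source but not the other).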
If `embed_to_mat` is TRUE, an updated SingleCellExperiment object whose colData gains the columns `clone_barcode` and `clone_barcode_criteria` (see the example). If `embed_to_mat` is FALSE, a data.table with the cell-clone assignments.
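With `embed_to_mat = TRUE`, the example output shows the assignments arriving as new colData columns. One plausible way to picture that step is a lookup keyed on the 'Barcode' column. The sketch below is an assumption, not the package's code: `embed_assignments_sketch` is hypothetical, a plain data.frame stands in for colData, and cells without an assignment are left as NA.

```r
# Hypothetical sketch (not the package's code): embed per-cell assignments
# into cell metadata by matching on the Barcode column. A data.frame stands
# in for colData(); cells without an assignment receive NA.
embed_assignments_sketch <- function(cell_meta, assignments) {
  idx <- match(cell_meta$Barcode, assignments$Barcode)
  cell_meta$clone_barcode <- assignments$clone_barcode[idx]
  cell_meta$clone_barcode_criteria <- assignments$clone_barcode_criteria[idx]
  cell_meta
}

cell_meta <- data.frame(Barcode = c("Cell_001", "Cell_002", "Cell_003"),
                        stringsAsFactors = FALSE)
assignments <- data.frame(
  Barcode = c("Cell_002", "Cell_001"),
  clone_barcode = c("BB1", "AA"),
  clone_barcode_criteria = c("dominant_clone_moreThan_0_5", "single_clone"),
  stringsAsFactors = FALSE
)
embedded <- embed_assignments_sketch(cell_meta, assignments)
embedded$clone_barcode  # "AA" "BB1" NA
```

Matching on barcodes rather than relying on row order is what makes the unique 'Barcode' requirement on `cell_by_gene_mat` important.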
Clone barcodes are assigned to cells in tiers. A cell with a single detected clone barcode is assigned that clone. When a cell has several clone barcodes, the most dominant one is assigned if it accounts for more than 50% of the cell's reads (adjustable via `most_dominant_threshold`; the default, 0.5, corresponds to 50%). Remaining cells, where no clone barcode is sufficiently dominant, are assigned the clone barcode with the lowest mean barcode edit distance. If the mean edit distances are equal, the clone barcode with the higher read count wins.
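The tiered rule can be sketched in base R as follows. `assign_clones_sketch` and its simplified criteria labels are hypothetical; the sketch assumes the reads have already been UMI-collapsed and use the column names from the example. When both the mean edit distance and the read count are tied, it takes the first clone barcode in sort order, a case the documentation does not specify.

```r
# Hypothetical sketch of the tiered clone assignment (not the package's code).
# Assumes `reads` holds UMI-collapsed reads with the default column names.
assign_clones_sketch <- function(reads, most_dominant_threshold = 0.5) {
  per_cell <- split(reads, reads$CellBarcode)
  rows <- lapply(names(per_cell), function(cell) {
    r <- per_cell[[cell]]
    result <- function(clone, criteria) {
      data.frame(CellBarcode = cell, clone_barcode = clone,
                 criteria = criteria, stringsAsFactors = FALSE)
    }
    n_reads <- table(r$CloneBarcode)                  # reads per clone barcode
    mean_dist <- tapply(r$BarcodeEditDist, r$CloneBarcode, mean)
    clones <- names(n_reads)
    # Tier 1: only one clone barcode detected in this cell
    if (length(clones) == 1) return(result(clones, "single_clone"))
    # Tier 2: one clone barcode holds more than the threshold share of reads
    prop <- as.numeric(n_reads) / sum(n_reads)
    if (max(prop) > most_dominant_threshold) {
      return(result(clones[which.max(prop)], "dominant_clone"))
    }
    # Tier 3: lowest mean edit distance, ties broken by higher read count
    ord <- order(as.numeric(mean_dist[clones]), -as.numeric(n_reads))
    result(clones[ord[1]], "clone_from_edit_distance")
  })
  do.call(rbind, rows)
}

# Same reads as in the example
reads <- data.frame(
  CellBarcode = rep(c("Cell_001", "Cell_002", "Cell_003", "Cell_004"),
                    c(6, 7, 4, 5)),
  CloneBarcode = c(rep("AA", 6), rep("BB1", 4), rep("BB2", 3),
                   rep("CC1", 2), rep("CC2", 2),
                   rep("CC1", 2), "DD1", "DD2", "DD3"),
  BarcodeEditDist = c(rep(0, 13), 0, 1, 2, 1, rep(0, 5)),
  stringsAsFactors = FALSE
)
assigned <- assign_clones_sketch(reads)
assigned
```

On these reads the sketch yields AA, BB1, CC1 and CC1, matching the clone_barcode values in the example output: Cell_003 splits its reads 50/50, so the lower mean edit distance of CC1 decides, while all Cell_004 edit distances are 0, so CC1 wins on read count.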
Before the cell-by-clone matrix is built, reads sharing the same cell barcode and UMI barcode are collapsed. If such a group of reads maps to several clone barcodes, it is collapsed into a single read assigned to the clone barcode that accounts for 70% or more of the group's reads. This threshold is set by the `umi_clone_consensus_threshold` parameter; its default, 0.7, corresponds to 70%.
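A minimal base R sketch of the collapsing step and the resulting cell-by-clone count table follows. `cell_by_clone_sketch` is hypothetical, and the documentation does not say what happens to a group in which no clone barcode reaches the threshold; dropping such groups here is an assumption.

```r
# Hypothetical sketch (not the package's code): collapse reads that share a
# cell barcode and UMI, then count collapsed reads per (cell, clone).
cell_by_clone_sketch <- function(reads, umi_clone_consensus_threshold = 0.7) {
  # Group read indices by (cell barcode, UMI); drop empty combinations
  groups <- split(seq_len(nrow(reads)),
                  list(reads$CellBarcode, reads$UMI), drop = TRUE)
  collapsed <- lapply(groups, function(idx) {
    clones <- reads$CloneBarcode[idx]
    prop <- table(clones) / length(clones)
    # Assumption: groups with no clone at or above the threshold are dropped
    if (max(prop) < umi_clone_consensus_threshold) return(NULL)
    data.frame(CellBarcode = reads$CellBarcode[idx[1]],
               CloneBarcode = names(prop)[which.max(prop)],
               stringsAsFactors = FALSE)
  })
  collapsed <- do.call(rbind, collapsed)
  if (is.null(collapsed)) return(table(character(), character()))
  # Each collapsed (cell, UMI) group contributes one count
  table(collapsed$CellBarcode, collapsed$CloneBarcode)
}

reads <- data.frame(
  CellBarcode = c(rep("Cell_A", 7), rep("Cell_B", 5)),
  UMI = c(1, 1, 1, 2, 2, 2, 3, 1, 1, 1, 1, 1),
  CloneBarcode = c("AA", "AA", "AA", "AA", "AA", "BB", "BB",
                   "CC", "CC", "CC", "CC", "DD"),
  stringsAsFactors = FALSE
)
mat <- cell_by_clone_sketch(reads)
mat
```

Here Cell_A's UMI 2 splits 2:1 between AA and BB (about 67%, below 0.7) and is dropped, while Cell_B's UMI 1 reaches 80% for CC and is kept. Lowering the threshold to 0.6 keeps the split group as an AA read.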
library(scater)
library(data.table)
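set.seed(42)
sce <- mockSCE(ncells = 4, ngenes = 100)
# Cells are matched to reads through the 'Barcode' column of colData
colData(sce)$Barcode <- colnames(sce)
# Toy reads: Cell_001 has one clone, Cell_002 a dominant clone, Cell_003 is
# resolved by mean edit distance and Cell_004 by the read-count tie-break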
cell_clone_reads_dt <- data.table(
CellBarcode = c(
rep("Cell_001", 6),
rep("Cell_002", 7),
rep("Cell_003", 4),
rep("Cell_004", 5)
),
CloneBarcode = c(
rep("AA", 6),
rep("BB1", 4), rep("BB2", 3),
rep("CC1", 2), rep("CC2", 2),
rep("CC1", 2), "DD1", "DD2", "DD3"
),
BarcodeEditDist = c(
rep(0L, 6),
rep(0L, 7),
c(0L, 1L, 2L, 1L),
rep(0L, 5)
)
)
# Give every read its own UMI, so no UMI collapsing happens in this example
cell_clone_reads_dt[, UMI := seq_len(.N)]
cell_clone_reads_dt
#> CellBarcode CloneBarcode BarcodeEditDist UMI
#> 1: Cell_001 AA 0 1
#> 2: Cell_001 AA 0 2
#> 3: Cell_001 AA 0 3
#> 4: Cell_001 AA 0 4
#> 5: Cell_001 AA 0 5
#> 6: Cell_001 AA 0 6
#> 7: Cell_002 BB1 0 7
#> 8: Cell_002 BB1 0 8
#> 9: Cell_002 BB1 0 9
#> 10: Cell_002 BB1 0 10
#> 11: Cell_002 BB2 0 11
#> 12: Cell_002 BB2 0 12
#> 13: Cell_002 BB2 0 13
#> 14: Cell_003 CC1 0 14
#> 15: Cell_003 CC1 1 15
#> 16: Cell_003 CC2 2 16
#> 17: Cell_003 CC2 1 17
#> 18: Cell_004 CC1 0 18
#> 19: Cell_004 CC1 0 19
#> 20: Cell_004 DD1 0 20
#> 21: Cell_004 DD2 0 21
#> 22: Cell_004 DD3 0 22
#> CellBarcode CloneBarcode BarcodeEditDist UMI
sce_with_clone <- assign_and_embed_clones(
cell_by_gene_mat = sce,
cell_clone_reads_dt = cell_clone_reads_dt,
cell_bcode_col = "CellBarcode",
barcode_edit_dist_col = "BarcodeEditDist",
clone_bcode_col = "CloneBarcode",
umi_col = "UMI"
)
colData(sce_with_clone)
#> DataFrame with 4 rows and 6 columns
#> Mutation_Status Cell_Cycle Treatment Barcode clone_barcode
#> <character> <character> <character> <character> <character>
#> Cell_001 positive G1 treat1 Cell_001 AA
#> Cell_002 positive G0 treat2 Cell_002 BB1
#> Cell_003 negative G0 treat1 Cell_003 CC1
#> Cell_004 positive G2M treat2 Cell_004 CC1
#> clone_barcode_criteria
#> <factor>
#> Cell_001 single_clone
#> Cell_002 dominant_clone_moreThan_0_5
#> Cell_003 clone_from_edit_distance
#> Cell_004 clone_from_edit_distance