This function extends the functionality of %in% to data frames by
finding which rows in the first argument also exist in the second.
Details
Because this function is intended for data frames that contain more than
just the barcodes, rows are matched only on the columns whose names appear in
both arguments (the intersection of the column names).
Unlike base::match(), this function converts each row into a numeric encoding
before matching, which makes the matching more efficient.
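The numeric encoding mentioned above can be illustrated with a small base R sketch. It is not the package's implementation: the helper name encode_rows() and the mixed-radix scheme are assumptions chosen to show the general idea of turning each row into a single number built from the positions of its values among the unique values of the second argument.

```r
# Hypothetical helper, not part of the package: encode each row as one number.
encode_rows <- function(x, table) {
  # Match only on the columns present in both data frames
  cols <- intersect(names(x), names(table))
  # Unique values of each shared column, taken from `table`
  levels_list <- lapply(table[cols], unique)
  radix <- vapply(levels_list, length, integer(1))
  # Place values of a mixed-radix number: 1, n1, n1 * n2, ...
  weights <- cumprod(c(1, radix[-length(radix)]))
  encode <- function(df) {
    # Zero-based position of each value among the unique values (NA if absent)
    digits <- Map(function(col, lev) match(col, lev) - 1L, df[cols], levels_list)
    Reduce(`+`, Map(`*`, digits, weights))
  }
  list(x = encode(x), table = encode(table))
}

barcodes <- data.frame(
  read = c("seq_1", "seq_2", "seq_3", "seq_4"),
  bc1 = c("A", "B", "C", "B"),
  bc2 = c("A", "C", "A", "A")
)
freq <- data.frame(
  bc1 = c("B", "B"),
  bc2 = c("A", "C"),
  frequency = c(200L, 100L)
)
codes <- encode_rows(barcodes, freq)
# Comparing single numbers replaces comparing whole rows
in_table <- codes$x %in% codes$table
```

In this sketch, rows of the first data frame that contain a value absent from the second are encoded as NA and therefore never match.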
For technical reasons,
the product of the numbers of unique values in the
columns of table must not
exceed \(2^{31}-1\approx 2.1\cdot 10^{9}\).
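Whether a given pair of data frames stays within this bound can be checked up front. The sketch below is only an illustration: encoding_fits() is a hypothetical helper, not part of the package, and its default limit of .Machine$integer.max (R's largest representable integer) is an assumption about where the restriction comes from.

```r
# Hypothetical helper, not part of the package.
# `limit` defaults to R's largest integer; this default is an assumption.
encoding_fits <- function(x, table, limit = .Machine$integer.max) {
  # Only the columns shared by both data frames take part in matching
  cols <- intersect(names(x), names(table))
  n_unique <- vapply(table[cols], function(col) length(unique(col)), integer(1))
  # prod() returns a double, so computing the product does not itself overflow
  prod(n_unique) <= limit
}

x <- data.frame(bc1 = "A", bc2 = "A")
table <- data.frame(
  bc1 = c("B", "B", "C", "A"),
  bc2 = c("A", "C", "A", "A"),
  frequency = c(200L, 100L, 50L, 10L)
)
# Three unique values in bc1 and two in bc2 give a product of 6
fits <- encoding_fits(x, table)
```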
See also
create_freq_table() for how frequency tables are constructed,
combinatorial_demultiplex() for more information on the matrix of assigned
barcodes,
and dplyr::inner_join() for a function with similar functionality.
Examples
# Barcodes assigned to each read
barcode_table <- data.frame(
  read = c("seq_1", "seq_2", "seq_3", "seq_4"),
  bc1 = c("A", "B", "C", "B"),
  bc2 = c("A", "C", "A", "A")
)
# Frequency of each barcode combination
freq_table <- data.frame(
  bc1 = c("B", "B", "C", "A"),
  bc2 = c("A", "C", "A", "A"),
  frequency = c(200L, 100L, 50L, 10L)
)
# Keep only the barcode combinations at or above the cutoff
freq_cutoff <- 100L
selected_freq_table <- freq_table[freq_table$frequency >= freq_cutoff, ]
# Matching uses the shared columns bc1 and bc2
selected_rows <- row_match(barcode_table, selected_freq_table)
selected_barcode_table <- barcode_table[selected_rows, ]