Hash Table/Dictionary Lookup

Usage

lookup(terms, key.match, key.reassign = NULL, missing = NA)

## When key.match is a single vector, key.reassign must be supplied
lookup(terms, key.match, key.reassign, missing = NA)
terms %l% key.match
terms %l+% key.match
terms %lc% key.match
terms %lc+% key.match

Arguments

terms
A vector of terms to undergo a lookup.
key.match
One of the following: (1) a two-column data.frame holding a match key column and a reassignment column, (2) a named list of vectors, or (3) a single vector match key. If a data.frame or named list is supplied, key.reassign is not needed.
key.reassign
A single reassignment vector, supplied when key.match is not a two-column data.frame or named list.
missing
Value to assign to terms that do not match key.match. If set to NULL, unmatched elements of terms keep their original values.
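
The effect of missing can be pictured with a small base R sketch. This is not the package's data.table-based implementation; lookup_sketch is a hypothetical helper written only to illustrate the two behaviours described above, assuming a single-vector key and a reassignment vector of equal length.

```r
## Illustrative only: `lookup_sketch` is a hypothetical helper,
## not part of the package, and uses base R match() instead of data.table.
lookup_sketch <- function(terms, key, value, missing = NA) {
    idx <- match(terms, key)      # position of each term in the key
    out <- value[idx]             # NA wherever a term has no match
    unmatched <- is.na(idx)
    if (is.null(missing)) {
        out[unmatched] <- terms[unmatched]   # keep the original term
    } else {
        out[unmatched] <- missing            # fill with the chosen value
    }
    out
}

res_na   <- lookup_sketch(1:5, 1:4, 11:14)                  # 5 has no match
res_keep <- lookup_sketch(1:5, 1:4, 11:14, missing = NULL)  # 5 is retained
```

Note that when the reassignment values are character and the terms are numeric, retaining unmatched terms coerces them to character, as base R vectors hold a single type.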

Value

Returns a new vector with reassigned values.

Description

lookup - A data.table-based hash table, useful for large vector lookups.

%l% - A binary operator version of lookup for when key.match is a data.frame or named list.

%l+% - A binary operator version of lookup for when key.match is a data.frame or named list and missing is assumed to be NULL.

%lc% - A binary operator version of lookup for when key.match is a data.frame or named list and all arguments are converted to character.

%lc+% - A binary operator version of lookup for when key.match is a data.frame or named list, missing is assumed to be NULL, and all arguments are converted to character.
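
To picture what converting all arguments to character changes, the following base R sketch contrasts an unconverted match with a converted one. It is an assumption-laden illustration, not the operators' actual code: it uses base R match() rather than the package, and the variable names are hypothetical.

```r
## Illustrative only: base R stand-ins for the idea behind %l%, %lc% and %lc+%.
key <- data.frame(a = 1:3, b = factor(paste0("l", 1:3)))
terms <- 1:5

## No conversion: the factor class of the reassignment column carries over
as_is <- key$b[match(terms, key$a)]

## Everything converted to character before matching
terms_chr <- as.character(terms)
idx <- match(terms_chr, as.character(key$a))
as_chr <- as.character(key$b)[idx]

## The "+" idea: unmatched terms keep their (character) original value
as_chr_plus <- as_chr
as_chr_plus[is.na(idx)] <- terms_chr[is.na(idx)]
```

Here as_is is a factor with NA for the unmatched terms, while as_chr and as_chr_plus are plain character vectors.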

Examples

## Supply a data.frame to key.match
lookup(1:5, data.frame(1:4, 11:14))
[1] 11 12 13 14 NA
## Retain original values for missing
lookup(1:5, data.frame(1:4, 11:14), missing=NULL)
[1] 11 12 13 14 5
lookup(LETTERS[1:5], data.frame(LETTERS[1:5], 100:104))
[1] 100 101 102 103 104
lookup(LETTERS[1:5], factor(LETTERS[1:5]), 100:104)
[1] 100 101 102 103 104
## Supply a named list of vectors to key.match
codes <- list(
    A = c(1, 2, 4),
    B = c(3, 5),
    C = 7,
    D = c(6, 8:10)
)
lookup(1:10, codes)
[1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D"
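
One way to read the named-list form is as a two-column key in which each list name is repeated once per element it holds. The base R sketch below builds such a key by hand and matches against it. It is an illustration of the idea, not the package's internals, and the column names term and label are hypothetical.

```r
## Illustrative only: flatten a named list into a two-column key with base R.
codes <- list(A = c(1, 2, 4), B = c(3, 5), C = 7, D = c(6, 8:10))

key <- data.frame(
    term  = unlist(codes, use.names = FALSE),    # every value in the list
    label = rep(names(codes), lengths(codes)),   # its list name, repeated
    stringsAsFactors = FALSE
)

## Recode 1:10 by matching each value to its row in the flattened key
recoded <- key$label[match(1:10, key$term)]
```

The resulting recoded vector agrees with the lookup(1:10, codes) output shown above.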
## Supply a single vector to key.match and key.reassign
lookup(mtcars$carb, sort(unique(mtcars$carb)),
    c("one", "two", "three", "four", "six", "eight"))
 [1] "four"  "four"  "one"   "one"   "two"   "one"   "four"  "two"   "two"   "four"  "four"  "three" "three" "three" "four"
[16] "four"  "four"  "one"   "two"   "one"   "one"   "two"   "two"   "four"  "two"   "one"   "two"   "two"   "four"  "six"
[31] "eight" "two"
lookup(mtcars$carb, sort(unique(mtcars$carb)), seq(10, 60, by=10))
[1] 40 40 10 10 20 10 40 20 20 40 40 30 30 30 40 40 40 10 20 10 10 20 20 40 20 10 20 20 40 50 60 20
## %l%, a binary operator version of lookup
1:5 %l% data.frame(1:4, 11:14)
[1] 11 12 13 14 NA
1:10 %l% codes
[1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D"
1:12 %l% codes
[1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" NA NA
1:12 %l+% codes
[1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" "11" "12"
(key <- data.frame(a=1:3, b=factor(paste0("l", 1:3))))
  a  b
1 1 l1
2 2 l2
3 3 l3
1:3 %l% key
[1] l1 l2 l3
Levels: l1 l2 l3
## Larger Examples
key <- data.frame(x=1:2, y=c("A", "B"))
big.vec <- sample(1:2, 3000000, TRUE)
out <- lookup(big.vec, key)
out[1:20]
 [1] B B B B A A A B A A A A A B A B A A A B
Levels: A B
## A big string to recode with variation
## means a bigger dictionary
recode_me <- sample(1:(length(LETTERS)*10), 10000000, TRUE)

## Time it
tic <- Sys.time()
output <- recode_me %l% split(1:(length(LETTERS)*10), LETTERS)
difftime(Sys.time(), tic)
Time difference of 0.726522 secs
## View it
sample(output, 100)
 [1] "K" "K" "E" "S" "M" "N" "P" "E" "L" "E" "S" "M" "U" "C" "M" "C" "Q" "S" "Z" "Y" "G" "A" "H" "A" "J" "O" "R" "K" "F" "H"
[31] "F" "T" "N" "R" "L" "E" "D" "W" "H" "L" "P" "W" "Q" "Z" "M" "Z" "X" "Z" "Z" "J" "O" "L" "X" "X" "E" "T" "W" "N" "O" "H"
[61] "F" "J" "P" "X" "N" "Y" "T" "G" "X" "A" "M" "N" "U" "M" "Z" "K" "S" "A" "R" "Q" "Q" "C" "Y" "P" "L" "G" "K" "C" "N" "X"
[91] "Z" "T" "D" "C" "N" "B" "F" "I" "O" "K"

See also

setDT, hash