I have a set of key (character) ↔ hash (integer) associations in R. I would like to store these associations in one structure that lets me look up a key/hash pair by either the key or the hash.
So something like
"hello" <-> 1234
in the db variable.
And access it roughly like this (the exact syntax does not matter):
db["hello"] -> 1234
db[1234] -> "hello"
I tried using a data frame with the keys as row names. But then I cannot look up the string for a given integer hash. If I use the hash integers as row names, then I cannot look up by key, and so on.
My current solution is to keep two dbs as two data frames. One has the hashes as row names, the other has the keys as row names. This works, but it seems awkward and redundant to maintain two data frames that are identical except for their row names.
I would like it to be very fast in both directions :). I think that means O(log(n)) for the character direction and O(1) for the integer direction, but I'm not a data structure / algorithm expert. O(log(n)) in the integer direction is probably OK, but I think O(n) (having to scan the entire db) in either direction will slow things down too much.
The db is also bijective. That is, each key maps to exactly one value, and each value maps to exactly one key.
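To make the requirement concrete, here is a minimal base R sketch of one way such a structure could look: a single named integer vector, where the names are the keys and the values are the hashes. The example data and the helper names `is_bijective`, `key_to_hash` and `hash_to_key` are hypothetical, and `match` is used only for illustration, not for speed.

```r
# Hypothetical example data: names are the keys, values are the hashes.
db <- c(hello = 1234L, world = 5678L, foo = 42L)

# A bijective db has no duplicated keys and no duplicated hashes.
is_bijective <- function(db) {
  !is.null(names(db)) &&
    anyDuplicated(names(db)) == 0L &&
    anyDuplicated(unname(db)) == 0L
}

# Key -> hash: find the position of each key among the names.
key_to_hash <- function(keys, db) unname(db[match(keys, names(db))])

# Hash -> key: find the position of each hash among the values.
hash_to_key <- function(hashes, db) names(db)[match(hashes, db)]

is_bijective(db)
key_to_hash("hello", db)
hash_to_key(1234L, db)
```

Both directions read from the same vector, so there is only one copy of the data to keep in sync.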
EDIT: Thanks for the posts:
After running some tests, the match technique is definitely slower than a keyed data.table. As Martin pointed out, this is due solely to the time match needs to create its keyed table. That is, both match and a keyed data.table perform a binary search to find the value. But regardless, match is too slow for my needs when returning a single value. So I will code up a data.table solution and post it.
> system.time(match(1,x))
   user  system elapsed
  0.742   0.054   0.792
> system.time(match(1,x))
   user  system elapsed
  0.748   0.064   0.806
> system.time(match(1e7,x))
   user  system elapsed
  0.747   0.067   0.808
> system.time(x.table[1])
   user  system elapsed
      0       0       0
> system.time(x.table[1e7])
   user  system elapsed
  0.001   0.001   0.000
> system.time(x.table[1e7])
   user  system elapsed
  0.005   0.000   0.005
> system.time(x.table[1])
   user  system elapsed
  0.001   0.000   0.000
> system.time(x.table[1])
   user  system elapsed
  0.020   0.001   0.038
EDIT2:
I went with the fmatch solution and a named vector. I liked the simplicity of the match approach, but I do repeated lookups on the db, so the performance cost of rebuilding the hash table on every match call is too high.
fmatch has the same interface as match, works on the same named-vector data type, and so on. It simply caches the hash table it creates, so that subsequent calls on the same vector only need to perform a hash lookup. All of this is hidden from the caller, so fmatch is just a drop-in replacement for match.
Simple wrapper code for bidirectional search:
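library(fastmatch)  # provides fmatch

# key -> hash: db is a named vector (names are keys, values are hashes)
getChunkHashes = function(chunks, db) {
  return(db[fmatch(chunks, names(db))])
}

# hash -> key
getChunks = function(chunkHashes, db) {
  return(names(db[fmatch(chunkHashes, db)]))
}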
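A small usage sketch for the two wrappers, assuming db is a named integer vector whose names are the keys. The example data is hypothetical. So that the snippet also runs where fastmatch is not installed, it falls back to base match, which has the same interface; that fallback and the `lookup` name are assumptions made for this sketch and are not part of the solution above.

```r
# Use fastmatch::fmatch when available, otherwise base::match
# (assumption for this sketch: the fallback only costs speed).
lookup <- if (requireNamespace("fastmatch", quietly = TRUE)) {
  fastmatch::fmatch
} else {
  match
}

# Same wrappers as above, written against `lookup`.
getChunkHashes <- function(chunks, db) db[lookup(chunks, names(db))]
getChunks <- function(chunkHashes, db) names(db[lookup(chunkHashes, db)])

# Hypothetical db: names are the keys, values are the integer hashes.
db <- c(hello = 1234L, world = 5678L)

hashes <- getChunkHashes(c("world", "hello"), db)  # key -> hash
keys <- getChunks(c(1234L, 5678L), db)             # hash -> key
```

Both wrappers are vectorised, so a batch of keys or hashes can be resolved in a single call.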