I would like to find 3 integers (ie [1 2 3]) in a large dataset with about a million points.
I am currently using MATLAB Map (hashmap), and for each point I do the following:
key = sprintf('%d ', [1 2 3]); % 23 us % key = '1 2 3 ' result = lookup_map( key ); % 32 us
This is quite a lot of time, although - 1 million points * 55 us = 55 seconds.
I would like to port this to the GPU using CUDA, but I'm not sure of a better way to approach this.
I could pass four arrays - key1, key2, key3, result , and then perform a binary search on the keys, but this will take 20 iterations (2 ^ 20 = 1048576) per key. Then I also have delays due to simultaneous access to memory from each thread.
Is there a data structure optimized for parallel (O (1), ideally) multiple key searches in CUDA?
Q: What are the boundaries of three integers? And what data is viewed?
Integer keys can be from 0 to ~ 75,000 at present, but may be more (200,000+) in the future.
For the purposes of this question, we can assume that result is an integer from 0 to the size of the data set.
Q: Why don't you pack all three numbers into one 64-bit number (21 bits per number gives you a range of 0-2,097,152). And use this to index into a sparse array?
>> A = uint64(ones(10)); >> sparse_A = sparse(A) ??? Undefined function or method 'sparse' for input arguments of type 'uint64'. >> A = int64(ones(10)); >> sparse_A = sparse(A) ??? Undefined function or method 'sparse' for input arguments of type 'int64'.
It seems that my matlab does not support sparse arrays of 64-bit numbers.
In case this helps someone else, I wrote a quick function to create a 64-bit key from three unsigned integers: 2 ^ 21:
function [key] = to_key(face) key = uint64(bitsll(face(1), 42) + bitsll(face(2), 21) + rand(face(3),1)); end
Q: From @Dennis - why not use logical indexing?
Test it out!
% Generate a million random integers between 0 and 1000 >> M = int32(floor(rand(10000000,4)*1000)); % Find a point to look for >> search = M(500000,1:3) search = 850 910 581 >> tic; idx = M(:,1)==search(1) & M(:,2)==search(2)&M(:,3)==search(3); toc; Elapsed time is 0.089801 seconds. >> M(idx,:) ans = 850 910 581 726
Unfortunately, it takes 89801us, which is 1632x less than my existing solution (55us)! It takes 2.5 hours to run it a million times!
We can try to filter M after each search:
>> tic; idx1=M(:,1)==search(1); N=M(idx1,:); idx2=N(:,2)==search(2); N2=N(idx2,:); idx3 = N2(:,3)==search(3); toc; Elapsed time is 0.038272 seconds.
It's a little faster, but still 696 times slower than using a card.
I thought about this a little more, and I decided to profile the speed of re-generating some data "on the fly" from one key search - it can be faster than searching by 3 keys, given potential problems with this approach.