Logical array search

I have several arrays with approximately 100 possible values, i.e.:

a[0] = (a, b, c, d) a[1] = (a, e) a[2] = (d, f, g) 

I want FASTLY to return which arrays contain (a || b) && & & (d || e)

in this example, 0 and 1

I was thinking about bitwise operations ... for example, representing "abcd" on "1111"; "announcement" to "1001", etc. Then I could solve “OR” with just a bitwise OR, and then check if both are nonzero

Can anyone think of a better solution? this is not very important, as it does not seem very escalable.

Is there any DBMS that can do this quickly? I tried with mongodb, but it seems that they have not added the $ function yet and (doc says this in version 1.9.1, but I can only load 1.9.0, and this is still unstable)

I guess a “logical search” is similar to what google does all the time ... so I guess there is a better way (maybe not so fast, but more escalated) than

+7
source share
4 answers

Yes, a bitwise solution works pretty well for this. Yes, some databases include this feature, commonly called a raster column (or raster index, depending). The usual advice is to apply it to a column that has a relatively low power (i.e. a fairly small number of possible values, such as gender).

+1
source

In what sense does it not scale? 16 bytes of data in a (bit) array is not bad! I'm not sure why you need a DBMS; you can put binary data there if you need (hopefully blocks of arrays), and pull all of them to the request. Unless you plan on having billions of arrays.

For small numbers of elements, bit logic is the fastest. But if you start with much more than 100 values, then saving arrays sorted and performing binary (or even linear!) Searches will be faster. You will need to check your system to find the exact cutoff point, but if your arrays have ~ 4 elements each, I usually find a linear search faster (by counting the occurrences of the elements you are looking for in the logic, how you go) and what it hits binary math at about the same point as binary representations also gets bigger.

0
source

Store your arrays as trie, e.g.

 a b c d e d f g 

Create a trie from an expression like

 a b d e d e b d e 

You can match the last three against the first (ignoring any values ​​that are not in the expression, that is, "c", "f" and "g") to get solutions. I leave the details of the trie presentation and the matching algorithm to you.

0
source

As you said, the possible values ​​are around 100, but you have a lot of arrays, I think a hash table is better than a bit level operation (s).
For example.
have a hash table with values ​​in the expression, that is, set a, b to 1 and d, e set to 2.

 for each array a in arrays for each value v in array sum+= ht[v] if sum == 3 print found break 

(the above will not be with duplicates!)
the first cycle of the cycle can be parallelized, possibly with a reduction framework or even with an open protocol.
(btw second for can also be parallelized!)
This should be faster than creating a bit representation of the whole elements in the array and doing AND or OR. You basically win at best (for example, a and d are the first two elements!) In the worst case, both methods are the same for both methods (maybe if there is overhead for each element)

0
source

All Articles