Regarding the second problem:
You can watch Bloom Filters . Bloom Filters are specifically designed to answer the question about membership in O (1), although the answer is either no or maybe (which is not as transparent as yes / no: p).
In the case of maybe , of course, you need further processing to actually answer the question (if in your case there is no probabilistic answer), but even then Bloom Filter can act as a gatekeeper and reject most requests directly.
In addition, you may want to keep the actual ranges and degenerate ranges (individual elements) in different structures.
- individual items can be better stored in a hash table
- valid ranges can be stored in a sorted array
This reduces the number of elements stored in the sorted array, and therefore the complexity of the binary search performed there. Since you claim that many ranges are degenerate, I assume that you only have 500-1000 ranges (i.e., an order of magnitude smaller) and log (1000) ~ 10
Therefore, I suggest the following steps:
- Flower filter: if not, stop
- Sorted array of real ranges: if yes, stop
- The hash table of individual elements
The Sorted Array test is performed first, because of the number you give (millions of numbers combined in several thousand ranges), if the number is contained, most likely it will be in the range, and not alone :)
One final note: beware of O (1), although this may seem attractive, you are not in the asymptotic case here. Quite a few 5000-10000 ranges, since log (10000) is something like 13. Therefore, do not pessimize your implementation by obtaining a solution O (1) with such a high constant coefficient that it actually works slower than O ( log N) solution :)
Matthieu M.
source share