The fastest way to check if a number is in the list of numbers

I need to check if an identifier (long integer) is in a list of 10,000 identifiers. I need to do this about 10-9 times in a loop, and speed is relatively important. Does C ++ use the fastest way to do this? Something like:

set<long> myset; // (Populate myset) long id = 123456789; if(myset.find(id) != myset.end()) { // id is in set } 

Or is there a faster way?

+6
c ++
source share
5 answers

The fastest way if your long one has a limited range is a bitmap (e.g. vector<bool> ). If this is not feasible, you should try a unordered_set (hash_set). And if that holds the worst case, use set

+16
source share

Hm, depending on how you generate the numbers and how many there are, it would be faster to use std::vector , sort it (you can even sort it when inserting numbers) and use a binary search to check if there is a number there.

As a rule, the set works fine, but there are tradeoffs. A vector has less memory overhead, and since all numbers are stored in a contiguous block of memory, it can outperform a set in some situations, but you will have to check it.

+4
source share

You can create a hash table and check O (1) if an identifier exists.

+1
source share

The standard, for the best of intentions, decided that vector<bool> should be specialized in order to be implemented as a bit set.

The bit set is fast enough, and you also have a choice of std :: bitset, which is a fixed size, and boost::dynamic_bitset , whose size is set at runtime, and in any case built on top of vector<unsigned int> This could be a template of which integral type is used).

It makes no sense to optimize further in order to save you a bit, so the proposal will be to use one of them.

By the way, I saw "scattered" bitmaps in which, if the value falls within a certain range, it uses one, otherwise it will use a tree-style search. (This method can also be used for Z-table functions (the usual type of CDF distribution), where you "cache" a table in memory of up to 95% or 99% of the density and use slow calculation for extreme values ​​(and I really had to do this) .

0
source share

If you really want to push it to the top, you can also use a two-step approach.

  • Use a flowering filter or similar probabilistic algorithm to find out if the value is finally NOT in the set or "possibly" in the set.
  • To make sure that step 1 has created a false positive, you need to complete your more expensive second step only for those that are not filtered out by step 1. For your many (you mentioned 10 ^ 9) queries, you need to do more quickly (if not too a lot of requests is a hit ...).

See http://en.wikipedia.org/wiki/Bloom_filter for more details. See also: Search algorithm with minimal time complexity

0
source share