The fastest way to check if a number is in the list of numbers

Question

The fastest way to check if a number is in the list of numbers

I need to check if an identifier (long integer) is in a list of 10,000 identifiers. I need to do this about 10-9 times in a loop, and speed is relatively important. Does C ++ use the fastest way to do this? Something like:

set<long> myset; // (Populate myset) long id = 123456789; if(myset.find(id) != myset.end()) { // id is in set }

Or is there a faster way?

+6

c ++

Nick edwards Mar 09 '11 at 13:58

source share

5 answers

Hm, depending on how you generate the numbers and how many there are, it would be faster to use std::vector , sort it (you can even sort it when inserting numbers) and use a binary search to check if there is a number there.

As a rule, the set works fine, but there are tradeoffs. A vector has less memory overhead, and since all numbers are stored in a contiguous block of memory, it can outperform a set in some situations, but you will have to check it.

+4

Björn pollex Mar 09 '11 at 14:03

source share

You can create a hash table and check O (1) if an identifier exists.

+1

Aviv A. Mar 09 '11 at 2:04

source share

The standard, for the best of intentions, decided that vector<bool> should be specialized in order to be implemented as a bit set.

The bit set is fast enough, and you also have a choice of std :: bitset, which is a fixed size, and boost::dynamic_bitset , whose size is set at runtime, and in any case built on top of vector<unsigned int> This could be a template of which integral type is used).

It makes no sense to optimize further in order to save you a bit, so the proposal will be to use one of them.

By the way, I saw "scattered" bitmaps in which, if the value falls within a certain range, it uses one, otherwise it will use a tree-style search. (This method can also be used for Z-table functions (the usual type of CDF distribution), where you "cache" a table in memory of up to 95% or 99% of the density and use slow calculation for extreme values (and I really had to do this) .

0

Cashcow Mar 09 '11 at 2:36 p.m.

source share

If you really want to push it to the top, you can also use a two-step approach.

Use a flowering filter or similar probabilistic algorithm to find out if the value is finally NOT in the set or "possibly" in the set.
To make sure that step 1 has created a false positive, you need to complete your more expensive second step only for those that are not filtered out by step 1. For your many (you mentioned 10 ^ 9) queries, you need to do more quickly (if not too a lot of requests is a hit ...).

See http://en.wikipedia.org/wiki/Bloom_filter for more details. See also: Search algorithm with minimal time complexity

0

Bittickler May 18, '13 at 20:40

source share

Erik · Accepted Answer · 2011-03-09T14:03:29+0000

The fastest way if your long one has a limited range is a bitmap (e.g. vector<bool> ). If this is not feasible, you should try a unordered_set (hash_set). And if that holds the worst case, use set

The fastest way to check if a number is in the list of numbers

More articles: