Fastest C ++ container: unique values

I am writing an email application that interacts with a MySQL database. I have two tables that are looking for my data, one of which contains unsubscriptions, the other of which is a standard user table. At the moment, I am creating a vector of pointers to email objects and first save all pending emails in it. Then I have a standard SQL loop in which I check to see if the email is indicated in the unsubscribe vector, and then add it to the global send email vector. My question is: is there a more efficient way to do this? I have to look for the wrong vector for every single letter in my system, up to 50K different. Is there a better structure to look for? And, the best structure to maintain a unique collection of values? Perhaps one that just drops the value if it already contains it?

+6
c ++ algorithm data-structures vector search
source share
4 answers

If your implementation of the C ++ standard library supports it, consider using std::unordered_set or std::hash_set .

You can also use std::set , although its overhead may be higher (this depends on the cost of generating a hash for the object and the cost of comparing two objects several times).

If you use a container based on node, such as set or unordered_set , you also get the advantage that removing elements is relatively cheap compared to removing from vector .

+6
source share
  • Tasks like this one (given manipulations) are best left to what MEANT is, to execute them - the database!

    eg. something like:

      SELECT email FROM all_emails_table e WHERE NOT EXISTS ( SELECT 1 FROM unsubscribed u where e.email=u.email ) 
  • If you need ALGORITHM, you can do this quickly by extracting both the list of letters and the list of unsubscribes in the form of ORDERED lists. Then you can view the email list (which is ordered), and as you do this, you slide around the unsubscribe list. The idea is that you move 1 forward in that the list has the "largest" current item. This algorithm is O (M + N) instead of O (M * N), as your current

  • Or you can make a hash map that displays from an unsigned email address to 1. Then you make find() calls on that map for which there is O (1) for each hash implementation for each search. Unfortunately, C ++ does not have a hash map standard - see this SO question for existing implementations (a couple of ideas are SGI STL hash_map and Boost and / or TR1 std::tr1::unordered_map ).

    One of the comments on this post indicates that it will be added to the standard: β€œGiven this, the C ++ Standard Library technical report introduced unordered associative containers that are implemented using hash tables, and now they are added to the working draft of the C ++ standard .

+5
source share

Store email addresses in std::set or use std::set_difference() .

+4
source share

The best way to do this in MySQL, I think. You can change the table layout of your users with a different column, the BIT column, for unsubscribed. Even better: add a DATETIME column for "date deleted" with a default value of NULL .

If a BIT column is used, your query will look something like this:

 SELECT * FROM `users` WHERE `unsubscribed` <> 0b1; 

If you use the DATETIME column, your query will look something like this:

 SELECT * FROM `users` WHERE `date_unsubscribed` IS NULL; 
+1
source share

All Articles