C++ std::map or std::set - efficiently inserting duplicates

I have a bunch of data full of duplicates, and I want to remove the duplicates. For example, [1, 1, 3, 5, 5, 5, 7] becomes [1, 3, 5, 7].

It looks like I can use either std::map or std::set to handle this. However, I'm not sure whether it's faster to (a) simply insert all the values into the container, or (b) check whether each value already exists in the container and insert it only if it doesn't. Are inserts very efficient? And if there is a better way, can you suggest a fast way to do this?

Another question: if the data stored is not as trivial as integers, but is instead a custom class, how does std::map manage to store (hash?) the data for quick access through operator[]?

+6
5 answers

std::map does not use hashing. std::unordered_map does, but it requires C++11. std::map and std::set both use the comparator you provide. The class templates have a default for this comparator, which boils down to comparing with operator<, but you can supply your own.
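As a minimal sketch of supplying your own comparator, the block below orders a record type by a single field. The names Employee, ById and unique_ids are hypothetical and exist only for this illustration; they are not from the question.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct Employee {
    int id;
    std::string name;
};

// Custom comparator: orders employees by id only, so two employees
// with the same id are treated as duplicates by the set.
struct ById {
    bool operator()(const Employee& a, const Employee& b) const {
        return a.id < b.id;
    }
};

// Returns the distinct ids in ascending order (the set's iteration order).
std::vector<int> unique_ids(const std::vector<Employee>& input) {
    std::set<Employee, ById> seen(input.begin(), input.end());
    std::vector<int> ids;
    for (const Employee& e : seen) {
        ids.push_back(e.id);
    }
    return ids;
}
```

The comparator is passed as the second template argument. Leaving it out falls back to std::less, which in turn needs operator< for the element type.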

If you don't need to store both a key and a value (it looks like you don't), you should just use std::set, as it is the better fit.

The standard does not say which data structures map and set use under the hood, only that certain operations have certain time complexities. In practice, most implementations I know of use a tree.

The time complexity is the same whether you use operator[] or insert, but I would use insert or operator[] rather than doing a search and then an insert if the item was not found. The latter implies two separate lookups to insert an element into the set.

+9

An insert() on any of the associative containers does a find() to see whether the object exists, and then inserts the object. Simply inserting the elements into a std::set<T> should get rid of the duplicates efficiently.
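A small sketch of that approach, using the range constructor of std::set so that every value is inserted without a prior lookup. The function name dedupe_with_set is made up for this example.

```cpp
#include <set>
#include <vector>

// Copies the values into a std::set, which ignores values it already
// holds, then copies them back out in ascending order.
std::vector<int> dedupe_with_set(const std::vector<int>& values) {
    std::set<int> unique(values.begin(), values.end());
    return std::vector<int>(unique.begin(), unique.end());
}
```

With the data from the question, dedupe_with_set({1, 1, 3, 5, 5, 5, 7}) yields 1, 3, 5, 7.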

Depending on the size of your set and the ratio of duplicates to unique values, it may be faster to put the objects into a std::vector<T>, std::sort() it, and then use std::unique() together with std::vector<T>::erase() to get rid of the duplicates.

+7

How many times do you have to do this?

If you insert regularly:

    //*/
    std::set<int> store;
    /*/
    // for hash:
    std::unordered_set<int> store;
    //*/
    int number;

    if ( store.insert(number).second )
    {
        // was not in store
    }

If you fill the container only once:

    std::vector<int> store;
    int number;

    store.push_back(number);  // repeat for every input value
    std::sort(store.begin(), store.end());
    store.erase(std::unique(store.begin(), store.end()), store.end());
    // elements are now unique
+2

Assuming the common implementation strategy for std::map and std::set, that is, balanced binary search trees, both insertion and lookup have to walk the tree to find where the key belongs. Thus, an unsuccessful lookup followed by an insert will be roughly twice as slow as just an insert.
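One way to check this claim on your own implementation is to count comparator calls. The following is only a sketch: CountingLess and cost_of_adding are hypothetical names, and the exact counts depend on the standard library in use.

```cpp
#include <cstddef>
#include <set>

// Comparator that behaves like operator< but counts its calls.
struct CountingLess {
    std::size_t* calls;  // shared counter, owned by the caller
    bool operator()(int a, int b) const {
        ++*calls;
        return a < b;
    }
};

// Fills a set with 0..existing-1, then adds new_key (assumed to be absent)
// and returns how many comparisons that last step needed.
std::size_t cost_of_adding(int existing, int new_key, bool find_first) {
    std::size_t calls = 0;
    std::set<int, CountingLess> s(CountingLess{&calls});
    for (int i = 0; i < existing; ++i) {
        s.insert(i);
    }
    calls = 0;  // only count the final operation
    if (find_first) {
        // Strategy (b): look the key up, insert only if it is missing.
        if (s.find(new_key) == s.end()) {
            s.insert(new_key);
        }
    } else {
        // Strategy (a): just insert.
        s.insert(new_key);
    }
    return calls;
}
```

Comparing cost_of_adding(1000, 1000, true) with cost_of_adding(1000, 1000, false) shows how many extra comparisons the separate lookup adds for a key that is not yet in the set.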

"how can std::map manage to store (hash?) data for quick access through the [] operator?"

By using the comparison function that you specify (or std::less, which works if you overload operator< for your custom type). In any case, std::map and std::set are not hash tables.
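As an illustration, a custom type becomes usable as a std::map key once operator< is defined for it. Point and count_occurrences below are hypothetical names chosen for this sketch.

```cpp
#include <map>
#include <tuple>
#include <vector>

// Hypothetical key type, used only for this illustration.
struct Point {
    int x;
    int y;
};

// Strict weak ordering: compare x first, then y.
bool operator<(const Point& a, const Point& b) {
    return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

// Counts how often each point occurs. operator[] locates the key by
// calling operator< (through std::less), not by hashing, and creates
// a zero-initialized entry the first time a key is seen.
std::map<Point, int> count_occurrences(const std::vector<Point>& points) {
    std::map<Point, int> counts;
    for (const Point& p : points) {
        ++counts[p];
    }
    return counts;
}
```

Two keys are considered equal when neither compares less than the other, so no operator== is needed for the key type.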

0

std::set and std::map are both implemented as red-black trees, as far as I know. And using only insert will probably be faster than doing both, because with both you double the lookup time.

Also, map and set use operator<. As long as your class defines operator<, you will be able to use its objects as keys.

0

Source: https://habr.com/ru/post/927385/

