Do I need to shuffle randomly before inserting into an STL set?

I need to insert 10 million strings into a C++ STL set. The strings are already sorted. Will I hit a pathological case if I insert them in sorted order? Should I randomize them first? Or will the g++ STL implementation rebalance automatically for me?

7 answers

My only question is: do you really need a set?

If the data is already sorted and you do not need to insert or delete elements after building it, a deque would be better:

  • you get the same big-O complexity for lookups by using binary search;
  • you get less memory overhead and better cache locality.

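On binary_search: std::binary_search accepts ForwardIterators, but it only runs in logarithmic time overall with random-access iterators such as deque's. With plain ForwardIterators the number of comparisons stays logarithmic, while advancing the iterators takes linear time.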
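
A minimal sketch of the deque-plus-binary-search approach, assuming the sorted strings already sit in a std::vector<std::string>. The names build_sorted_deque and contains are hypothetical helpers, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: copy already-sorted strings into a deque.
// deque iterators are random-access, so std::binary_search on it is
// logarithmic in both comparisons and iterator steps.
std::deque<std::string> build_sorted_deque(const std::vector<std::string>& sorted) {
    // Precondition from the question: the input is already sorted.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::deque<std::string>(sorted.begin(), sorted.end());
}

// Hypothetical helper: membership test, playing the role of set::count.
bool contains(const std::deque<std::string>& d, const std::string& key) {
    return std::binary_search(d.begin(), d.end(), key);
}
```

If you need the position of a match rather than a yes/no answer, std::lower_bound on the same range returns an iterator to it.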

Set implementations usually use a red-black tree, which is rebalanced for you. Insertion may be faster (or it may not) if you shuffle the data before inserting; the only way to know for sure is to benchmark your set implementation with your specific data. Lookup time will be the same either way.
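
One way to run that test is a small timing harness like the sketch below. It is only an illustration: make_sorted_keys generates hypothetical zero-padded keys standing in for the real data, and time_set_insert_ms and shuffled_copy are made-up helper names. It times plain insert() into std::set for the sorted keys and for a shuffled copy.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the real input: n zero-padded numeric strings
// ("0000000000", "0000000001", ...), already sorted. Assumes n <= 10^10.
std::vector<std::string> make_sorted_keys(std::size_t n) {
    std::vector<std::string> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string s = std::to_string(i);
        s.insert(0, 10 - s.size(), '0');  // left-pad to 10 digits
        keys.push_back(s);
    }
    assert(std::is_sorted(keys.begin(), keys.end()));
    return keys;
}

// Insert keys one at a time (no hint) and return elapsed milliseconds.
// The final set size is written to *out_size for sanity checks.
double time_set_insert_ms(const std::vector<std::string>& keys, std::size_t* out_size) {
    auto start = std::chrono::steady_clock::now();
    std::set<std::string> s;
    for (const auto& k : keys) {
        s.insert(k);
    }
    auto stop = std::chrono::steady_clock::now();
    *out_size = s.size();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Return a shuffled copy (fixed seed for repeatability).
std::vector<std::string> shuffled_copy(std::vector<std::string> keys, unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(keys.begin(), keys.end(), rng);
    return keys;
}
```

Calling time_set_insert_ms on make_sorted_keys(10000000) and on a shuffled_copy of it mirrors the question's scenario. The shuffle itself is not included in the measured time, and the results depend on compiler, standard library, and machine, so no numbers are quoted here.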

The implementation will balance itself automatically. Since you know the input is sorted, however, you can help a little: you can supply a "hint" when inserting, and the iterator to the previously inserted element is exactly the right hint for the next insertion. In that case, each insertion has amortized constant complexity instead of the logarithmic complexity you would otherwise expect.
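
A minimal sketch of hinted insertion, with hypothetical helper names. One caveat on which iterator to pass: the C++03 wording promised amortized constant time when the element is inserted right after the hint, which is what the previous-element advice relies on, whereas C++11 and later promise it when the element is inserted right before the hint. Under the newer rules, s.end() is the natural hint for ascending input. The standard also requires the range constructor to run in linear time when its input is already sorted, so building the set in one call is a simple alternative.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: insert ascending keys using end() as the hint.
// Under C++11+ rules this is amortized constant per insertion, because each
// new (largest) key lands immediately before end().
std::set<std::string> build_with_end_hint(const std::vector<std::string>& sorted) {
    // The hint only pays off if the input really is ascending.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::set<std::string> s;
    for (const auto& key : sorted) {
        s.insert(s.end(), key);
    }
    return s;
}

// Hypothetical helper: the range constructor, required by the standard to be
// linear when [first, last) is already sorted.
std::set<std::string> build_from_range(const std::vector<std::string>& sorted) {
    return std::set<std::string>(sorted.begin(), sorted.end());
}
```

Both helpers produce the same set; the range constructor is usually the shorter option when all the data is available up front.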

http://en.wikipedia.org/wiki/Standard_Template_Library

set: "Implemented using a self-balancing binary search tree."

g++'s libstdc++ uses red-black trees for sets and maps.

http://en.wikipedia.org/wiki/Red-black_tree

This is a self-balancing tree, so inserts are always O(log n). The C++ standard requires every implementation to meet this complexity, so in practice they almost always use red-black trees or something very similar.

So don’t worry about how you insert elements.
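
One way to see the balancing at work is to count comparisons. The sketch below uses a hypothetical counting comparator (CountingLess) and inserts keys in ascending order, the worst case for an unbalanced binary search tree, which would need n(n-1)/2 comparisons in total (523,776 for n = 1024). A balanced tree needs only a number of comparisons per insertion proportional to the tree height.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical comparator that counts every comparison the set performs.
// Copies of the comparator share the same counter through the pointer.
struct CountingLess {
    std::size_t* counter;
    bool operator()(int a, int b) const {
        ++*counter;
        return a < b;
    }
};

// Insert 0, 1, ..., n-1 in ascending order and return the total number of
// comparisons std::set needed to do it.
std::size_t comparisons_for_sorted_insert(int n) {
    std::size_t count = 0;
    std::set<int, CountingLess> s(CountingLess{&count});
    for (int i = 0; i < n; ++i) {
        s.insert(i);
    }
    assert(s.size() == static_cast<std::size_t>(n));
    return count;
}
```

For n = 1024 the total stays far below the quadratic figure above; the exact count depends on the standard library implementation.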

A very cheap and easy approach is to insert from both ends of your sorted string collection. That is, first add "A", then "ZZZZZ", then "AA", then "ZZZZY", and so on, until you meet in the middle. This does not incur the cost of a full shuffle, yet it is likely to sidestep the pathological cases.
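
A minimal sketch of the both-ends ordering, assuming the input is a sorted std::vector<std::string>; both_ends_order is a hypothetical name. It produces the order first, last, second, second-to-last, and so on, in a single linear pass. With a balanced tree such as std::set this reordering should not be needed, but it is cheap if you want to hedge against a structure that degrades on sorted input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reorder sorted input as a[0], a[n-1], a[1], a[n-2], ...
// meeting in the middle. Runs in O(n) with no random shuffle.
std::vector<std::string> both_ends_order(const std::vector<std::string>& sorted) {
    std::vector<std::string> out;
    out.reserve(sorted.size());
    std::size_t lo = 0, hi = sorted.size();
    while (lo < hi) {
        out.push_back(sorted[lo++]);         // next from the front
        if (lo < hi) {
            out.push_back(sorted[--hi]);     // next from the back
        }
    }
    assert(out.size() == sorted.size());
    return out;
}
```

For example, {"A", "AA", "B", "ZZZZY", "ZZZZZ"} comes out as {"A", "ZZZZZ", "AA", "ZZZZY", "B"}.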

Maybe an unordered_set would be an alternative.
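
A minimal sketch, assuming C++11 (std::unordered_set; older g++ versions offered std::tr1::unordered_set). A hash set does not care about insertion order, and lookups take average constant time, but it gives up ordered iteration and range queries such as lower_bound. build_hash_set is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: build a hash set from the input. Sorted input is not a
// problem for a hash table. reserve() pre-sizes the bucket array so the table
// does not rehash repeatedly as it grows.
std::unordered_set<std::string> build_hash_set(const std::vector<std::string>& keys) {
    std::unordered_set<std::string> s;
    s.reserve(keys.size());
    for (const auto& k : keys) {
        s.insert(k);
    }
    // Duplicates collapse, so the set can be smaller than the input.
    assert(s.size() <= keys.size());
    return s;
}
```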
