Sort when equality is available

Question

Suppose we have a vector of pairs:

std::vector<std::pair<A,B>> v;

where for type A only equality is defined:

 bool operator==(A const & lhs, A const & rhs) { ... }

How would you sort it so that all pairs with the same first element end up next to each other? To be clear, the output that I hope to achieve should be the same as the result of something like this:

 std::unordered_multimap<A,B> m(v.begin(),v.end());
 std::copy(m.begin(),m.end(),v.begin());

However, I would like, if possible:

Do the sorting in place.
Avoid having to define a hash function for equality.

Edit: additional specific information.

In my case, the number of elements is not particularly large (I expect N = 10 ~ 1000), although I have to repeat this sorting many times (~ 400) as part of a larger algorithm, and the data type A is quite large (it contains, among other things, an unordered_map with ~ 20 std::pair<uint32_t,uint32_t> in it, which is the structure that prevents me from coming up with an ordering and makes it hard to build a hash function).

+7

c++ sorting algorithm partition

pqnet Aug 15 '14 at 12:03

4 answers

If you can come up with a function that assigns a unique number to each unique element, then you can create a secondary array with these numbers and sort the secondary array together with the primary one, for example with merge sort.

But in this case you need a function that assigns a unique number to each unique element, i.e. a hash function without collisions. I think this should not be a problem.

As for the asymptotic behavior of this solution: if the hash function is O(1), then building the secondary array is O(N) and sorting it together with the primary one is O(N log N), so the total is O(N + N log N) = O(N log N). The downside of this solution is that it requires double the memory.

In short, the main point of this solution is to quickly translate your elements into elements that you can compare quickly.
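A minimal sketch of this idea, assuming such a collision-free mapping exists. The names `sort_by_cached_id` and `id_of` are made up for illustration and are not part of the question or of any library; `id_of` stands for the hypothetical function that turns an `A` into a unique integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Groups equal keys by sorting on a precomputed integer id.
// `id_of` is a hypothetical, collision-free mapping from A to an integer;
// it is an assumption of this sketch, not something the question provides.
template <class A, class B, class IdOf>
void sort_by_cached_id(std::vector<std::pair<A, B>>& v, IdOf id_of)
{
    // Secondary array: one id per element, computed once (N calls to id_of).
    std::vector<std::size_t> ids(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        ids[i] = id_of(v[i].first);

    // Sort positions by id, so only cheap integer comparisons are made.
    std::vector<std::size_t> order(v.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t l, std::size_t r) { return ids[l] < ids[r]; });

    // Rebuild the primary array in sorted order; this is the extra memory
    // the answer mentions.
    std::vector<std::pair<A, B>> sorted;
    sorted.reserve(v.size());
    for (std::size_t i : order)
        sorted.push_back(std::move(v[i]));
    v = std::move(sorted);
}
```

With `int` keys the key itself can serve as the id, e.g. `sort_by_cached_id(v, [](int x) { return static_cast<std::size_t>(x); });`. For the large `A` in the question, finding a suitable `id_of` is exactly the hard part.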

+3

MrPisarik Aug 15 '14 at 13:41

In place algorithm

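 for (int i = 0; i < n-2; i++) {
     // Note: this first version assumes v[i+1] does not already match v[i];
     // the edited version further down skips such neighbours first.
     for (int j = i+2; j < n; j++) {
         if (v[j].first == v[i].first) {
             std::swap(v[j], v[i+1]);  // move the match next to the current cluster
             i++;
         }
     }
 }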

There may be a more elegant way to write the loop, but this is O(n * m), where n is the number of elements and m is the number of keys. So if m is much smaller than n (in the best case, all keys are the same), this approaches O(n). In the worst case the number of keys is about n, so this is O(n^2). I have no idea how many keys you expect, so I can't work out the average case, but it is most likely O(n^2) as well.

For a small number of keys, this may work faster than an unordered multimap, but you will need to measure it to find out.

Note: the order of the clusters is completely random.

Edit: (much more efficient in the partially clustered case, does not change the complexity)

 for (int i = 0; i < n-2; i++) {
     // Skip neighbours that already belong to the current cluster.
     for (; i < n-2 && v[i+1].first == v[i].first; i++) {}
     for (int j = i+2; j < n; j++) {
         if (v[j].first == v[i].first) {
             std::swap(v[j], v[i+1]);
             i++;
         }
     }
 }

Edit 2: Following MrPisarik's comment, removed the redundant check on i in the inner loop.
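For reference, the same idea can be written as a self-contained function. This is only a sketch: the name `cluster_in_place` is made up here, and it tracks the current cluster as a half-open index range instead of advancing `i` inside the inner loop, which also avoids the `n-2` arithmetic on unsigned sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place clustering that only needs operator== on the first member.
// Hypothetical helper name; same O(n * m) behaviour as the loops above.
template <class A, class B>
void cluster_in_place(std::vector<std::pair<A, B>>& v)
{
    const std::size_t n = v.size();
    std::size_t begin = 0;               // start of the current cluster
    while (begin < n) {
        std::size_t end = begin + 1;     // one past the last element known to match v[begin]
        for (std::size_t j = end; j < n; ++j) {
            if (v[j].first == v[begin].first) {
                if (j != end)
                    std::swap(v[j], v[end]);  // pull the match next to its cluster
                ++end;
            }
        }
        begin = end;                     // [old begin, end) now shares one key
    }
    assert(begin == n);                  // every element belongs to exactly one cluster
}
```

As with the loops above, elements inside a cluster and the clusters themselves end up in no particular sorted order.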

+2

Ideahat Aug 15 '14 at 12:56

I am surprised that no one has yet suggested using std::partition. It makes the solution neat, elegant and generic:

 template<typename BidirIt, typename BinaryPredicate>
 void equivalence_partition(BidirIt first, BidirIt last, BinaryPredicate p)
 {
     using element_type = typename std::decay<decltype(*first)>::type;
     if (first == last) { return; }
     auto new_first = std::partition(first, last,
         [=](element_type const &rhs) { return p(*first, rhs); });
     equivalence_partition(new_first, last, p);
 }

 template<typename BidirIt>
 void equivalence_partition(BidirIt first, BidirIt last)
 {
     using element_type = typename std::decay<decltype(*first)>::type;
     equivalence_partition(first, last, std::equal_to<element_type>());
 }

An example is here .
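One possible variation, sketched under the assumption that the predicate is reflexive (every element is equivalent to itself): the recursion can be turned into a loop, and the representative element can be kept outside the range handed to std::partition, so the predicate never reads an element that the algorithm may be moving. The name `equivalence_partition_iterative` is made up for this sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Iterative variant: *pivot stays in place because std::partition only
// rearranges elements in [next(pivot), last).
template <class ForwardIt, class BinaryPredicate>
void equivalence_partition_iterative(ForwardIt first, ForwardIt last, BinaryPredicate p)
{
    using value_type = typename std::iterator_traits<ForwardIt>::value_type;
    while (first != last) {
        ForwardIt pivot = first;
        // Gather everything equivalent to *pivot directly behind it; the
        // returned iterator is the start of the next, still unprocessed group.
        first = std::partition(std::next(pivot), last,
                               [&](value_type const& elem) { return p(*pivot, elem); });
    }
}
```

For the vector of pairs from the question, the predicate would compare only the first members, e.g. `[](std::pair<A,B> const& l, std::pair<A,B> const& r) { return l.first == r.first; }`.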

+2

Chris hayden Aug 15 '14 at 17:47

TemplateRex · Accepted Answer · 2014-08-15T17:57:52+0000

First option: `cluster()` and `sort_within()`

The handwritten double loop by @MadScienceDreams can be written as a cluster() algorithm of complexity O(N * K) with N elements and K clusters. It repeatedly calls std::partition (using C++14 style with generic lambdas, easily adapted to C++11 or even C++98 style by writing your own function objects):

 template<class FwdIt, class Equal = std::equal_to<>>
 void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
 {
     for (auto it = first; it != last; /* increment inside loop */)
         it = std::partition(it, last, [=](auto const& elem){
             return eq(elem, *it);
         });
 }

which you call on your input vector<std::pair> as

 cluster(begin(v), end(v), [](auto const& L, auto const& R){
     return L.first == R.first;
 });

The next algorithm to write is sort_within, which takes two predicates, an equality and a comparison function object. It calls std::find_if_not to find the end of the current range, and then std::sort to sort within that range:

 template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
 void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
 {
     for (auto it = first; it != last; /* increment inside loop */) {
         auto next = std::find_if_not(it, last, [=](auto const& elem){
             return eq(elem, *it);
         });
         std::sort(it, next, cmp);
         it = next;
     }
 }

On an already clustered input, you can call it as:

 sort_within(begin(v), end(v),
     [](auto const& L, auto const& R){ return L.first == R.first; },
     [](auto const& L, auto const& R){ return L.second < R.second; }
 );

A live example shows this for some real data using std::pair<int, int>.
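Putting the two calls together might look like the sketch below. It repeats the two templates from this answer so that it compiles on its own (C++14 or later); the sample data and the function name `grouped_and_sorted` are invented for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// cluster() and sort_within() as given in the answer above.
template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
{
    for (auto it = first; it != last; /* increment inside loop */)
        it = std::partition(it, last, [=](auto const& elem){ return eq(elem, *it); });
}

template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
{
    for (auto it = first; it != last; /* increment inside loop */) {
        auto next = std::find_if_not(it, last, [=](auto const& elem){ return eq(elem, *it); });
        std::sort(it, next, cmp);
        it = next;
    }
}

// Hypothetical sample data: group by .first, then order each group by .second.
inline std::vector<std::pair<int, int>> grouped_and_sorted()
{
    std::vector<std::pair<int, int>> v{{2, 9}, {1, 5}, {2, 3}, {1, 4}, {3, 7}, {2, 1}};
    auto same_first  = [](auto const& L, auto const& R){ return L.first == R.first; };
    auto less_second = [](auto const& L, auto const& R){ return L.second < R.second; };
    cluster(v.begin(), v.end(), same_first);                   // equal keys become adjacent
    sort_within(v.begin(), v.end(), same_first, less_second);  // order inside each group
    return v;
}
```

The group with key 2 comes first because it is the key of the first element; the relative order of the remaining groups depends on how the library's std::partition rearranges elements, so it should not be relied upon.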

Second option: custom comparison

Even if there is no operator< for A, you can define one yourself. There are two broad options here. First, if A is hashable, you can define

 bool operator<(A const& L, A const& R)
 {
     return std::hash<A>()(L) < std::hash<A>()(R);
 }

and write std::sort(begin(v), end(v)) directly. This makes O(N log N) calls to std::hash unless you cache the hash values in separate storage.

Secondly, if A is not hashable but has member functions x(), y() and z() that uniquely determine equality on A, you can do

 // std::tie binds lvalue references, so this assumes x(), y() and z() return references.
 bool operator<(A const& L, A const& R)
 {
     return std::tie(L.x(), L.y(), L.z()) < std::tie(R.x(), R.y(), R.z());
 }

Again, you can directly write std::sort(begin(v), end(v)) .
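A concrete sketch of this second variant, using a made-up `Key` struct with plain data members in place of `A`. Unlike the plain std::sort call above, the helper here (also a hypothetical name) compares only the first member of each pair, so the mapped type does not need an ordering of its own.

```cpp
#include <algorithm>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key type: x, y and z together determine equality.
struct Key {
    int x;
    int y;
    int z;
};

inline bool operator==(Key const& L, Key const& R)
{
    return std::tie(L.x, L.y, L.z) == std::tie(R.x, R.y, R.z);
}

// Lexicographic order over the same members, consistent with operator== above.
inline bool operator<(Key const& L, Key const& R)
{
    return std::tie(L.x, L.y, L.z) < std::tie(R.x, R.y, R.z);
}

// Sorting by .first alone is enough to make pairs with equal keys adjacent.
template <class B>
void sort_by_key(std::vector<std::pair<Key, B>>& v)
{
    std::sort(v.begin(), v.end(),
              [](std::pair<Key, B> const& l, std::pair<Key, B> const& r) {
                  return l.first < r.first;
              });
}
```

Because std::sort is not stable, pairs that share a key may appear in any relative order; std::stable_sort could be swapped in if the original order within a group matters.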
