Reordering an array so that it is grouped by the same elements

I am considering an array processing problem that can be easily solved with sorting. However, my requirement is actually more relaxed than a complete sort: I only need a guarantee that if there are identical elements in the array, they will be found next to each other in the array.

Is there an algorithm to reorder the array so that it meets the above criteria, which would be more efficient just by doing a full sort?

+5
source share
5 answers

So the answer is no. This information really does not help. This may help you a little better, but not in big O.

For everyone who suggested hashing to get linear time, you can just as well do the same thing for sorting. This method is called radix / hash collation. It exploded your memory usage.

When there are more restrictions, you can even use cooler tricks (i.e. sum, xor, etc.)

However, for an algorithm that uses comparison only in a generic array, you are not buying a lot, reducing the problem in this way.

To give a simple intuition for this, suppose you have 1 redundancy for each element, so your array is a1, a1, ... an, an (only 2n elements out of n unique numbers).

The size of the solution space is n! (as long as aj-aj are paired, you can rearrange the pair anyway, as indicated in the statement of the problem). The size of the input space is (2n)! / (2 ^ (n)).

This means that your algorithm should receive enough information to order ((2n)! / N!) / (2 ^ n) = (n * (n + 1) * ... 2n) / (2 ^ n) elements. Each comparison gives you 1 bit of information. The number of comparison iterations required is log (n) + log (n + 1) ... + log (2n) -n, which is big_omega (nlog (n)). It is not better or worse than sorting.

Here's a semi-rigid sorting treatment: http://www.cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf

Perhaps I will bribe to create a similar proof for the current question.

+3
source

If ordering is not an issue, you can try any of the hashing methods. All hashing methods usually result in a grouping of similar items. For each element of the array, just use the hash function, and all elements are grouped according to the function you define.

+6
source

If all elements can be divided into two groups, we can solve this problem with a hash.
Complexity time = O(n) and additional space = O(1) .

If all the elements can be divided into three groups, we can apply this method twice. Complexity time = O(n) * 2 = O(n) and additional space = O(1) .

If all the elements can be divided into four groups, we can apply the first method three times.
Complexity time = O(n) * 3 = O(n) and additional space = O(1) .

If all the elements can be divided into groups k, we can apply the first method (k-1) times. Complexity time = O(n) * (k-1) = O(k*n) and additional space = O(k) .

This methodology is superior to time sorting only when O(k) < O(log n) .

In fact, when all the elements can be divided into three groups, this problem becomes the Dutch national flag problem proposed by Edsger Dijkstra .

+1
source

Do you have restrictions on different keys?

If not, you are actually looking more for a hash bag than for any sorting.

Since you did not give a programming language, here is a python example:

 from collections import defaultdict data=[ (1,"a"), (2, "c"), (3, "d"), (2, "b") ] table = defaultdict(lambda: list()) for key, record in data: table[key].append(record) for key, values in table.iteritems(): for value in values: print key, value 

This should be done in linear time, as the search in the hash table is considered O (1).

If your data is larger than your main memory, a classic external sorting method may be faster than a bad hit on an external hash table. In general, a full grade can be faster, because the algorithms are well optimized ! So do the tests!

0
source

The idea is similar to the Python code above, but in the CL:

 (defun partition (array &key (test #'eql)) (loop with table = (make-hash-table :test test) for i across array do (setf (gethash i table) (1+ (gethash i table 0))) finally (return (loop with result = (make-array (length array)) with pointer = -1 for key being the hash-keys of table for value being the hash-values of table do (loop while (> value 0) do (setf (aref result (incf pointer)) key value (1- value))) finally (return result))))) (partition #(1 2 3 "2" "2" "3" 'a 'b 'b 3 3 1 1 "3" 'a)) ;; #(1 1 1 2 3 3 3 "2" "2" "3" 'A 'B 'B "3" 'A) (partition #(1 2 3 "2" "2" "3" 'a 'b 'b 3 3 1 1 "3" 'a) :test #'equal) ;; #(1 1 1 2 3 3 3 "2" "2" "3" "3" 'A 'A 'B 'B) 

It also illustrates that the concept of equal is not given. You can determine what you consider equal, and depending on this definition, it may or may not make sense to compare it with the speed of sorting (since sorting implies that you can arrange the sequence).

0
source

All Articles