Detecting duplicates in an array using divide and conquer

I was asked the next question on the exam, and it seems that this may not be possible. Is something missing?

Given an array of n objects that can only be compared for equality and not knowing anything about the range of values ​​in the array, give a divide and conquer solution to detect if there are any duplicates in the array. This should be an O (nlogn) solution.

We can safely assume, due to the nature of the question, that the solution probably has nothing to do with data structures or radix sorts, so can this be done locally?

If so, how?

+7
source share
6 answers

You cannot check duplicates in O(nlogn) if you cannot order items, and you cannot order them if you can only compare for equality.

In fact, you cannot be sure that there are no duplicates unless you compare each pair and there are n(n-1)/2 such pairs.

+6
source

How about using hashset. Add each item to the set. Then check the size. However, it does not share and does not win.


Will the comparison result for equality tell you which of the two compared objects was “larger”?

If you can create a complete set of objects, I think you could use one of the inplace divide and conq sorting algorithms, but add extra logic that detects duplicates. (turn <= check into check <and ==)

+1
source

Since this is O (nlogn), basically you can sort the array and find duplicates. Since you want to use divide and conquer, I suggest using quicksort.

0
source

The only way to do this on NlogN is to "trick".

In .NET and Java, any interface implementation, such as .NET IEquatable, that provides only the Equals () method, is also a base-level object. Objects in .NET and Java have a hash function (in .NET it is GetHashCode (), in Java - hashCode ()). That way, no matter what method restricts you, you always have access to a hash function that will produce a numerical value.

This will allow you to hash each object and compare hashes for a relative value. This, in turn, allows you to sort the array by hash, and then scan it in linear time to detect duplicates. You can do this in place or leave the original array intact by inserting each element into a red-black tree, a hash table, or a dictionary attached to a hash value (all of which have login or better access time, as well as logN or better insertion times )

As indicated in the comments, any of these approaches can be parallelized to multiple threads, which allows for "division and conquest"; sorting can be done using parallel MergeSort, while depending on the objects that you have access to in your environment, you can use a thread-safe “parallel” collection, which in turn allows you to split the array into subarrays inserted into the collection, multiple threads. Scanning a sorted list can also be parallelized if you overlap the submatrix specified for each stream with one element, preventing one element of the duplicated pair from falling into one subarray, and the other in the next using serendipity.

0
source

Maybe there is another way to consider the analysis?

Agreed, the worst case is in O (N ^ 2). But the best case is O (1).

Looking solely at the fact that there is only equal , and the range of values ​​is unknown, is it fair to say that there is only one way to get N ^ 2, and that is when all the values ​​are different, or uneven?

Similarly, there is only one way to guarantee duplicate searches in 1 test, and that is when all values ​​are equal.

There are many ways in which it is impossible to match all objects before finding an identical pair. If there are N / 2 pairs, N / 3 triples, N / 4 fours, N / sqrt (N) sets of duplicates sqrt (N), etc., How much do you need to compare before searching for a pair, that is, a duplicate?

I think it’s like “finding one pair of socks” by picking from a draw a sock with an unknown number of identical socks with two or more identical socks in the set. ”The owner of the sock replenishes the draw by buying an unknown number of pairs of identical socks and throws the sock when it has a hole in. We don’t know how quickly the socks wear out, or how quickly the wearer buys the socks.

On average, we do not expect much better performance than N ^ 2?

0
source

You can solve this problem with modified quicksort. If instead of comparing> you simply replace it with the equality operator. Modified quicksort will group items together.

Then all you have to do is look for bands to look for cheaters.

Take a look at this example.

https://repl.it/@fernandozamoraj/DearJollySynergy

0
source

All Articles