Common Subset Search Algorithm

I have N numbers of sets S i numbers, each of which has a different size. Let m 1 , m 2 , ... m n be the sizes of the corresponding sets (m i = | S i |), and M the size of the largest set. I have to find common subsets in which there are at least two numbers. Example:

Set Items 1 10,80,22 2 72, 10, 80, 26,50 3 80, 4 10, 22 5 22, 72, 10, 80, 26,50 

Thus, the result will be such that

 Items Found in sets 10, 22 1, 4 10, 80 1, 2, 5 10, 80, 22 1, 5 10, 72, 80, 26, 50 2, 5 

So, how to automate this problem and what is the expected complexity for an appropriate solution? I need it to be as fast as possible.

+6
algorithm
source share
5 answers

Since the original question asks to discuss as soon as possible how this can be made short.

First, discuss whether the output is exponential for your input. Suppose you have 2 elements and N sets. Suppose that each element belongs to each set; for this you need N lines. Then how many lines should be printed?

I think you are going to print lines 2 N -N-1, for example:

 1,2 1,2 1,2 1,3 ..... 1,2 1,N 1,2 2,1 ..... 1,2 1,2,3 ..... 1,2 1,2,3,...N 

Since you are going to spend time printing O (2 N ), you can choose any offers on this page to calculate this information - you were still caught by the exhibitor.

This argument will make your discussion very quick.

+4
source share

You can do it -

  • Create a hash table and insert as a key each element and values ​​as the sets to which they belong. Eg: 10:[0,1,3,4] 80:[0,1,2,4] 22:[0,3,4] 72:[1,4] 26:[1,4] 50:[1,4]

  • After that, for each 2 elements of the hash table, find the intersection of the key values. Here the intersection (10, 80) gives [0,3,4]. Do it for (10,22) (10,72) ... You have your own result.

  • Difficulty - To insert M elements into a hash table - O (M) Search for each key in a hash table - O (1) Intersection of a key operation - O (m ^ 2) [This could be optimized]

In general, I would say that this is O (m ^ 2) algo. But if the size of the "Elements" is small in each "set", you would not notice a big increase in productivity.

+2
source share

One solution that I can think of:

Use a hash table, generate all combinations of a “pair of numbers” of numbers in this “string”, which is O (M * M) time.

Then use them as hash keys, matching them with the index of the main array.

 For each row of those N elements, do the step above... and if the hash already maps to a number, then it is a match, then return the pair, or else just put those in the hash 

it's O (N * M * M)

update: if, as you say in the comment, M can be a maximum of 15, then O (N * M * M) is actually the same as O (N).


If your initial question is to find all pairs, then

 For each row of those N elements, do the step first mentioned... and if the hash already maps to a number, then it is a match and just print out the pair if this pair is not printed yet and in any event, put the mapping into the hash to tell whether a pair has been printed, created another hash, such as printed_or_not[i,j] = true or false, and make sure to check printed_or_not[i,j] or printed_or_not[j, i] because not need to print again if just different order 
+2
source share

You will be asked to make a pairwise intersection of your entire set, and then collect all results of size → 2.

There are pairs O (N ^ 2). The intersection is O (M) for each. Collect all the results, sort them by the given content in order to produce duplicates: N ^ 2 Log N ^ 2 (the worst case is that the intersection is different for each pair, so there may be a set of results O (N ^ 2))

So, I believe the complexity is O ((N ^ 2 + N log N) * M)

But there might be a mistake somewhere.

Floor

+2
source share

I have some tips. You must sort all the elements in the sets when you loop in the loop to count the numbers in that set. You must add a condition to check for a number greater than your search value, and use break; for the end of the cycle.

0
source share

All Articles