The problem with the algorithm is to find the smallest common subset

Question

The problem with the algorithm is to find the smallest common subset

a - objects with several "categories", b, for example, a1 has three cateories b1, b2, b3. The problem is to reduce the number of categories (which can grow quite large) into groups that always meet together. "The largest common subset."

So, for example, given the following data set:

a1{ b1,b2,b3 } a2{ b2,b3 } a3{ b1,b4 }

We can find that b2 and b3 are always combined.

 b23 = {b2,b3}

.. and we can reduce the set of categories to this:

 a1{ b1, b23 } a2{ b23 } a3{ b1,b4 }

So my problem is to find some kind of algorithm to solve this problem.

I started looking at the Longest Common Sequence problem, and it might be a solution. that is, repeat the grouping of categories such as b' = LCS(set_of_As) until all categories have passed. However, this is not all. I have to somehow restrict the input domain to make this possible.

Skip something obvious? Any hints of a problem domain you can point me to? Does anyone recognize any other approach to such a problem.

+7

algorithm

Petter nordlander Oct 23 '12 at 18:33

source share

2 answers

I think you're on the right track with LCS, if you can put order into directories (if not then, the LCS algorithm cannot recognize {b3, b4} and {b4, b3}). If you can impose and organize and sort them, I think something like this might work:

 As = {a1={b1, b2},a2={b3},...} while ((newgroup = LCS(As)) != empty) { for (a in As) { replace newgroup in a } }

0

Chris arnold Oct 23 '12 at 18:46

source share

Rafael baptista · Accepted Answer · 2012-10-23T18:46:18+0000

Transform your sets to have sets from b that include:

 b1 { a1, a3 } b2 { a1, a2 } b3 { a1, a2 } b4 { a3 }

Make sure the contents of the new b-sets are sorted.

Sort the b-elements by their contents.

Any two adjacent sets with the same elements are b that occur in the same sets.

The problem with the algorithm is to find the smallest common subset

More articles: