Find the shortest combination of words containing each character from a given set

I got an array (let them call it a1) of words (for example, "dog", "fish", "run", "programming" of something), which is really huge.

I can combine any words from a1 with any other words (for example, you could combine “dog” and “programming” with “dog programming”), and then again and again until the lines get really big.

I also got an array (let them call it a2) of strings (such as "de", "s", "x?", "Umh", again they can be almost anything). It is guaranteed that in a2 there are no lines that cannot be found in any line of a1.

What I'm looking for is the shortest string (the shortest in terms of how many combinations are required to create this string, and not so many characters that the string contains), which contains all the lines inside a2. If there are several lines, each of which has a minimum length, I can simply select any and release from the program.

Now I don’t think that I can just force it roughly, because even if the arrays are relatively small, there are almost endless options for combining words, but I would really like to be proved incorrectly!

Is there a good way to get the shortest string that will undoubtedly give the shortest result, or will I have to use heuristic algorithms to just search for fairly short strings?

EDIT: I tried just to select a row from a1 that covered most of the rows from a2, and then removed these elements from a2 and ran it again, but that didn't work! This gives good results, but not the best.

+4
source share
1 answer

if you combine words with dashes, as in your example, for example

dog + programming + sky = dog-programming-sky 

And the words in A2 do not contain dashes, then this is just a SET-COVER masking, NP is a complete optimization problem. Then you can solve the problem using any solution strategy available for SET-COVER. There is a quick approximation algorithm for SET-COVER, but if you want to have the absolute smallest solution, you need to resort to the worst exponential algorithm.

If you combine words without a dash, for example

 dog + programming + sky = dogprogrammingsky 

then the problem is more complicated, since now, for example, "ogpro" occurs in a combined word, even if it is not a substring of any of the compound lines.

+3
source

All Articles