Interesting Strings Algorithm

Given the two finite sequences of the string, A and B , of length n each, for example:

 A1: "kk", A2: "ka", A3: "kkk", A4: "a" B1: "ka", B2: "kakk", B3: "ak", B4: "k" 

Give the final sequence of indices so that their concentration for A and B gives the same row. Repeats allowed.

In this example, I cannot find a solution, but, for example, if the list (1,2,2,4) is a solution, then A1 + A2 + A2 + A4 = B1 + B2 + B2 + B4 . There are only two characters in this example, but it is already very complicated. In fact, it’s not even trivial to find the shortest solution with one character!

I tried to think about things. For example, the total sum of the string lengths should be the same, and the corresponding characters are needed for the first and last lines. But nothing more. I believe that for some set of strings this is simply not possible. Can anyone think of a good algorithm?

EDIT: Apparently this is a Correspondence Problem Report.

There is no algorithm that could decide whether such an instance has a solution or not. If that were the case, the stop problem could be solved. Dirty trick ...

+4
source share
4 answers

It is unclear what “solution” you are looking for, the longest solution? the shortest? all solutions?
Since you allow repetition, there will be an infinite number of solutions for some inputs, so I will work on:

Find all sequences under a fixed length.

Written as pseudocode, but very similar to f # sequence expressions

 // assumed true/false functions let Eq aList bList = // eg Eq "ab"::"c" "a" :: "bc" -> true // Eq {} {} is _false_ let EitherStartsWith aList bList = // eg "ab"::"c" "a" :: "b" -> true // eg "a" "ab" -> true // {} {} is _true_ let rec FindMatches AB aList bList level = seq { if level > 0 if Eq aList bList yield aList else if EitherStartsWith aList bList Seq.zip3 AB seq {1..} |> Seq.iter (func (a,b,i) -> yield! FindMatches AB aList::(a,i) bList::(b,i) level - 1) } let solution (A:seq<string>) (B:seq<string>) length = FindMatches AB {} {} length 

Some trivial limitations to reduce the problem:

  • The first pair of options should have a common start section.
  • the final pair of choice should have a common end section.

Based on this, we can quickly eliminate many inputs without a solution.

 let solution (A:seq<string>) (B:seq<string>) length = let starts = {} let ends = {} Seq.zip3 AB seq {1..} |> Seq.iter(fun (a,b,i) -> if (a.StartsWith(b) or b.StartsWith(a)) start = starts :: (a,b,i) if (a.EndsWith(b) or b.EndsWith(a)) ends = ends :: (a,b,i)) if List.is_empty starts || List.is_empty ends Seq.empty // no solution else Seq.map (fun (a,b,i) -> FindMatches AB {} :: (a,i) {} :: (b,i) length - 1) starts |> Seq.concat 
0
source

A very difficult question, but I will give him a chance. This is more a stream of consciousness than an answer, sorry in advance.

If I understand this correctly, you are given 2 identical string sequences, A and B, indexed from 1..n, say. Then you need to find a sequence of indices such that the string concatenation A (1) .. A (m) is equal to the string concatenation B (1). B (m), where m is the length of the sequence of indices.

The first thing I would like to notice is that there can be an infinite number of solutions. For example, given:

A {"x", "xx"}
B {"xx", "x"}

Possible solutions:

{12}
{2, 1}
{1, 2, 1, 2}
{1, 2, 2, 1}
{2, 1, 1, 2}
{2, 1, 2, 1}
{1, 2, 1, 2, 1, 2}
...

So how would you know when to stop? Once you have one solution? As soon as one of the solutions is a superset of the other solution?

One place you could start is to take all the lines of minimum total length from both sets (in my example above, you should take an “x” from both and look for 2 equal lines that have a common index. Then you can repeat this for lines of the next size up. For example, if the first set has 3 lines of length 1, 2 and 3, respectively, and the second set has lines of length 1, 3 and 3, respectively, you should accept lines of length 3. You would do this while you had there will be no more lines, if you find anything, then you have a solution to the problem.

Then it gets harder when you need to start combining multiple lines, as in my example above. A naive and crude approach would be to start rearranging all the strings from both sets, which, when concatenated, lead to strings of the same length, and then compare them. So in the example below:

A {"ga", "bag", "ac", "a"}
B {"ba", "g", "ag", "gac"}

You start with sequences of length 2:

A {"ga", "ac"}, B {"ba", "ag"} (indices 1, 3)
A {"bag", "a"}, B {"g", "gac"} (indices 2, 4)

Comparing them, we get “gaac” versus “baag” and “baga” versus “ggac”, none of which are equal, so there are no solutions. Then we will look for sequences of length 3:

A {"ga", "bag", "a"}, B {"ba", "g", "gac"} (indices 1, 2, 4)
A {"bag", "ac", "a"}, B {"g", "ag", "gac"} (indices 2, 3, 4)

Again, there are no solutions, so we get sequences of size 4, of which we have no solutions.

Now it’s getting even more complicated, as we need to start thinking about perhaps repeating some indicators, and now my brain is melting.

I think that finding common subsequences in strings might be useful, and then use the rest of the strings that were not matched. But I do not quite understand how.

+2
source

A very simple way is to just use something like a breadth-first search. This also has the advantage that the first solution found will have a minimum size.

+2
source

Here is a suggestion for brute force search. First, create numerical sequences limited by the length of your list:

[0,0, ..] [1,0, ..] [2,0, ..] [3,0, ..] [0,1, ..] ...

The length of the sequence length determines how many lines there will be in any solution found. Then generate strings A and B using numbers as indices in your string lists:

 public class FitSequence { private readonly string[] a; private readonly string[] b; public FitSequence(string[] a, string[] b) { this.a = a; this.b = b; } private static string BuildString(string[] source, int[] indexes) { var s = new StringBuilder(); for (int i = 0; i < indexes.Length; ++i) { s.Append(source[indexes[i]]); } return s.ToString(); } public IEnumerable<int[]> GetSequences(int length) { foreach (var numberSequence in new NumberSequence(length).GetNumbers(a.Length - 1)) { string a1 = BuildString(a, numberSequence); string b1 = BuildString(b, numberSequence); if (a1 == b1) yield return numberSequence; } } } 

This algorithm assumes equal lengths for A and B. I checked your example with

  static void Main(string[] args) { var a = new[] {"kk", "ka", "kkk", "a"}; var b = new[] {"ka", "kakk", "ak", "k"}; for (int i = 0; i < 100; ++i) foreach (var sequence in new FitSequence(a, b).GetSequences(i)) { foreach (int x in sequence) Console.Write("{0} ", x); Console.WriteLine(); } } 

but could not find any solutions, although it seemed to work for simple tests.

0
source

All Articles