Use LINQ for more fun.
After several attempts, I came up with a clean LINQ solution that could theoretically be single-line. I tried to make it effective, but, of course, functional solutions will lead to duplication of calculations, since you cannot save state.
We'll start with a little preprocessing to save on recalculations later. Yes, I know what I am doing with the index, this is a controversial practice, but if you are careful, it works and it develops rapidly:
var index = 0; var lookup = original.ToLookup(s => s, s => index++);
Monster
var occurrences = userSelect .Where(lookup.Contains) .SelectMany((s, i) => lookup[s] .Select(j => new { User = userSelect.Skip(i), Original = original.Skip(j), Skipped = i }) .Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped)) .TakeWhile(tuple => tuple.Item1 == tuple.Item2) ) .Select(u => new { Word = s, Start = u.Select(v => v.Item3).Min(), Length = u.Count() }) ) .GroupBy(v => v.Start + v.Length) .Select(g => g.OrderBy(u => u.Start).First()) .GroupBy(v => v.Word) .Select(g => g.OrderByDescending(u => u.Length).First()) .Select(w => Enumerable.Range(w.Start, w.Length).ToArray()) .ToList();
Print this with
foreach (var occurrence in occurrences) { Console.WriteLine( "Maximal match starting with '{0}': [{1}]", userSelect[occurrence[0]], string.Join(", ", occurrence) ); }
gives
Maximal match starting with 'the': [0, 1, 2] Maximal match starting with 'dog': [3] Maximal match starting with 'jumps': [4, 5]
It is immediately clear that you would not want to use this code in production , another (procedural) solution would be preferable today. However, this solution is different in that it is purely functional, with the exception of lookup . Of course, this can also be written functionally:
var lookup = original.Select((s, i) => Tuple.Create) .ToLookup(t => t.Item1, t => t.Item2);
How it works
When diluted, it creates a dictionary-like structure that associates each word in original with indexes, where it appears in one collection. This will be used later to create as many matching sequences as possible with each word in userSelect (for example, "the" will result in two matched sequences since it appears twice in the original ).
Then:
.Where(lookup.Contains)
This is easy, it removes from consideration all words in userSelect that are not displayed in original .
// For each place where the word s appears in original... .SelectMany((s, i) => lookup[s] // Define the two subsequences of userSelect and original to work on. // We are trying to find the number of identical elements until first mismatch. .Select(j => new { User = userSelect.Skip(i), Original = original.Skip(j), Skipped = j }) // Use .Zip to find this subsequence .Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped)).TakeWhile(tuple => tuple.Item1 == tuple.Item2)) // Note the index in original where the subsequence started and its length .Select(u => new { Word = s, Start = u.Select(v => v.Item3).Min(), Length = u.Count() }) )
At this point, we project each matching word in userSelect onto an anonymous object with the Start and Length properties. However, sequences in which the matching length is N also lead to smaller matching sequences of length N-1, N-2, ... 1.
The key point here is the realization that for all subsequences in such sets Start + Length will be the same; in addition, subsequences from different sets will have different Start + Length amounts. So let's use to crop the results:
This will leave us so many matches for every word in userSelect , just that word will appear in original . So, let it also shorten it to the longest match:
.GroupBy(v => v.Word) .Select(g => g.OrderByDescending(u => u.Length).First())
Now we have such an object as { Word = "the", Start = 0, Length = 3 } . Let convert this to an array of indices in userSelect :
.Select(w => Enumerable.Range(w.Start, w.Length).ToArray())
And finally, put all these arrays in the same collection and mission!