I'm not sure it's worth optimizing this very far. You could implement some kind of binary search on top of it, but the gain would be limited. How many elements are we talking about?
No unmatched elements in the target
Assuming both lists are sorted and target contains no elements that cannot be matched to source:
```csharp
static List<string>[] FindMatches(string[] source, string[] target)
{
    // Initialize array to hold results
    List<string>[] matches = new List<string>[source.Length];
    for (int i = 0; i < matches.Length; i++)
        matches[i] = new List<string>();

    int s = 0;
    for (int t = 0; t < target.Length; t++)
    {
        // Advance through source until the current target element matches
        while (!MatchName(source[s], target[t]))
        {
            s++;
            if (s >= source.Length)
                return matches;
        }
        matches[s].Add(target[t]);
    }
    return matches;
}

static bool MatchName(string source, string target)
{
    // Placeholder for whatever boolean match you already have,
    // e.g. comparing everything up to the first space
    return target[0] == source[0];
}
```
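For illustration, a call might look like this (the data and expected output are made up; MatchName above is just a stand-in for your real comparison):

```csharp
// Hypothetical usage: both arrays sorted, every target element has a match
string[] source = { "alpha", "bravo", "charlie" };
string[] target = { "alpha 1", "alpha 2", "bravo 7", "charlie 3" };

List<string>[] matches = FindMatches(source, target);
for (int i = 0; i < source.Length; i++)
    Console.WriteLine($"{source[i]}: {string.Join(", ", matches[i])}");
// alpha: alpha 1, alpha 2
// bravo: bravo 7
// charlie: charlie 3
```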
With unmatched elements
If target may contain elements that have no match in source, the code above stops working correctly (unless those elements sit at the very end of target). To fix this, the comparison needs a different shape: instead of a boolean, it should return "less", "equal", or "greater", like the Comparison<T> delegate used for sorting:
```csharp
static List<string>[] FindMatches(string[] source, string[] target)
{
    // Initialize array to hold results
    List<string>[] matches = new List<string>[source.Length];
    for (int i = 0; i < matches.Length; i++)
        matches[i] = new List<string>();

    int s = 0;
    for (int t = 0; t < target.Length; t++)
    {
        int m = CompareName(source[s], target[t]);
        if (m == 0)
        {
            matches[s].Add(target[t]);
        }
        else if (m > 0)
        {
            // Current target sorts after source[s]: move to the next
            // source element and re-test the same target element
            s++;
            if (s >= source.Length)
                return matches;
            t--;
        }
        // m < 0: this target element has no match in source, skip it
    }
    return matches;
}

static int CompareName(string source, string target)
{
    // Whatever comparison you need here, this one is really basic :)
    return target[0] - source[0];
}
```
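To see the difference, here is a made-up example where one target element has no counterpart in source; it simply gets skipped instead of derailing the scan:

```csharp
// Hypothetical usage: "bogus 9" matches nothing in source and is skipped
string[] source = { "alpha", "charlie" };
string[] target = { "alpha 1", "bogus 9", "charlie 3" };

List<string>[] matches = FindMatches(source, target);
// matches[0] contains "alpha 1"
// matches[1] contains "charlie 3"
```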
Otherwise the two versions are essentially the same. As you can see, you walk through the target elements once, advancing the index into the source array whenever the current target element no longer matches.
If the number of source elements is large, a slightly smarter search (such as a binary search into source) may be worth considering. If the number of target elements is also large, though, the expected advantage of that shrinks.
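If that's your situation, a per-element binary search over source might look roughly like the sketch below (FindMatchesBinary is a hypothetical name; it reuses CompareName from above and assumes each target element matches at most one source element):

```csharp
static List<string>[] FindMatchesBinary(string[] source, string[] target)
{
    List<string>[] matches = new List<string>[source.Length];
    for (int i = 0; i < matches.Length; i++)
        matches[i] = new List<string>();

    foreach (string t in target)
    {
        // Standard binary search driven by the three-way CompareName
        int lo = 0, hi = source.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int c = CompareName(source[mid], t);
            if (c == 0) { matches[mid].Add(t); break; }
            if (c > 0) lo = mid + 1;  // target sorts after source[mid]
            else hi = mid - 1;        // target sorts before source[mid]
        }
    }
    return matches;
}
```

That is O(|target| · log |source|) instead of O(|source| + |target|) for the linear scan, so it only pays off when target is much smaller than source.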
Then again, the first algorithm takes 0.18 seconds with 1 million target elements on my machine in debug mode. The second is even faster (0.03 seconds), but only because its comparison is simpler. If you need to compare everything up to the first space character, it will be noticeably slower.
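For reference, that kind of measurement can be reproduced with a minimal Stopwatch harness along these lines (the test data is invented for illustration; FindMatches is assumed to be defined alongside):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical benchmark: 1 million target elements spread over 26 source keys
string[] source = Enumerable.Range(0, 26)
                            .Select(i => ((char)('a' + i)).ToString())
                            .ToArray();
string[] target = Enumerable.Range(0, 1_000_000)
                            .Select(i => (char)('a' + i % 26) + " item" + i)
                            .ToArray();
Array.Sort(target, StringComparer.Ordinal);

var sw = Stopwatch.StartNew();
var matches = FindMatches(source, target);
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
```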