What is the most efficient way to combine 2 lists (fast)?

Question

What is the most efficient way to combine 2 lists (fast)?

I have 2 lists<string> elements, source and target. Items in the source list will have 0 to n matches in the target list, but there will be no matches.

Given that both lists are sorted, how would you most effectively match performance.

Example:

 source = {"1", "2", "A", "B", ...} target = {"1 - new music", "1 / classic", "1 | pop", "2 edit", "2 no edit", "A - sing", "B (listen)", ...}

Basically, a match is a simple prefix match, but let's say you have a method called MatchName . You can use the new feature if you intend to do a more optimized search. NameMatch simply compares 2 lines and returns bool.

In the source source [0] there will be a source [0]. In this case it will consist of target [0, 1 and 2].

+4

performance optimization c # .net search

Joan venge Aug 9 '09 at 9:14

source share

7 answers

Thorarin · Answer 1 · 2009-08-09T09:42:13+0000

I'm not sure it is worth trying to optimize very far. You can implement some kind of binary search with it, but the efficiency will be very limited. How many elements are we talking about?

No inconsistent items in the target

Assuming the lists are sorted and there are no elements in the target that cannot be matched to source :

 static List<string>[] FindMatches(string[] source, string[] target) { // Initialize array to hold results List<string>[] matches = new List<string>[source.Length]; for (int i = 0; i < matches.Length; i++) matches[i] = new List<string>(); int s = 0; for (int t = 0; t < target.Length; t++) { while (!MatchName(source[s], target[t])) { s++; if (s >= source.Length) return matches; } matches[s].Add(target[t]); } return matches; }

With unrivaled elements

If there is a possibility of the existence of elements in target that do not have a match in source , the above will be violated (if the elements are not at the end of the target). To fix this, it would be better to go with a different implementation for comparison. Instead of boolean, we need it to return “less”, “equal”, or “more”, like “Comparison” when used in sorting:

 static List<string>[] FindMatches(string[] source, string[] target) { // Initialize array to hold results List<string>[] matches = new List<string>[source.Length]; for (int i = 0; i < matches.Length; i++) matches[i] = new List<string>(); int s = 0; for (int t = 0; t < target.Length; t++) { int m = CompareName(source[s], target[t]); if (m == 0) { matches[s].Add(target[t]); } else if (m > 0) { s++; if (s >= source.Length) return matches; t--; } } return matches; } static int CompareName(string source, string target) { // Whatever comparison you need here, this one is really basic :) return target[0] - source[0]; }

Both, otherwise, are essentially the same. As you can see, you scroll through the target elements once, pushing the index to the original array when you no longer find a match.

If the number of source elements is limited, a slightly more reasonable search may be required. If the number of starting elements is also large, the perceived advantage of this will decrease.

Then again, the first algorithm takes 0.18 seconds with 1 million target elements on my machine in debug mode. The second is even faster ( 0.03 seconds ), but this is because of the simpler comparison that is being performed. You may need to compare everything up to the first space character, making it significantly slower.

Dykam · Answer 2 · 2009-08-09T09:39:42+0000

Edited, rewritten, not tested, must have a performance of O (source + target). Usage can be MatchMaker.Match (source, target) .ToList ();

 public static class MatchMaker { public class Source { char Term { get; set; } IEnumerable<string> Results { get; set; } } public static IEnumerable<Source> Match(IEnumerable<string> source, IEnumerable<string> target) { int currentIndex = 0; var matches = from term in source select new Source { Term = term[0], Result = from result in target.FromIndex(currentIndex) .TakeWhile((r, i) => { currentIndex = i; return r[0] == term[0]; }) select result }; } public static IEnumerable<T> FromIndex<T>(this IList<T> subject, int index) { while (index < subject.Count) { yield return subject[index++]; } } }

Simple LinQ may not be the fastest, but the cleanest:

 var matches = from result in target from term in source where result[0] == term[0] select new { Term: term, Result: result };

I am against premature optimization.

Guffa · Answer 3 · 2009-08-09T10:11:00+0000

As you sort the items, you can simply scroll through the lists:

 string[] source = {"1", "2", "A", "B" }; string[] target = { "1 - new music", "1 / classic", "1 | pop", "2 edit", "2 no edit", "A - sing", "B (listen)" }; List<string>[] matches = new List<string>[source.Length]; int targetIdx = 0; for (int sourceIdx = 0; sourceIdx < source.Length; sourceIdx++) { matches[sourceIdx] = new List<string>(); while (targetIdx < target.Length && NameMatch(source[sourceIdx], target[targetIdx])) { matches[sourceIdx].Add(target[targetIdx]); targetIdx++; } }

Andrew Barrett · Answer 4 · 2009-08-09T10:18:54+0000

Here is an answer that is repeated only once through both lists using logic that is sorted as optimization. As most of them said, I would not worry too much about optimization, since it is likely to be fast enough with any of these answers, I would go for the most readable and supported solution.

Saying, I need to do something with my cup of coffee, so you are here. One of the advantages of the following is that it allows things in the target list to not have matches in the original list, although I'm not sure if you need this functionality.

 class Program { public class Source { private readonly string key; public string Key { get { return key;}} private readonly List<string> matches = new List<string>(); public List<string> Matches { get { return matches;} } public Source(string key) { this.key = key; } } static void Main(string[] args) { var sources = new List<Source> {new Source("A"), new Source("C"), new Source("D")}; var targets = new List<string> { "A1", "A2", "B1", "C1", "C2", "C3", "D1", "D2", "D3", "E1" }; var ixSource = 0; var currentSource = sources[ixSource++]; foreach (var target in targets) { var compare = CompareSourceAndTarget(currentSource, target); if (compare > 0) continue; // Try and increment the source till we have one that matches if (compare < 0) { while ((ixSource < sources.Count) && (compare < 0)) { currentSource = sources[ixSource++]; compare = CompareSourceAndTarget(currentSource, target); } } if (compare == 0) { currentSource.Matches.Add(target); } // no more sources to match against if ((ixSource > sources.Count)) break; } foreach (var source in sources) { Console.WriteLine("source {0} had matches {1}", source.Key, String.Join(" ", source.Matches.ToArray())); } } private static int CompareSourceAndTarget(Source source, string target) { return String.Compare(source.Key, target.Substring(0, source.Key.Length), StringComparison.OrdinalIgnoreCase); } }

Mike dunlavey · Answer 5 · 2009-08-10T01:51:06+0000

Since they are sorted, is this not just a basic O (N) merge cycle?

 ia = ib = 0; while(ia < na && ib < nb){ if (A[ia] < B[ib]){ // A[ia] is unmatched ia++; } else if (B[ib] < A[ia]){ // B[ib] is unmatched ib++; } else { // A[ia] matches B[ib] ia++; ib++; } } while(ia < na){ // A[ia] is unmatched ia++; } while(ib < nb){ // B[ib] is unmatched ib++; }

Arpit tambi · Answer 6 · 2009-08-09T09:32:37+0000

I think the best way is to prepare an index. How is it (Javascript)

 index = []; index["1"] = [0,1,2]; index["2"] = [3,4];

Well-sorted lists are not required in this case.

pjp · Answer 7 · 2009-08-09T09:49:16+0000

Well, you obviously stop by going through the target list as soon as you walk past the current source prefix. In this case, you are better off using the prefix method than matching so that you can find out what the current prefix is and stop searching for the target if you pass by it.

What is the most efficient way to combine 2 lists (fast)?

No inconsistent items in the target

With unrivaled elements

More articles: