C# fast Except for sorted lists

I am working on an application whose bottleneck is list1.Except(list2). From this post (whether to use Except or Contains when working with a HashSet or so in Linq), the complexity of Except is O(m + n), where m and n are the sizes of the lists. However, my lists are sorted. Can this help?

The first implementation I can think of:

    foreach element in list2 (m operations)
        look for it in list1 (log(n) operations, binary search)
        if present, set it to null (O(1); actually removing it would be O(n))
        else continue

This has complexity O(m * log(n)), which is very interesting when m is small and n is large (which is exactly the shape of my data sets: m is about 50, n is about 1,000,000). However, the fact that it writes nulls can make a big difference to the functions that consume the list... Is there a way to preserve this complexity without writing nulls (tracking the removed elements some other way instead)?
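To make the tracking idea concrete, here is a rough sketch of what I have in mind (the class and method names are made up; list1 must be sorted):

    using System.Collections.Generic;

    public static class ExceptIdea
    {
        // Hypothetical sketch: instead of overwriting matched elements of list1
        // with null, record their indices in a set. O(m * log(n)) overall.
        public static HashSet<int> MarkRemoved(List<int> list1, List<int> list2)
        {
            var removed = new HashSet<int>();
            foreach (var item in list2)                  // m iterations
            {
                int index = list1.BinarySearch(item);    // O(log(n)) on the sorted list
                if (index >= 0)
                    removed.Add(index);                  // O(1) marking, no nulls written
            }
            return removed;                              // consumers skip these indices
        }
    }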

Any help would be greatly appreciated!

2 answers
    using System;
    using System.Collections.Generic;

    public class Test
    {
        public static void Main()
        {
            // Build two sorted lists: 50 multiples of 13 and 10000 multiples of 7.
            var listM = new List<int>();
            var listN = new List<int>();
            for (int i = 0, x = 0; x < 50; i += 13, x++) { listM.Add(i); }
            for (int i = 0, x = 0; x < 10000; i += 7, x++) { listN.Add(i); }
            Console.WriteLine(SortedExcept(listM, listN).Count);
        }

        // O(m * log(n)): binary-search each element of the small list in the large one.
        public static List<T> SortedExcept<T>(List<T> m, List<T> n)
        {
            var result = new List<T>();
            foreach (var itm in m)
            {
                var index = n.BinarySearch(itm);
                if (index < 0)              // not found in n, so it survives the Except
                {
                    result.Add(itm);
                }
            }
            return result;
        }
    }

EDIT: Here is also an O(M + N) version:

    public static List<T> SortedExcept2<T>(List<T> m, List<T> n) where T : IComparable<T>
    {
        var result = new List<T>();
        int i = 0, j = 0;

        if (n.Count == 0)
        {
            result.AddRange(m);
            return result;
        }

        while (i < m.Count)
        {
            if (m[i].CompareTo(n[j]) < 0)
            {
                result.Add(m[i]);       // m[i] is not in n: keep it
                i++;
            }
            else if (m[i].CompareTo(n[j]) > 0)
            {
                j++;                    // advance n until it catches up with m[i]
            }
            else
            {
                i++;                    // match found: skip m[i]
            }

            if (j >= n.Count)           // n exhausted: the rest of m survives
            {
                for (; i < m.Count; i++)
                {
                    result.Add(m[i]);
                }
                break;
            }
        }
        return result;
    }

In a quick and dirty test, http://ideone.com/Y2oEQD , the O(M + N) version is always faster, even when N is 10 million. BinarySearch pays a penalty because it accesses array memory in a non-linear pattern; this causes cache misses, which slow the algorithm down, so the larger N gets, the more BinarySearch is limited by memory access.
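If you want to reproduce the comparison locally, a rough Stopwatch harness along these lines should do (a sketch only: it assumes both SortedExcept and SortedExcept2 live in the Test class above, and the list sizes are arbitrary):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    public static class ExceptBenchmark
    {
        // Call this from Main to time both versions on the same data.
        public static void Run()
        {
            var m = new List<int>();
            var n = new List<int>();
            for (int i = 0; i < 50; i++) m.Add(i * 13);          // small sorted list
            for (int i = 0; i < 10000000; i++) n.Add(i * 7);     // large sorted list

            var sw = Stopwatch.StartNew();
            var viaBinarySearch = Test.SortedExcept(m, n);
            sw.Stop();
            Console.WriteLine("BinarySearch: {0} ms, {1} items",
                sw.ElapsedMilliseconds, viaBinarySearch.Count);

            sw.Restart();
            var viaMerge = Test.SortedExcept2(m, n);
            sw.Stop();
            Console.WriteLine("Merge:        {0} ms, {1} items",
                sw.ElapsedMilliseconds, viaMerge.Count);
        }
    }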


If both lists are sorted, you can easily implement your own solution:

The "listA except listB" algorithm works as follows:

    1. Start from the beginning of both lists.
    2. If the listA element is smaller than the listB element, include the listA element in the output and advance listA.
    3. If the listB element is smaller than the listA element, advance listB.
    4. If the listA and listB elements are equal, advance both lists and do not push the element to the output.

Repeat until listA is exhausted. Pay particular attention to the fact that listB may be exhausted before listA is.
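A minimal C# sketch of these steps (the class and method names are mine; it assumes elements implement IComparable&lt;T&gt;):

    using System;
    using System.Collections.Generic;

    public static class SortedListExtensions
    {
        // Walks both sorted lists once: O(|listA| + |listB|).
        public static List<T> ExceptSorted<T>(List<T> listA, List<T> listB)
            where T : IComparable<T>
        {
            var output = new List<T>();
            int a = 0, b = 0;
            while (a < listA.Count)
            {
                // listB exhausted: everything left in listA goes to the output.
                if (b >= listB.Count) { output.Add(listA[a]); a++; continue; }

                int cmp = listA[a].CompareTo(listB[b]);
                if (cmp < 0) { output.Add(listA[a]); a++; }   // step 2
                else if (cmp > 0) { b++; }                    // step 3
                else { a++; b++; }                            // step 4
            }
            return output;
        }
    }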

