Efficient algorithm for finding additions and removals between two collections

Hi, I would like to implement an efficient algorithm to handle the following case:

Suppose we have 2 lists with the following elements:

Source: [a, b, c, d, e]
New: [d, e, f, g]

Now I need to update the source with the new information. The algorithm must be able to find that "f" and "g" are new elements, that "a", "b" and "c" were deleted and that "d" and "e" were not changed.

The operations involved are the intersection of Source and New, and the difference between Source and New in each direction. I am looking for an efficient algorithm to implement in C# for arbitrary unsorted enumerables.

Thanks in advance,

+2
set c# algorithm
5 answers
    var added = New.Except(Source);
    var removed = Source.Except(New);
    var notModified = Source.Intersect(New);

If you want an approach in which you "show your work", I would suggest putting each of them into a HashSet<T>, as this gives you fast Contains checks compared with other enumerables.
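
The "show your work" idea above could be sketched roughly as follows. This is only an illustration, not a tuned implementation: the class name SetDiffSketch, the method name Diff, and the parameter name updated (used because new is a C# keyword) are all made up for the example, and it assumes the element type has sensible Equals and GetHashCode implementations.

```csharp
using System.Collections.Generic;

public static class SetDiffSketch
{
    // Hypothetical helper: classifies items by probing hash sets.
    // Duplicates within either input are collapsed by the sets.
    public static (List<T> Added, List<T> Removed, List<T> Unchanged) Diff<T>(
        IEnumerable<T> source, IEnumerable<T> updated)
    {
        var sourceSet = new HashSet<T>(source);
        var updatedSet = new HashSet<T>(updated);

        var added = new List<T>();
        var removed = new List<T>();
        var unchanged = new List<T>();

        // Everything in the new data is either already known or added.
        foreach (T item in updatedSet)
        {
            if (sourceSet.Contains(item)) unchanged.Add(item);
            else added.Add(item);
        }

        // Everything in the old data that is missing from the new data was removed.
        foreach (T item in sourceSet)
        {
            if (!updatedSet.Contains(item)) removed.Add(item);
        }

        return (added, removed, unchanged);
    }
}
```

Each Contains call is O(1) on average, so the whole pass is roughly linear in the combined size of the two inputs, at the cost of building two sets.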

Edit:

Well, if we are going all-out for efficiency rather than expressiveness, then with the following assumptions:

  • The element type is reasonably hashable (if it is not, but it can be totally ordered, then a SortedList might beat the hash set).
  • We cannot predict whether Source or New will be larger (in the example there would be a slight advantage in doing it the other way around from how I have it, but I assume that is just chance in the data and that we should expect either case to be equally likely).

Then I would suggest:

    HashSet<T> removed = Source as HashSet<T> ?? new HashSet<T>(Source);
    LinkedList<T> added = new LinkedList<T>();
    LinkedList<T> notModified = new LinkedList<T>();
    foreach(T item in New)
        if(removed.Remove(item))
            notModified.AddLast(item);
        else
            added.AddLast(item);

When setting up removed, I test whether Source is already a HashSet<T> to avoid wastefully building a new one (I assume the input is typed as IEnumerable<T>). Of course, this is a destructive action, because it then removes items from the caller's own set, so we might want to avoid it.

Note also that I modify the hash set inside the foreach loop. That is fine here because the loop enumerates New, not the set; it would only matter if New and Source were the same object, since changing a collection while enumerating it goes beyond the guarantees enumerators provide and depends on the implementation. Either way, removing from the set in place is more efficient than testing and adding to a separate removed collection.

I went for linked lists for the other two collections, as they tend to do well on insertion cost (not just O(1), but a fast O(1) compared with using another set).
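
If mutating the caller's set is not acceptable, one possible variant always copies Source first. The sketch below is a guess at how that might look; the names NonDestructiveDiff, Diff and updated are invented for the example. One caveat that applies to this style of loop in general: if the new data repeats an item, the second occurrence finds nothing left to remove and is reported as added.

```csharp
using System.Collections.Generic;

public static class NonDestructiveDiff
{
    // Hypothetical helper: same single pass as above, but the input is never modified.
    public static void Diff<T>(
        IEnumerable<T> source, IEnumerable<T> updated,
        out HashSet<T> removed, out LinkedList<T> added, out LinkedList<T> notModified)
    {
        // Always copy, even when source is already a HashSet<T>.
        removed = new HashSet<T>(source);
        added = new LinkedList<T>();
        notModified = new LinkedList<T>();

        foreach (T item in updated)
        {
            // Remove returns true only if the item was present in the old data.
            if (removed.Remove(item)) notModified.AddLast(item);
            else added.AddLast(item);
        }
        // Whatever is still in "removed" never appeared in the new data.
    }
}
```

The price is one extra set allocation when the source is already a HashSet<T>; in exchange the caller's collection is left untouched.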

Now, if you want to go even further, there may be micro-optimizations available in the hash set implementation if you roll your own.

+6

I have not tested this for performance, but my gut feeling is that you should sort the two lists first. You can then step through both lists, marking each element as removed, added, or unchanged as you go.

 1- Sort the Old and New lists
 2- Set up a pointer for each list; let's call them p1 and p2
 3- Step the pointers using the following algorithm
    a) If Old[p1] = New[p2], the item is unchanged; increment p1 and p2
    b) If Old[p1] < New[p2], then Old[p1] has been removed; increment p1
    c) If Old[p1] > New[p2], then New[p2] is a new element; increment p2
    d) If p1 >= Old.ItemCount, break out of the loop; the rest of New contains new items
    e) If p2 >= New.ItemCount, break out of the loop; the rest of the Old items have been removed
    f) If p1 < Old.ItemCount and p2 < New.ItemCount, go to step a

This is just off the top of my head, but the basics should be correct. The key, of course, is that the lists are sorted.

Here is a quick and dirty demo. I included the sort for demonstration purposes, although in this case the data is already sorted.

    static void Main(string[] args)
    {
        string[] oldList = { "a", "b", "c", "d", "e" };
        string[] newList = { "d", "e", "f", "g" };

        Array.Sort(oldList);
        Array.Sort(newList);

        int p1 = 0;
        int p2 = 0;

        while (p1 < oldList.Length && p2 < newList.Length)
        {
            if (string.Compare(oldList[p1], newList[p2]) == 0)
            {
                Console.WriteLine("Unchanged:\t{0}", oldList[p1]);
                p1++;
                p2++;
            }
            else if (string.Compare(oldList[p1], newList[p2]) < 0)
            {
                Console.WriteLine("Removed:\t{0}", oldList[p1]);
                p1++;
            }
            else if (string.Compare(oldList[p1], newList[p2]) > 0)
            {
                Console.WriteLine("Added:\t\t{0}", newList[p2]);
                p2++;
            }
        }

        while (p1 < oldList.Length)
        {
            Console.WriteLine("Removed:\t{0}", oldList[p1]);
            p1++;
        }

        while (p2 < newList.Length)
        {
            Console.WriteLine("Added :\t\t{0}", newList[p2]);
            p2++;
        }

        Console.ReadKey();
    }

Output from the above:

    Removed: a
    Removed: b
    Removed: c
    Unchanged: d
    Unchanged: e
    Added : f
    Added : g

+3

You can use the set operations available in LINQ.

    string[] list1 = { "a", "b", "c", "d", "e" };
    string[] list2 = { "d", "e", "f", "g" };

    string[] newElements = list2.Except(list1).ToArray();
    string[] commonElements = list2.Intersect(list1).ToArray();
    string[] removedElements = list1.Except(list2).ToArray();

Note: the above code assumes that each list is distinct, that is, it does not contain the same element more than once. For example, for the lists [a, b, c, c] and [a, b, c], the code will not detect the removed item.
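
If duplicates do matter, one way around this limitation is to count occurrences instead of using set semantics. The following is only a sketch under that assumption; MultisetDiff, RemovedWithDuplicates and updated are hypothetical names, and it assumes the items are non-null because they are used as dictionary keys.

```csharp
using System.Collections.Generic;

public static class MultisetDiff
{
    // Hypothetical helper: returns the items of source that are left over after
    // cancelling each one against a single matching occurrence in updated.
    public static List<T> RemovedWithDuplicates<T>(IEnumerable<T> source, IEnumerable<T> updated)
    {
        // Count how many times each item occurs in the new data.
        var counts = new Dictionary<T, int>();
        foreach (T item in updated)
        {
            counts.TryGetValue(item, out int n); // n is 0 when the key is missing
            counts[item] = n + 1;
        }

        var removed = new List<T>();
        foreach (T item in source)
        {
            if (counts.TryGetValue(item, out int n) && n > 0)
                counts[item] = n - 1;   // matched one occurrence
            else
                removed.Add(item);      // no occurrence left to match
        }
        return removed;
    }
}
```

For the lists [a, b, c, c] and [a, b, c] this reports the second c as removed. Swapping the two arguments gives the added items in the same way.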

+1

Call the sets X and Y. If set X supports fast lookup, and you have a convenient way to "tag" and "untag" items in it, you can start by tagging every item in X and then query X for each item in Y. If the item is not found, it is "new" in Y. If the item is found, it is common to both sets, and you should untag it in X. Repeat for all items in Y. When you are done, all the items in X that are still tagged have been "deleted" from Y.

This approach requires only one of the sets to support convenient lookup and tagging. It requires querying one set for every item in the other, and then collecting from it all the items that did not generate hits. Neither set needs to be sorted.
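
As a rough illustration of the tagging idea, the tag could be kept as the bool value of a Dictionary keyed by the items of X. This is just one possible reading of the description; TagDiff and Diff are invented names, and the items are assumed to be non-null and hashable.

```csharp
using System.Collections.Generic;

public static class TagDiff
{
    // Hypothetical helper: the dictionary value is the "tag" (true = still tagged).
    public static (List<T> Added, List<T> Common, List<T> Removed) Diff<T>(
        IEnumerable<T> x, IEnumerable<T> y)
    {
        // Tag every item in X.
        var tagged = new Dictionary<T, bool>();
        foreach (T item in x) tagged[item] = true;

        var added = new List<T>();
        var common = new List<T>();
        foreach (T item in y)
        {
            if (tagged.TryGetValue(item, out bool stillTagged))
            {
                // Found in X: record it once, then untag it.
                if (stillTagged) common.Add(item);
                tagged[item] = false;
            }
            else
            {
                added.Add(item); // not in X, so it is new in Y
            }
        }

        // Items of X that are still tagged were never seen in Y.
        var removed = new List<T>();
        foreach (KeyValuePair<T, bool> pair in tagged)
        {
            if (pair.Value) removed.Add(pair.Key);
        }
        return (added, common, removed);
    }
}
```

Only X is indexed here; Y is simply enumerated once, which matches the point that just one of the sets needs to support lookup and tagging.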

+1

I think you are looking for set operations, i.e. union, etc. Take a look at this article: http://srtsolutions.com/public/item/251070

0
