The most efficient algorithm for combining two ordered lists

I need to find the union of two descending-ordered lists (list1 and list2), where the union contains every item from both lists without duplicates. Suppose the list items are integers. I am using big-O notation to decide which algorithm is more efficient. I know the big-O complexity of the first algorithm, but I don't know the big-O complexity of the second. Can someone tell me the big-O complexity of the second algorithm, so I can decide which one to implement? And if someone knows an algorithm better than both of them, I'd like to hear about it. Thanks in advance.

Here are my two algorithms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Algorithm #1: O(N * log2 N)

    Starting at the first element of list1,
    while (list1 is not at the end of the list) {
        if (the current element in list1 is not in list2)   // binary search -> O(log2 N)
            add the current element in list1 to list2
        go to the next element in list1
    }
    list2 is now the union of the 2 lists

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Algorithm #2: O(?)

    Starting at the first elements of each list,
    LOOP_START:
        compare the current elements of the two lists
        whichever element is greater, put it into a 3rd list called list3
        go to the next element in the list whose element was just inserted into list3
        branch to LOOP_START until either list1 or list2 reaches the end of its list
    insert the remaining elements of the other list into list3
    list3 now contains the union of list1 and list2
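For concreteness, here is a minimal Python sketch of Algorithm #1 (the names `contains` and `union_algorithm1` are mine). It assumes list2 is sorted descending and, like the pseudocode above, glosses over where the new element is placed in list2 — a plain append is used, which does not keep the result ordered:

```python
def contains(desc_list, x):
    """Binary search in a descending-sorted list -> O(log2 N)."""
    lo, hi = 0, len(desc_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if desc_list[mid] == x:
            return True
        elif desc_list[mid] > x:
            lo = mid + 1   # descending order: smaller values are to the right
        else:
            hi = mid - 1
    return False

def union_algorithm1(list1, list2):
    result = list(list2)
    for x in list1:                  # N iterations
        if not contains(list2, x):   # O(log2 N) each
            result.append(x)         # note: the result is no longer ordered
    return result

print(sorted(union_algorithm1([10, 8, 5, 1], [12, 11, 7, 5, 2]), reverse=True))
```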
+7

8 answers

Here is my assessment of the situation.

  • Your first algorithm runs in N log N: you do a binary search for each item in the first list, right?
  • The second algorithm is not fully specified: you don't say what to do when the items in the two lists are equal. Given correct handling of equal elements, though, your second algorithm is essentially the merge step of merge sort: it runs in linear time (i.e. O(N)). This is optimal, in the sense that you cannot do better: you cannot combine two ordered lists without looking at every element of both lists at least once.
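To make the equal-element case concrete, here is one possible handling in Python (the helper `merge_step` is hypothetical, showing a single loop iteration of Algorithm #2 on descending lists):

```python
def merge_step(list1, i1, list2, i2, result):
    """One iteration of Algorithm #2 on descending lists, equal case handled."""
    if list1[i1] == list2[i2]:
        result.append(list1[i1])   # keep a single copy -> no duplicates in the union
        return i1 + 1, i2 + 1      # advance BOTH pointers past the shared value
    elif list1[i1] > list2[i2]:
        result.append(list1[i1])
        return i1 + 1, i2
    else:
        result.append(list2[i2])
        return i1, i2 + 1
```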
+8

The second is O(n + m), and the first is O(n log(m) + m). So the second is much better.

+8

Using the following algorithm, you can merge the two lists in O(n + m).

[Sorry, I used python for simplicity, but the algorithm is the same in every language]

Note that the algorithm also keeps the items of the result list sorted (descending).

    def merge(list1, list2):
        result = []
        i1 = 0
        i2 = 0
        # iterate over the two lists
        while i1 < len(list1) and i2 < len(list2):
            # if the current items are equal, add just one and advance both
            if list1[i1] == list2[i2]:
                result.append(list1[i1])
                i1 += 1
                i2 += 1
            # if the item of list1 is greater, add it and advance list1
            elif list1[i1] > list2[i2]:
                result.append(list1[i1])
                i1 += 1
            # otherwise the item of list2 is greater: add it and advance list2
            else:
                result.append(list2[i2])
                i2 += 1
        # add the remaining items of list1
        while i1 < len(list1):
            result.append(list1[i1])
            i1 += 1
        # add the remaining items of list2
        while i2 < len(list2):
            result.append(list2[i2])
            i2 += 1
        return result

    print(merge([10, 8, 5, 1], [12, 11, 7, 5, 2]))

Output:

 [12, 11, 10, 8, 7, 5, 2, 1] 
+1

Complexity analysis:

Let's say that the length of list 1 is N, and the length of list 2 is M.

Algorithm 1:
At the risk of sounding incredible, I would argue that the complexity of this algorithm, as stated, is N*M, not NlogM.

For each element in list 1 (O(N)) we look it up in list 2 (O(logM)). The complexity of this algorithm therefore appears, at first glance, to be O(NlogM).

However, we also insert each missing item into list 2. The new item must be inserted at the right position so that list 2 stays sorted for subsequent binary searches. With an array as the underlying data structure, each insertion takes O(M) time.

Therefore, the complexity of the algorithm as stated is O(N*M).

A modification is to insert each new element at the end of list 2 (so the list is no longer fully ordered), and to perform each binary search over indices 0 to M-1 rather than 0 to newSize-1. In that case, the complexity is O(NlogM), since we perform N binary searches over a list of length M.

To restore the order, we then need to merge the two sorted parts (indices 0 to M-1, and M to newSize-1). This can be done in O(N + M) time (one merge step of merge sort over an array of N + M elements). The net time complexity of this algorithm is therefore

 O(NlogM + N + M) 

The space complexity is O(max(N, M)), not counting the initial space of the lists, only the additional space required within list 2.
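A Python sketch of this modified scheme (names are mine; ascending order is assumed because `bisect` searches ascending sequences — the analysis is identical for descending lists; instead of physically appending to list 2, the sketch collects the new items in a separate sorted run, which is equivalent):

```python
from bisect import bisect_left
from heapq import merge as heap_merge

def union_modified(list1, list2):
    """N binary searches over the untouched M-element list2 -> O(N log M),
    then one merge of two sorted runs -> O(N + M)."""
    missing = []
    for x in list1:                        # N iterations
        i = bisect_left(list2, x)          # O(log M) binary search
        if i == len(list2) or list2[i] != x:
            missing.append(x)              # O(1); stays sorted since list1 is sorted
    return list(heap_merge(list2, missing))   # O(N + M) final merge

print(union_modified([1, 5, 8, 10], [2, 5, 7, 11, 12]))
```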

Algorithm 2:
At each iteration, we move at least one pointer forward. The total distance traveled by both pointers is N + M, so the worst-case time complexity is O(N + M), which is better than the first algorithm.

However, the space complexity required in this case is larger: O(N + M).

0

Here's another approach: go through both lists and insert every value into a set. This removes all duplicates, and the result is the union of the two lists. Two important notes: you lose the ordering of the numbers, and additional space is required.

Time complexity: O(n + m)

Space complexity: O(n + m)

If you need to preserve the order of the result set, use something like Java's LinkedHashSet.
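In Python, the same idea can preserve first-seen order without a LinkedHashSet, because dicts keep insertion order (guaranteed since Python 3.7). A sketch, with the function name being mine:

```python
def ordered_union(list1, list2):
    """Union preserving first-seen order: dicts keep insertion order
    (guaranteed since Python 3.7), much like Java's LinkedHashSet."""
    return list(dict.fromkeys(list1 + list2))

print(ordered_union([10, 8, 5, 1], [12, 11, 7, 5, 2]))
```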

0

Note that algorithm 2 will not work if the input lists are not sorted. Sorting them first costs O(m * lg(m) + n * lg(n)).

You can build a hash table from the first list, then for each element of the second list check whether it already exists in the hash table. This runs in O(m + n) expected time.
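A minimal Python sketch of this hash-table approach (the function name is mine; `set` is Python's built-in hash table):

```python
def union_hash(list1, list2):
    """Hash all items of list1, then append only unseen items of list2:
    O(m + n) expected time, no ordering required of the inputs."""
    seen = set(list1)
    result = list(list1)
    for x in list2:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

print(union_hash([10, 8, 5, 1], [12, 11, 7, 5, 2]))
```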

0

There are a few things to point out:

  • Are there duplicates in the input lists?
  • Should a result be ordered?

I assume that, as with std::list, you can cheaply insert at the head or tail.

Suppose that there are N elements in list 1 and there are M elements in list 2.


Algorithm 1

It iterates over each element of list 1, searching for it in list 2.

Assuming there may be duplicates and that the result should be ordered, the worst case for the search is when list 2 contains none of the elements of list 1, so it is at least:

  • O(N × M).

To insert an element of list 1 at the right position, you need to iterate over list 2 again up to the insertion point. The worst case is when every item in list 1 is smaller (if the search in list 2 starts from the beginning) or larger (if it starts from the end) than every item in list 2. Since the previous elements of list 1 have already been inserted into list 2, there are M iterations for the first element, M + 1 for the second, M + 2 for the third, and so on up to M + N - 1 iterations for the last, for an average of M + (N - 1) / 2 per element.

Something like:

  • N × (M + (N - 1) / 2)

In big O notation, constant factors do not matter, therefore:

  • N × (M + (N - 1))

In big O notation, insignificant additive terms do not matter either, therefore:

  • O(N × (M + N))

Adding the original O(N × M) search cost:

  • O(N × M) + O(N × (M + N))
  • O(N × M) + O(N × M + N²)

The second line makes the elimination of the constant factor obvious, e.g. 2 × (N × M), thus:

  • O(N × (M + N))
  • O(N² + N × M)

These two forms are equivalent; use whichever you prefer.

Possible optimization:

  • If the result does not need to be ordered, the insert can be O(1), so the worst-case time is:

    • O(N × M)

  • Instead of checking each element of list 1 against list 2 only for equality, also check the ordering (e.g. greater than), so you can stop searching list 2 as soon as the list 1 element is larger than the current list 2 element; this does not reduce the worst case, but it does reduce the average case
  • Keep a list 2 iterator pointing to where the list 1 element was found to be greater than the list 2 element, making the sorted insert O(1); when inserting, keep the iterator at the inserted element, because although list 1 is ordered, it may contain duplicates; with these two optimizations, the worst-case time is:

    • O(N × M)

  • For subsequent iterations, search for the list 1 element only in the remainder of list 2, starting from the saved iterator; this reduces the worst case, because once you reach the end of list 2 you simply "drop" the remaining duplicates from list 1; with all three optimizations, the worst-case time is:

    • O(N + M)

At this point, the only difference between this algorithm and Algorithm 2 is that list 2 is modified in place to contain the result, rather than building a new list.
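A Python sketch of Algorithm 1 with all three optimizations applied (names are mine; ascending order is assumed for readability; note that `list.insert` on a Python list is O(M), so the O(1) insert assumed in the analysis requires a linked list such as std::list):

```python
def union_into_list2(list1, list2):
    """A single saved position j only ever moves forward, so the total scan
    work is O(N + M). list.insert is O(M) in CPython; the O(1) insert the
    analysis assumes needs a linked list such as C++'s std::list."""
    j = 0
    for x in list1:
        while j < len(list2) and list2[j] < x:   # advance the saved iterator
            j += 1
        if j == len(list2) or list2[j] != x:
            list2.insert(j, x)
        # leave j at x, so duplicates within list1 are skipped next time
    return list2

print(union_into_list2([1, 5, 5, 8, 10], [2, 5, 7, 11, 12]))
```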


Algorithm 2

This is the merge step of merge sort.

You visit each element of list 1 and each element of list 2 exactly once, and insertion always happens at the beginning or end of the result list, so the worst-case time is:

  • O(N + M)

If there are duplicates, they are simply discarded. An ordered result is actually easier to produce here than an unordered one.


Concluding observations

If there are no duplicates, the insert can be optimized in both cases. For example, with doubly linked lists, we can easily check whether the last item of list 1 is larger than the first item of list 2 (or vice versa) and simply concatenate the lists.

This generalizes to any tail of list 1 and list 2: for example, in Algorithm 1, if a list 1 element is not found in list 2, we can concatenate list 2 and the remaining tail of list 1. In Algorithm 2, this is what the last step does.

The worst case for this is when the elements of list 1 and list 2 alternate; it does not reduce the worst case, but it does reduce the average case, often by a large factor, which matters a great deal in Real Life™.

I ignored:

  • Distribution time
  • Glaring oversights in the algorithms
  • Binary search, because you mentioned lists, not arrays or trees

I hope I have not made any blatant mistakes.

0

I implemented a TypeScript (JS) version of a union of two object arrays in one of my previous projects. The data set was too large, and the default libraries such as Underscore or Lodash were not fast enough. After some brainstorming, I came up with a binary-search-based algorithm. I hope this helps someone tuning performance.

Regarding complexity: the algorithm is based on binary search, so each lookup runs in O(log(N)) (after the initial sorts).

Basically, the code takes two unordered arrays of objects and a key name for comparison, and: 1) sorts both arrays, 2) iterates over each element of the first array and removes it from the second array if present, 3) concatenates the resulting second array onto the first.

    private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
        function comparer(a, b) {
            if (a[propertyName] < b[propertyName]) return -1;
            if (a[propertyName] > b[propertyName]) return 1;
            return 0;
        }
        arr1.sort(comparer);
        arr2.sort(comparer);
    }

    private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {
        this.sortArrays(arr1, arr2, propertyName);
        for (var i = 0; i < arr1.length; i++) {
            var obj = { loc: 0 };
            if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
                arr2.splice(obj.loc, 1);
        }
        return arr2;
    }

    private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
        var first = 0, last = size - 1, mid;
        if (!arr.length)
            return false;
        while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
            mid = first + Math.floor((last - first) / 2);
            if (val[propertyName] == arr[mid][propertyName]) {
                obj.loc = mid;
                return true;
            } else if (val[propertyName] < arr[mid][propertyName]) {
                last = mid - 1;
            } else {
                first = mid + 1;
            }
        }
        return false;
    }

    private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
        return arr1.concat(this.difference(arr1, arr2, propertyName));
    }

    // example
    var YourFirstArray = [{x:1},{x:2},{x:3}];
    var YourSecondArray = [{x:0},{x:1},{x:2},{x:3},{x:4},{x:5}];
    var keyName = "x";
    this.UnionAll(YourFirstArray, YourSecondArray, keyName);
0
