How can I compare two sets of 1000 numbers against each other?

I have to check about 1000 numbers against 1000 other numbers.

I downloaded both and compared them on the server side:

foreach( $numbers1 as $n1 ) { foreach( $numbers2 as $n2 ) { if( $n1 == $n2 ) { doBla(); } } } 

This took a lot of time, so I tried to make the same client side comparison using two hidden div . Then they were compared using JavaScript. It still takes 45 seconds to load the page (using hidden div elements).

I do not need to load numbers that do not match.

Is there a faster algorithm? I am thinking of comparing their database and just loading the error numbers and then making an Ajax call to the rest of the numbers without errors. But is there a MySQL database fast enough?

+62
javascript algorithm sql php
Oct. 15 '10 at 13:15
source share
25 answers

Sort the lists first. You can then step off both lists from the start, comparing how you are going.

The loop will look something like this:

 var A = getFirstArray().sort(), B = getSecondArray().sort(); var i = 0, j = 0; while (i < A.length && j < B.length) { if (A[i] === B[j]) { doBla(A[i]); i++; j++; } else if (A[i] < B[j]) { i++; } else j++; } 

(This JavaScript, you can do it on the server side too, but I don't know PHP.)

Edit - just to be fair to all hash table fans (whom I respect, of course), this is pretty easy to do in JavaScript:

 var map = {}; for (var i = 0; i < B.length; ++i) map[B[i]] = true; // Assume integers. for (var i = 0; i < A.length; ++i) if (map[A[i]]) doBla(A[i]); 

Or, if the numbers are or can be floats:

 var map = {}; for (var i = 0; i < B.length; ++i) map['' + B[i]] = true; // Assume integers. for (var i = 0; i < A.length; ++i) if (map['' + A[i]]) doBla(A[i]); 

Since numbers are pretty cheap for a hash (even in JavaScript, converting to a string before hashing is surprisingly cheap), it will be pretty fast.

+129
Oct. 15 2018-10-15
source share
+85
Oct. 15 2018-10-15
source share

In terms of a database, this can be a union of 1000 rows into another 1000 rows. Any modern database system can handle this.

 select x from table1 inner join table2 on table1.x = table2.y 

where table1 and table2 are the corresponding rows and can be the same table.

+27
Oct. 15 '10 at 13:18
source share

What should you do for so long - what does doBla () do? I suspect it takes time? Comparing two sets of 1,000,000 numbers with the same algorithm takes no time.

It's fun - the number of optimization methods as answers - the problem is not your algorithm - this is what doBla () does, which takes many times more times than any optimization will help you :) esp. given that the sets are only 1000 lengths, and you need to sort them first.

+26
Oct. 15 '10 at 17:26
source share

Maybe just traverse the values ​​of an array to find the numbers existing in both arrays?

 $result = array_intersect($numbers1, $numbers2); foreach ($result as $val) doBla(); 
+23
Oct. 15 '10 at 13:21
source share

If you sort List2 first, and then do a binary search for each number in List1, you will see a huge increase in speed.

I am not a PHP guy, but this should give you an idea:

 sort($numbers2); foreach($numbers1 as $n1) { if (BinarySearch($numbers2, $n1) >= 0) { doBla(); } } 

Obviously, I'm not a PHP guy, I don't know libraries, but I'm sure sorting and binary searching should be easy enough to find them.

Note. If you are not familiar with binary search; you are sorting list2 because binary search should work on sorted lists.

+9
Oct. 15 2018-10-15
source share

Sort them first.

+5
Oct. 15 '10 at 13:18
source share

I am not a PHP expert, so debugging may require some debugging, but you can do it easily in O (n) time:

 // Load one array into a hashtable, keyed by the number: O(n). $keys1 = []; foreach($numbers1 as $n1) $keys1[$n1] = true; // Find the intersections with the other array: foreach($numbers2 as $n2) { // O(n) if (isset($keys1[$n2]) { // O(1) doBla(); } } 

Despite this, the intersection is not where your time goes. Even a poor O (n ^ 2) implementation, like you now, should be able to go through 1000 numbers per second.

+5
15 Oct. '10 at 17:36
source share

Stop - why are you doing this?

If the numbers are already in the SQL database, make a connection and let the database determine the most efficient route.

If they are not in the database, then I am sure that you have gone somewhere and really need to reconsider how you got here.

+5
Oct 15 2018-10-18
source share
 $same_numbers = array_intersect($numbers1, $$numbers2); foreach($same_numbers as $n) { doBla(); } 
+4
Oct. 15 2018-10-15
source share

Sort both lists, then go through both lists at the same time using the old sequential update template of the new wizard . While you can sort the data, this is the fastest way, since you really only go to the list once to the longest length of the largest list.

+3
Oct. 15 2018-10-15
source share

Your code is simply more complicated than necessary.

Assuming what you are looking for is that the numbers at each position match (and not just that the array contains the same numbers), you can smooth your loop to unity.

 <?php // Fill two arrays with random numbers as proof. $first_array = array(1000); $second_array = array(1000); for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000); for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000); // The loop you care about. for($i=0; $i<1000; $i++) if ($first_array[$i] != $second_array[$i]) echo "Error at $i: first_array was {$first_array[$i]}, second was {$second_array[$i]}<br>"; ?> 

Using the code above, you will only execute a cycle 1000 times, unlike a cycle 1,000,000 times.

Now, if you just need to verify that the number appears or does not appear in arrays, use array_diff and array_intersect as follows:

 <?php // Fill two arrays with random numbers as proof. $first_array = array(1000); $second_array = array(1000); for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000); for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000); $matches = array_intersect($first_array, $second_array); $differences = array_diff($first_array, $second_array); ?> 
+2
Oct. 15 '10 at 14:35
source share

I may not see anything here, but it looks like a classic case of intersecting a set. Here are a few lines in perl that will do this.

foreach $ e (@a, @b) {$ union {$ e} ++ && & $ isect {$ e} ++}

@union = keys% union; @isect = keys% isect;

At the end of these lines of code, @isect will contain all the numbers that are in both the @a and @b elements. I am sure that this can be translated into php more or less directly. FWIW, this is my favorite piece of code from Cookbook Perl.

+2
Oct. 15 2018-10-15
source share

You can do this O (n) times if you use bucket sorting. Assuming you know the maximum value that numbers can take (although there are ways around this).

http://en.wikipedia.org/wiki/Bucket_sort

+2
Oct. 15 '10 at 19:46
source share

I think it would be much easier to use the built-in array_intersect function. Using your example, you can do:

 $results = array_intersect($numbers1, $numbers2); foreach($results as $rk => $rv) { doSomething($rv); } 
+1
Oct. 15 '10 at 16:13
source share

It would be best to do something like this:

 // 1. Create a hash map from one of the lists. var hm = { }; for (var i in list1) { if (!hm[list1[i]]) { hm[list1[i]] = 1; } else { hm[list1[i]] += 1; } } // 2. Lookup each element in the other list. for (var i in list2) { if (hm[list2[i]] >= 1) { for (var j = 0; j < hm[list2[i]]; ++j) { doBla(); } } } 

This is guaranteed by O (n) [assuming the hash search insertion is O (1) amortized].

Update. In the worst case, this algorithm will be O (n 2 ), and there is no way to reduce it - unless you change the semantics of the program. This is due to the fact that in the worst case, the program will call doBla () n 2 times if all the numbers in both lists match. However, if both lists have unique numbers (i.e., are generally unique in the list), then the runtime will tend to O (n).

+1
Oct. 15 2018-10-10
source share

I will create a GUI in Visual Basic, see if I can track numbers

+1
Oct. 15 '10 at 21:22
source share

Combine both lists, start at the beginning of both lists, and then search each list for the same numbers at the same time.

So, in pseudo code, it will be like ...

 Mergesort (List A); Mergesort (list B) $Apos = 0; $Bpos = 0; while( $Apos != A.Length && $Bpos != B.length) // while you have not reached the end of either list { if (A[$Apos] == B[$Bpos])// found a match doSomething(); else if (A[$Apos] > B[$Bpos]) // B is lower than A, so have B try and catch up to A. $Bpos++; else if (A[$Apos] < B[$Bpos]) // the value at A is less than the value at B, so increment B $Apos++; } 

If I am right, the speed of this algorithm is O (n logn).

+1
Oct. 15 '10 at 22:49
source share

I'm not sure why Mrk Mnl was blocked, but the function call is utility .

Pull the matching numbers into another array and doBla () on them after comparisons. As a test // out doBla () and check to see if you have the same performance issue.

+1
Oct. 15 '10 at 22:53
source share

Is it possible to put these numbers in two database tables and then make an INNER JOIN ? This will be very effective and will contain only those numbers that are contained in both tables. This is an ideal task for a database.

0
Oct. 15 '10 at 13:20
source share
  • Create two repeating collections, preferably with fast search times, such as a HashSet or possibly a TreeSet. Avoid listings as they have very poor search times.

  • When you find items, remove them from both sets. This can shorten the search time because there will be fewer items in subsequent search results.

0
Oct. 15 '10 at 19:51
source share

If you are trying to get a list of numbers without any duplicates, you can use the hash:

 $unique = array(); foreach ($list1 as $num) { $unique[$num] = $num; } foreach ($list2 as $num) { $unique[$num] = $num; } $unique = array_keys($unique); 

It will be a little (very little) slower than the array method, but it is cleaner in my opinion.

0
Oct 15 '10 at 19:53
source share

Combine, sort, and then count

 <?php $first = array('1001', '1002', '1003', '1004', '1005'); $second = array('1002', '1003', '1004', '1005', '1006'); $merged = array_merge($first, $first, $second); sort($merged); print_r(array_count_values($merged)); ?> 

Output / values ​​with a score of three are the ones you want

 Array ( [1001] => 2 [1002] => 3 [1003] => 3 [1004] => 3 [1005] => 3 [1006] => 1 ) 
0
Oct 16 2018-10-10T00:
source share

This code will call doBla() once every time the value in $numbers1 is in $numbers2 :

 // get [val => occurences, ...] for $numbers2 $counts = array_count_values($numbers2); foreach ($numbers1 as $n1) { // if $n1 occurs in $numbers2... if (isset($counts[$n1])) { // call doBla() once for each occurence for ($i=0; $i < $counts[$n1]; $i++) { doBla(); } } } 

If you only need to call doBla() only once, if a match is found:

 foreach ($numbers1 as $n1) { if (in_array($n1, $numbers2)) doBla(); } 

If $numbers1 and $numbers2 contain only unique values, or if the number of times that a particular value occurs in both arrays, it does not matter, array_intersect() will do the job:

 $dups = array_intersect($numbers1, $numbers2); foreach ($dups as $n) doBla(); 

I agree with a few earlier posts that calls to doBla() probably take longer than repeating arrays.

0
Oct 16 '10 at 9:49
source share

This problem can be divided into 2 tasks. The 1st task is to find all combinations (n ​​^ 2-n) / 2. For n = 1000, the solution is x = 499500. The second task is to sort through all the numbers x and compare them with the doBla () function.

 function getWayStr(curr) { var nextAbove = -1; for (var i = curr + 1; i < waypoints.length; ++i) { if (nextAbove == -1) { nextAbove = i; } else { wayStr.push(waypoints[i]); wayStr.push(waypoints[curr]); } } if (nextAbove != -1) { wayStr.push(waypoints[nextAbove]); getWayStr(nextAbove); wayStr.push(waypoints[curr]); } } 
0
Mar 09 2018-11-11T00:
source share



All Articles