Trying to understand array_diff_uassoc optimization

It seems that array_diff_uassoc sorts the arrays before comparing them with each other.

What is the advantage of this approach?

Test script

    <?php

    // Key comparison callback that prints every pair of keys it is asked to compare.
    function compare($a, $b)
    {
        echo("$a : $b\n");
        return strcmp($a, $b);
    }

    // Completely different keys
    $a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
    $b = array('v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5);
    var_dump(array_diff_uassoc($a, $b, 'compare'));

    // Partially overlapping keys
    $a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
    $b = array('d' => 1, 'e' => 2, 'f' => 3, 'g' => 4, 'h' => 5);
    var_dump(array_diff_uassoc($a, $b, 'compare'));

    // Identical arrays
    $a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
    $b = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
    var_dump(array_diff_uassoc($a, $b, 'compare'));

    // Same pairs, reverse order
    $a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
    $b = array('e' => 5, 'd' => 4, 'c' => 3, 'b' => 2, 'a' => 1);
    var_dump(array_diff_uassoc($a, $b, 'compare'));

http://3v4l.org/DKgms#v526

PS: It seems that the sorting algorithm has changed in PHP 7.

2 answers

The sorting algorithm has not changed in PHP 7. Elements are simply passed in a different order to the sorting algorithm for some performance improvements.

The advantage can be faster execution overall. You only really hit the worst case when both arrays have completely different keys.

The worst-case complexity is sorting both arrays and then comparing every key of one array with every key of the other: O(n*m + n*log(n) + m*log(m)).

The best case is sorting both arrays and then making only as many comparisons as there are elements in the smaller array: O(min(m, n) + n*log(n) + m*log(m)).

When a key matches, you no longer have to compare against the whole array again, only against the keys from the match onward.

In the current implementation, however, the sorting is simply redundant. I think the implementation in php-src should be improved. There is no outright bug, but the implementation is just poor. If you can read some C, see http://lxr.php.net/xref/PHP_TRUNK/ext/standard/array.c#php_array_diff (note that for array_diff_uassoc this function is called as php_array_diff(INTERNAL_FUNCTION_PARAM_PASSTHRU, DIFF_ASSOC, DIFF_COMP_DATA_INTERNAL, DIFF_COMP_KEY_USER);).
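To see how many key comparisons a given PHP version actually performs, the callback can count its own invocations instead of echoing each pair. The following is only a minimal sketch: count_key_comparisons() is a hypothetical helper name, not something from the question or from PHP itself, and the counts it reports will vary between PHP versions, so none are stated here.

```php
<?php
// Hypothetical helper (not part of PHP): runs array_diff_uassoc() and counts
// how many times PHP invokes the user-supplied key comparison callback.
function count_key_comparisons(array $a, array $b): array
{
    $calls = 0;
    $result = array_diff_uassoc($a, $b, function ($x, $y) use (&$calls) {
        $calls++;
        // Keys are cast to string so that integer keys also work with strcmp().
        return strcmp((string) $x, (string) $y);
    });

    return ['result' => $result, 'calls' => $calls];
}

$a = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];

// Completely different keys versus identical arrays, as in the question.
$disjoint  = count_key_comparisons($a, ['v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5]);
$identical = count_key_comparisons($a, $a);

// The counts depend on the PHP version, so they are printed rather than assumed.
printf("disjoint keys:  %d calls\n", $disjoint['calls']);
printf("identical keys: %d calls\n", $identical['calls']);
```

Running this on several PHP versions side by side (for example on 3v4l.org) shows whether the sort pays off for a particular shape of input.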


Theory

Sorting allows for a few shortcuts, for example:

       A   |   B
    -------+-------
     1,2,3 | 4,5,6

Each element of A is only compared against B[0], because the remaining elements of B are known to be at least as large.

Another example:

       A   |   B
    -------+-------
     4,5,6 | 1,2,6

In this case, A[0] is compared with all elements of B, but A[1] and A[2] are only compared with B[2].

If any element of A is larger than all elements of B, you get worst-case performance.
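The two examples above can be reproduced with a small sketch of the idea: sort both lists, then walk them with two cursors and never revisit an element of B that has already been found too small. This is an illustration under that assumption, not the code in php-src, and sorted_diff() is a hypothetical name.

```php
<?php
// Hypothetical helper (not part of PHP): diffs two lists of values by sorting
// both and walking them with two cursors, counting comparisons along the way.
function sorted_diff(array $a, array $b): array
{
    sort($a);
    sort($b);

    $comparisons = 0;
    $diff = [];
    $j = 0;              // cursor into B; it only ever moves forward
    $m = count($b);

    foreach ($a as $value) {
        $found = false;
        while ($j < $m) {
            $comparisons++;
            $c = $value <=> $b[$j];
            if ($c > 0) {
                $j++;    // B[$j] is smaller than $value, so it is never needed again
                continue;
            }
            // Equal means a match; smaller means no later element of B can match.
            $found = ($c === 0);
            break;
        }
        if (!$found) {
            $diff[] = $value;
        }
    }

    return ['diff' => $diff, 'comparisons' => $comparisons];
}

$first  = sorted_diff([1, 2, 3], [4, 5, 6]); // every element of A meets only B[0]
$second = sorted_diff([4, 5, 6], [1, 2, 6]); // A[0] walks all of B, the rest meet only B[2]

printf("first:  %d comparisons\n", $first['comparisons']);  // prints "first:  3 comparisons"
printf("second: %d comparisons\n", $second['comparisons']); // prints "second: 5 comparisons"
```

The cursor into B is deliberately left in place after a match, so that duplicate values in A are still recognised as present in B.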

Practice

While the above works well for the standard array_diff() or array_udiff(), as soon as a key comparison function is used, performance falls back to O(n * m) because of a change that was made in an attempt to fix a bug.

That bug report describes how custom key comparison functions can produce unexpected results when used with arrays that have mixed keys (i.e. both numeric and string keys). I personally feel this should have been addressed through documentation instead, because you would get equally strange results with ksort().
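One way such surprises can arise is that a string comparator orders numeric keys differently than a numeric comparison does. The sketch below is only an illustration of that mismatch, not a reproduction of the bug report. It deliberately uses integer keys only, because the ordering of truly mixed integer and string keys differs between PHP versions.

```php
<?php
// The same two keys ordered numerically and as strings.
$asNumbers = 10 <=> 9;          // positive: 10 is greater than 9
$asStrings = strcmp('10', '9'); // negative: the character '1' sorts before '9'

$data = [9 => 'nine', 10 => 'ten'];

// ksort() orders integer keys numerically.
$byNumber = $data;
ksort($byNumber);

// uksort() with a strcmp()-based callback orders the same keys as strings.
$byString = $data;
uksort($byString, function ($x, $y) {
    return strcmp((string) $x, (string) $y);
});

echo implode(',', array_keys($byNumber)), "\n"; // prints "9,10"
echo implode(',', array_keys($byString)), "\n"; // prints "10,9"
```

Any shortcut that relies on both arrays being sorted only holds if the order produced by the sort and the order assumed by the later comparisons agree.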


Source: https://habr.com/ru/post/1214632/

