Does sorting cope with grep performance in Perl

I am looking for some features of the Perl grep function. I'm doing it:

 if ( grep{ $foo == $_ } @bar ) { some code; } 

Suppose @bar large (hundreds of thousands of elements). With my data, if I sort @bar , $foo values ​​are more likely to appear near the beginning of the array than closer to the end. I am wondering if this will help performance.

In a different way, with the code above, grep moves sequentially through @bar , checking to see if there is $foo == $_ , and then exits immediately as soon as any value has been found to be true? Or does he really check every @bar element before returning a value?

+7
source share
5 answers

grep doesn't close, so the order of the elements doesn't matter.

While List :: MoreUtils first performs a short circuit, the entire list must be pushed on the stack before it is called.

It will be better:

 for (@bar) { if ($foo == $_) { some code; last; } } 

Updated . I initially iterated over the indices, since it uses O (1) memory, but also for (@bar) (as opposed to for (LIST) in general), since ysth reminded me.

+10
source

Since your use of grep is in a scalar context, it returns the number of matching elements. To calculate this, Perl must visit every element in any case, so sorting is unlikely to help performance from this angle.

+6
source

In your example, grep will iterate over the entire array no matter how many elements are matched.

If you can keep this array sorted, it's faster to look for your values ​​using binary search. You can also convert your array to a hash (with key = array elements) and perform your checks with constant time, but this will consume additional memory.

+2
source

Regarding your question

According to my data, if I sort @bar, the values ​​of $ foo will most likely appear near the beginning of the array rather than towards the end. I am wondering if this will help performance.

If the list is sorted digitally, you can write

 sub contains { my ($list, $item) = @_; for (@$item) { return $_ == $item if $_ >= $item; } return !1; } some_code() if contains(\@bar, $foo); 
+1
source

It depends. A grep { $x == $_ } @a does not use branch prediction, but grep { $x < $_ } @a does!

 #!/usr/bin/env perl use strict; use warnings; use Time::HiRes qw( clock_gettime CLOCK_MONOTONIC ); use constant MIN => 0; use constant MAX => 1000; use constant AVG => int(MIN + (MAX - MIN) / 2); use constant N_LOOPS => 40000; use constant ARRAY_LEN => 10000; ## is grep faster for sorted arrays? ## ## RANDOM ARRAY VALUES ## my $n = 0; my @a = map { int(rand() * (MAX - MIN) + MIN) } 1 .. ARRAY_LEN; my $duration = -clock_gettime ( CLOCK_MONOTONIC ); for(my $i = 0; $i < N_LOOPS; $i++) { $n += grep { AVG < $_ } @a; } $duration += clock_gettime ( CLOCK_MONOTONIC ); print "duration: $duration secs, n = $n".$/; ## ## PREDICTABLE ARRAY VALUES ## $n = 0; @a = sort {$a <=> $b} @a; $duration = -clock_gettime ( CLOCK_MONOTONIC ); for(my $i = 0; $i < N_LOOPS; $i++) { $n += grep { AVG < $_ } @a; } $duration += clock_gettime ( CLOCK_MONOTONIC ); print "duration: $duration secs, n = $n".$/; ## and now we try to eliminate side effects by repeating ## ## RANDOM ARRAY VALUES ## $n = 0; @a = map { int(rand() * (MAX - MIN) + MIN) } 1 .. ARRAY_LEN; $duration = -clock_gettime ( CLOCK_MONOTONIC ); for(my $i = 0; $i < N_LOOPS; $i++) { $n += grep { AVG < $_ } @a; } $duration += clock_gettime ( CLOCK_MONOTONIC ); print "duration: $duration secs, n = $n".$/; ## ## PREDICTABLE ARRAY VALUES ## $n = 0; @a = sort {$a <=> $b} @a; $duration = -clock_gettime ( CLOCK_MONOTONIC ); for(my $i = 0; $i < N_LOOPS; $i++) { $n += grep { AVG < $_ } @a; } $duration += clock_gettime ( CLOCK_MONOTONIC ); print "duration: $duration secs, n = $n".$/; 

Results:

 duration: 27.7465513650095 secs, n = 199880000 <-- unsorted duration: 26.129752348992 secs, n = 199880000 <-- sorted duration: 28.3962040760089 secs, n = 202920000 <-- unsorted duration: 26.082420132996 secs, n = 202920000 <-- sorted 

See also Why is a sorted array processed faster than an unsorted array?

0
source

All Articles