grep or splice on a large array?

I have a large array of hashes, about 0.5 GB, held in memory, and I need to remove some elements from it: about 10% of them, distributed throughout the array.

Which approach is better: grep, or identifying the elements that need to be removed and splicing them out?

Thanks,

Simone

4 answers

Under the conditions you describe, splice can degrade to O(n^2), because each call shifts part of the array and you would need one call per removed element. grep or a slice, on the other hand, allocates O(n) additional memory (probably much less than the 0.5 GB of data, but still...).
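To make the two options concrete, here is a minimal sketch of both on an array of hash references. The data, the hypothetical id field and the keep predicate (drop every 10th element, i.e. about 10%) are assumptions for illustration. The splice version walks the indices from the end so that removing one element never shifts an index that is still to be visited; each splice still moves up to O(n) elements, which is where the quadratic cost comes from.

```perl
use strict;
use warnings;

# Hypothetical test data: 100_000 hash refs, each with an 'id' field.
my @data = map { +{ id => $_ } } 1 .. 100_000;

# Hypothetical predicate: keep everything except every 10th id (drops ~10%).
sub keep { return $_[0]{id} % 10 != 0 }

# Approach 1: grep builds a new list of the kept references (O(n) extra memory).
my @by_grep = grep { keep($_) } @data;

# Approach 2: splice each unwanted element out in place.
# Walk backwards so a removal does not shift indices we have yet to visit;
# each splice moves up to O(n) elements, so k removals cost O(n * k).
my @by_splice = @data;    # copied only so the two results can be compared
for my $i (reverse 0 .. $#by_splice) {
    splice @by_splice, $i, 1 unless keep($by_splice[$i]);
}

print scalar(@by_grep), " ", scalar(@by_splice), "\n";    # prints "90000 90000"
```

Both produce the same elements in the same order; they differ only in time and memory behaviour.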

There is a linear solution without additional memory, but it looks more like C than Perl:

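```perl
sub inplace_grep {
    my ($code, $array) = @_;
    my $to = 0;    # declared outside the loop so splice below can still see it
    # move the kept elements backwards over the removed ones
    for (my $from = 0; $from < @$array; $from++) {
        $code->($array->[$from]) or next;
        $array->[$to++] = $array->[$from];
    }
    # remove the leftover tail
    splice @$array, $to;
}
```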

Update, on grep's memory use: you can quickly check for the additional allocation by running grep on a large amount of data and watching for brk system calls. On my system (Linux, perl 5.10) the extra allocation does show up.

```shell
strace -e trace=brk perl -MTime::HiRes=time -wle '
    print "start " . time;
    my @array = 1 .. 10**7;
    print "alloc " . time;
    @array = grep { $_ % 2 } @array;
    print "grep " . time;
'
```
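If strace is not available, a rough pure-Perl check is to read the process's peak resident memory before and after the grep. This sketch assumes a Linux /proc filesystem; the peak_kb helper is hypothetical. Since perl rarely hands freed memory back to the OS, the peak value (VmHWM) says more here than the current size.

```perl
use strict;
use warnings;

# Hypothetical helper: peak resident set size (VmHWM) in kB, or undef
# if /proc/self/status is not available (e.g. on non-Linux systems).
sub peak_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmHWM:\s+(\d+)\s+kB/;
    }
    return undef;
}

my @array  = 1 .. 10**6;
my $before = peak_kb();
@array = grep { $_ % 2 } @array;    # keep the odd numbers
my $after  = peak_kb();

if (defined $before && defined $after) {
    printf "peak RSS before grep: %d kB, after: %d kB\n", $before, $after;
} else {
    print "/proc/self/status not available on this system\n";
}
```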

Benchmark it? From what you describe of your data, I would suspect that grep will be faster than many splice calls on an array with lots of elements.
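A minimal benchmark along those lines could use cmpthese from the core Benchmark module. The data size, the hypothetical id field, the 10% rule and the one-second run time are assumptions. Each sub works on a fresh copy of the data so every run starts from the same array, which means the copy cost is included in both timings.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 50_000 hash refs; elements whose id is a multiple of 10 are dropped.
my @data = map { +{ id => $_ } } 1 .. 50_000;

# Filter with grep into a new list.
sub with_grep {
    my @copy = @data;
    @copy = grep { $_->{id} % 10 } @copy;
    return scalar @copy;
}

# Remove unwanted elements one at a time with splice, walking backwards.
sub with_splice {
    my @copy = @data;
    for my $i (reverse 0 .. $#copy) {
        splice @copy, $i, 1 unless $copy[$i]{id} % 10;
    }
    return scalar @copy;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    grep   => \&with_grep,
    splice => \&with_splice,
});
```

The relative numbers will depend on the array size and on how the removed elements are spread, so it is worth running this against data shaped like the real thing.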


If you know which elements you want to keep, you can simply take an array slice using their indices:

 @want = @all[ @wanted ]; 

or

 @all = @all[ @wanted ]; 
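If the list of indices does not exist yet, one way to build it is a grep over the indices rather than over the elements; the slice then copies only the kept references. The id-based condition below is a hypothetical stand-in for whatever decides which elements to keep.

```perl
use strict;
use warnings;

my @all = map { +{ id => $_ } } 1 .. 20;

# Indices of the elements to keep (hypothetical rule: drop every 10th id).
my @wanted = grep { $all[$_]{id} % 10 } 0 .. $#all;

# Array slice: replace @all with just the wanted elements, in order.
@all = @all[ @wanted ];

print join(',', map { $_->{id} } @all), "\n";
# prints "1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19"
```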

As for which of grep and splice is faster: splice will be, since all it has to do is move some pointers around in C and free the elements you are no longer keeping, whereas grep has a bit more work to do, because it runs your selection block for every element of the source list.


If you already know which items you want to remove, deleting them directly will by definition be faster, since you don't need to search for them first. Otherwise, grep is the best choice for fast filtering.
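When the indices to remove are already known, a lookup hash avoids both repeated splice calls and re-testing every element. The @remove list here is hypothetical. Note that this still builds an index list and a new array, so it uses O(n) extra memory, just like grep.

```perl
use strict;
use warnings;

my @all    = map { +{ id => $_ } } 1 .. 10;
my @remove = (2, 5, 9);    # hypothetical: indices already known to be unwanted

# Build a set of indices to drop, then keep every index not in that set.
my %drop = map { $_ => 1 } @remove;
@all = @all[ grep { !$drop{$_} } 0 .. $#all ];

print join(',', map { $_->{id} } @all), "\n";    # prints "1,2,4,5,7,8,9"
```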

