The MCE module for Perl loves large files. With MCE, you can read many lines at once per chunk, read a large chunk as a single scalar string, or read one line at a time. Chunking many lines at once reduces the IPC (inter-process communication) overhead, because workers request input far less often.
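To make the IPC point concrete, here is a minimal sketch in plain core Perl (no MCE involved) that groups lines into chunks and counts how many chunks result for a few chunk sizes. The helper `read_chunks` is hypothetical and does not reproduce MCE's internal reader; it only assumes that each chunk corresponds to roughly one hand-off between the manager and a worker.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCE): group lines from a filehandle
# into chunks of at most $chunk_size lines.
sub read_chunks {
    my ($fh, $chunk_size) = @_;
    my (@chunks, @buf);
    while (my $line = <$fh>) {
        push @buf, $line;
        if (@buf == $chunk_size) {
            push @chunks, [@buf];
            @buf = ();
        }
    }
    push @chunks, [@buf] if @buf;    # final, possibly shorter, chunk
    return @chunks;
}

# 1,000 lines held in memory stand in for a large file.
my $data = join '', map { "line $_\n" } 1 .. 1000;

for my $size (1, 100, 500) {
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @chunks = read_chunks($fh, $size);
    close $fh;
    # Under the stated assumption, each chunk is one hand-off to a worker.
    printf "chunk_size %4d -> %4d chunks\n", $size, scalar @chunks;
}
```

For the 1,000-line sample this reports 1000, 10, and 2 chunks respectively, which is the reduction in hand-offs that larger chunks buy.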
MCE 1.504 is out now. It provides MCE::Queue support for both child processes and threads. In addition, the 1.5 release comes with 5 models (MCE::Flow, MCE::Grep, MCE::Loop, MCE::Map, and MCE::Stream) that create the MCE instance for you and choose values for max_workers and chunk_size automatically. You can override these options.
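The models share one calling style. As a sketch of one of them, assuming a recent MCE release in which MCE::Grep exports `mce_grep_f` and accepts import-time options the same way MCE::Loop does, the following greps a small temporary file in parallel. The sample words and the pattern are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Grep max_workers => 4;    # override the model's default

# Write a small sample file (made-up data) to a temporary path.
my ($tmp_fh, $path) = tempfile();
print {$tmp_fh} "$_\n" for qw(alpha beta gamma delta epsilon);
close $tmp_fh;

# The model creates the MCE instance; the block sees each line in $_
# and returns true for lines to keep (here: words ending in "a").
my @matches = mce_grep_f { /a$/ } $path;

MCE::Grep->finish;    # shut down the workers
unlink $path;         # remove the temporary file

print "matched: ", scalar(@matches), "\n";
```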
The demonstration below uses MCE::Loop.
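    use strict;
    use warnings;
    use MCE::Loop;

    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);

    mce_loop_f {
        my ($mce, $chunk_ref, $chunk_id) = @_;

        # $chunk_ref is an array ref holding the lines of this chunk
        foreach my $line ( @$chunk_ref ) {
            chomp $line;
            ## add your code here to process $line
        }

    } $dict_path;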
If you want to specify the number of workers and/or the chunk_size, there are 2 ways to do this.
    use MCE::Loop max_workers => 5, chunk_size => 300000;
Or...
    use MCE::Loop;

    MCE::Loop::init {
        max_workers => 5, chunk_size => 300000
    };
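Whichever way the options are set, a sketch like the following can confirm that every line was seen. It assumes a recent MCE release in which `MCE->gather` is available and `mce_loop_f` returns the gathered values in list context; the temporary sample file and the lowered chunk_size are for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Loop;

# Same style of options as above, but chunk_size is lowered (an
# illustrative choice) so the small sample file spans several chunks.
MCE::Loop::init { max_workers => 5, chunk_size => 300 };

# Write a small sample file (made-up data) to a temporary path.
my ($tmp_fh, $path) = tempfile();
print {$tmp_fh} "word$_\n" for 1 .. 1000;
close $tmp_fh;

# Each worker sends back the number of lines in its chunk.
my @counts = mce_loop_f {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    MCE->gather(scalar @$chunk_ref);
} $path;

MCE::Loop->finish;    # shut down the workers
unlink $path;         # remove the temporary file

my $total = 0;
$total += $_ for @counts;
print "lines counted: $total\n";
```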
Although chunking many lines at once is preferred for large files, you can compare the run time against chunking one line at a time. The first line inside the block (commented out) can be omitted. Note that there is no need for an inner loop: $chunk_ref is still an array ref, now containing 1 line. The input scalar $_ contains the line when chunk_size is 1; otherwise it points to $chunk_ref.
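    use strict;
    use warnings;
    use MCE::Loop;

    MCE::Loop::init {
        max_workers => 5, chunk_size => 1
    };

    print "Enter a file name: ";
    my $dict_path = <STDIN>;
    chomp($dict_path);

    mce_loop_f {
      # my ($mce, $chunk_ref, $chunk_id) = @_;

        # with chunk_size => 1, $_ holds the current line
        my $line = $_;
        chomp $line;
        ## add your code here to process $line or $_

    } $dict_path;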
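The third mode mentioned at the start, reading a chunk as one scalar string, is not shown above. A minimal sketch, assuming MCE's `use_slurpio` option (which passes a reference to the raw chunk instead of an array ref of lines) and the same `MCE->gather` behavior as a recent release; the sample data is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MCE::Loop;

# use_slurpio => 1: each worker receives a reference to the raw chunk
# (one scalar string) instead of an array ref of lines.
MCE::Loop::init { max_workers => 4, use_slurpio => 1 };

# Write a small sample file (made-up data) to a temporary path.
my ($tmp_fh, $path) = tempfile();
print {$tmp_fh} "apple\nbanana\ncherry\n" x 100;
close $tmp_fh;

my @counts = mce_loop_f {
    my ($mce, $slurp_ref, $chunk_id) = @_;
    # Count whole lines equal to "banana" without splitting the chunk.
    my $n = () = $$slurp_ref =~ /^banana$/mg;
    MCE->gather($n);
} $path;

MCE::Loop->finish;    # shut down the workers
unlink $path;         # remove the temporary file

my $total = 0;
$total += $_ for @counts;
print "banana lines: $total\n";
```

Working on the whole string with a multi-line regex avoids splitting each chunk into individual lines, which is why this mode can be attractive for simple pattern scans.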
I hope this demo is useful for anyone who wants to process a file in parallel.
:) mario
Mario Roy