Recently, I had to parse several log files about 6 gigabytes in size. Buffering was a problem, because Perl will happily try to read all 6 gigabytes into memory when you assign a filehandle such as STDIN to an array, and I simply didn't have the system resources for that. I came up with the following workaround, which reads the file line by line and thus avoids the massive buffering that would otherwise swallow all of my system resources.
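To make the contrast concrete, here is a minimal sketch (the filename huge.log is just a placeholder): slurping assigns the whole file to an array at once, while the line-by-line loop only ever holds the current line.

use v5.14;

# Slurping: the whole file is read into memory at once.
# This is what blows up on a 6 GB log.
open my $slurp, '<', 'huge.log' or die "Cannot open huge.log: $!";
my @all_lines = <$slurp>;     # every line buffered in @all_lines
close $slurp;

# Line by line: only the current line is held in memory.
open my $fh, '<', 'huge.log' or die "Cannot open huge.log: $!";
while (my $line = <$fh>) {
    # do something with $line here
}
close $fh;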
Note: this script splits a 6 gigabyte file into several smaller ones (their size determined by the number of lines each output file should contain). The interesting bit is the while loop and the assignment of a single line from the log file to a variable. The loop walks through the entire file, looking at one line, doing something with it, and then repeating. The result is no massive buffering. I have kept the entire script intact to show a working example.
#!/usr/bin/perl -w
BEGIN { $ENV{'POSIXLY_CORRECT'} = 1; }
use v5.14;
use Getopt::Long qw(:config no_ignore_case);

my $input  = '';
my $output = '';
my $lines  = 0;
GetOptions('i=s' => \$input, 'o=s' => \$output, 'l=i' => \$lines);

open FI, '<', $input or die "Cannot open $input: $!";

my $count      = 0;   # lines written to the current output file
my $count_file = 1;   # numeric suffix of the current output file

# Output chunks are written as <output>.1, <output>.2, ... (naming scheme assumed).
while (!eof FI) {
    open FO, '>', "$output.$count_file" or die "Cannot open $output.$count_file: $!";
    $count = 0;
    # Read one line at a time so only a single line is ever held in memory.
    while ($count < $lines) {
        my $line = <FI>;
        last unless defined $line;
        print FO $line;
        $count++;
    }
    close FO;
    $count_file++;
}

close FI;
The script is invoked on the command line, for example:
(script name) -i (input file) -o (output file) -l (output file size, i.e. number of lines)
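So, assuming the script is saved as split_log.pl (a hypothetical name) and you want chunks of one million lines each:
./split_log.pl -i access.log -o chunk -l 1000000
With the naming scheme used above, this would produce chunk.1, chunk.2, and so on.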
Even if this is not quite what you are looking for, I hope this gives you some ideas. :)