Why do I use so much memory when I read a file into memory in Perl?

I have a 310 MB text file (uncompressed). When I use PerlIO::gzip to open the file and decompress it into memory, it easily fills 2 GB of RAM before Perl runs out of memory.

I open the file like this:

open FOO, "<:gzip", "file.gz" or die $!;
my @lines = <FOO>;

Obviously, this is a super convenient way to open gzipped files in Perl, but it takes a ridiculous amount of space! My fallback plan is to decompress the file to disk, read its lines into @lines, work with @lines, and then compress it again. Does anyone have an idea why roughly seven times more memory is consumed when opening the archived file this way? Does anyone have an alternative idea for how I can decompress this gzipped file into memory without using a ridiculous amount of memory?

+5
3 answers

When you do:

my @lines = <FOO>;

you read the entire file into memory at once, one array element per line. With lines of roughly 100 characters, that is on the order of 3.3 million elements, and every element is a full Perl scalar carrying its own bookkeeping overhead on top of the string data it holds.

Here is a demonstration. The test files I used:

C:\Temp> dir file
2010/10/04  09:18 PM       328,000,000 file
C:\Temp> dir file.gz
2010/10/04  09:19 PM         1,112,975 file.gz

By contrast, if you read the gzipped file one line at a time:

#!/usr/bin/perl

use strict; use warnings;
use autodie;
use PerlIO::gzip;

open my $foo, '<:gzip', 'file.gz';

while ( my $line = <$foo> ) {
    print ".";
}

memory use stays negligible the whole way through.

On the other hand, consider how much overhead Perl's data structures add to the data they hold:

#!/usr/bin/perl

use strict; use warnings;
use Devel::Size qw( total_size );

my $x = 'x' x 100;
my @x = ('x' x 100);

printf "Scalar: %d\n", total_size( \$x );
printf "Array:  %d\n", total_size( \@x );

This prints:

Scalar: 136
Array:  256
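
Scaling those numbers up gives a feel for where the memory goes. A rough back-of-the-envelope sketch (the line count and per-line cost below are illustrative estimates derived from the figures above, not measurements of the original file):

#!/usr/bin/perl

use strict; use warnings;

# Treat the one-element measurement above as an approximate per-line cost.
# This slightly over-counts (part of the array overhead is fixed), but it
# gives the right order of magnitude.
my $lines    = 3_280_000;   # ~328,000,000 bytes / ~100 bytes per line
my $per_line = 256;         # total_size of a one-element array holding a 100-char string

printf "Rough overhead estimate: %.0f MB\n",
    $lines * $per_line / ( 1024 * 1024 );

That comes out around 800 MB for @lines alone, before counting any temporary copies made while reading and processing the lines.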
+17

You are slurping the entire file into @lines, so every line, plus its per-scalar overhead, is held in memory at the same time. Unless you really need all the lines at once, read and process the file one line at a time:

open my $foo, '<:gzip', 'file.gz' or die $!;
while (my $line = <$foo>) {
    # process $line here
}
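
If the processed output needs to end up compressed as well, the same line-by-line pattern works for writing too. A minimal sketch, assuming PerlIO::gzip's output layer is available; the output file name and the per-line work are placeholders:

#!/usr/bin/perl

use strict; use warnings;
use PerlIO::gzip;

open my $in,  '<:gzip', 'file.gz'     or die $!;
open my $out, '>:gzip', 'file_new.gz' or die $!;  # illustrative name

while ( my $line = <$in> ) {
    # transform $line here; only the current line is held in memory
    print {$out} $line;
}

close $out or die $!;

This keeps memory use flat no matter how large the file is.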
+22

With files this large, I see only one solution: use the command line to decompress the file, do your manipulations in Perl, then use external tools to compress it again :)
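
A sketch of that idea using pipes instead of a temporary decompressed file, assuming the external gzip program is on the PATH (the file names are placeholders):

#!/usr/bin/perl

use strict; use warnings;

# Read decompressed lines from an external gzip process and write the
# results back out through another gzip process.
open my $in,  '-|', 'gzip -dc file.gz'      or die "can't start gzip for reading: $!";
open my $out, '|-', 'gzip -c > file_new.gz' or die "can't start gzip for writing: $!";

while ( my $line = <$in> ) {
    # work with $line here
    print {$out} $line;
}

close $in;
close $out or die "gzip writer failed: $!";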

-6