How to open an array of files in Perl?

In Perl, I read files from a directory, and I want to open them all at the same time (reading line by line) so that I can run a function that uses all of their nth lines together (e.g. concatenation).

 my $text = `ls | grep ".txt"`;
 my @temps = split(/\n/, $text);
 my @files;
 for my $i (0..$#temps) {
     my $file;
     open($file, "<", $temps[$i]);
     push(@files, $file);
 }
 my $concat;
 for my $i (0..$#files) {
     my @blah = <$files[$i]>;
     $concat .= $blah;
 }
 print $concat;

All I get is a bunch of errors: "use of uninitialized value" warnings and GLOB(...) errors. So how can I do this?

+2
file perl simultaneous
Sep 30 '09 at 17:52
4 answers

Many problems. Starting with the call to "ls | grep" :)

Let's start with the code:

First, let's get a list of files:

 my @files = glob( '*.txt' ); 

But it would be better to check whether each name actually refers to a regular file rather than a directory:

 my @files = grep { -f } glob( '*.txt' ); 

Now open these files to read them:

 my @fhs = map { open my $fh, '<', $_; $fh } @files; 

But we need a way to handle errors; in my opinion, the best way is to add:

 use autodie; 

at the beginning of the script (and install autodie if you don't already have it). Alternatively, you can:

 use Fatal qw( open ); 

Now, with that in place, let's get the first line (as in your example) from each of the inputs and concatenate them:

 my $concatenated = '';
 for my $fh ( @fhs ) {
     my $line = <$fh>;
     $concatenated .= $line;
 }

This is clear and readable, but you can shorten it further, while preserving (in my opinion) readability, to:

 my $concatenated = join '', map { scalar <$_> } @fhs; 

The effect is the same: $concatenated contains the first lines of all the files.

So, the whole program will look like this:

 #!/usr/bin/perl
 use strict;
 use warnings;
 use autodie;
 # use Fatal qw( open );  # uncomment if you don't have autodie

 my @files = grep { -f } glob( '*.txt' );
 my @fhs   = map { open my $fh, '<', $_; $fh } @files;
 my $concatenated = join '', map { scalar <$_> } @fhs;

Now, perhaps you want to combine not just the first lines but all of them. In that case, instead of the $concatenated = ... line, you need something like this:

 my $concatenated = '';
 while ( my $fh = shift @fhs ) {
     my $line = <$fh>;
     if ( defined $line ) {
         push @fhs, $fh;
         $concatenated .= $line;
     }
     else {
         close $fh;
     }
 }
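To see the round-robin interleaving in action, here is a self-contained sketch that uses in-memory filehandles (opening a filehandle on a scalar reference, a core Perl feature) instead of real .txt files; the sample contents are made up for illustration:

```perl
use strict;
use warnings;

# Two "files" held in memory via scalar references (no disk needed).
my @docs = ( "a1\na2\n", "b1\nb2\nb3\n" );
my @fhs  = map { open my $fh, '<', \$_ or die $!; $fh } @docs;

# Round-robin over the handles: take one line from each in turn,
# dropping a handle once it has no lines left.
my $concatenated = '';
while ( my $fh = shift @fhs ) {
    my $line = <$fh>;
    if ( defined $line ) {
        push @fhs, $fh;
        $concatenated .= $line;
    }
    else {
        close $fh;
    }
}
print $concatenated;   # a1, b1, a2, b2, b3 -- one per line
```

Note that once a shorter file runs out, the remaining handles keep taking turns, which is why b3 appears at the end.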
+15
Sep 30 '09 at 18:29

Here is your problem:

 for my $i (0..$#files) {
     my @blah = <$files[$i]>;
     $concat .= $blah;
 }

First, <$files[$i]> is not a valid filehandle read; this is the source of your GLOB(...) errors. See mobrule's answer for why. So change it to the following:

 for my $file (@files) {
     my @blah = <$file>;
     $concat .= $blah;
 }

The second problem: you mix @blah (an array named blah) and $blah (a scalar named blah). This is the source of your "uninitialized value" warnings: the scalar $blah was never initialized, yet you use it. If you want the nth line from @blah, use this:

 for my $file (@files) {
     my @blah = <$file>;
     $concat .= $blah[$n];
 }
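As a side note, nothing ties the array and the scalar together; a minimal illustration of the sigil distinction (the variable names here are just for demonstration):

```perl
use strict;
use warnings;

my @blah = ( "line0\n", "line1\n", "line2\n" );
my $n    = 1;

# @blah and a scalar $blah would be entirely separate variables;
# $blah[$n] is element $n of the array @blah, not part of any $blah.
print $blah[$n];   # "line1"
```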

Not to beat a dead horse, but I also want to point out a better way to do one more thing:

 my $text = `ls | grep ".txt"`;
 my @temps = split(/\n/, $text);

This reads the list of all files in the current directory with the ".txt" extension. It works, but it is not efficient: we have to spawn a shell, which in turn runs ls and grep, and that adds overhead. In addition, while ls and grep are simple and common programs, relying on them is not exactly portable. There is a better way to do this:

 my @temps;
 opendir(DIRHANDLE, ".");
 while (my $file = readdir(DIRHANDLE)) {
     push @temps, $file if $file =~ /\.txt/;
 }
 closedir(DIRHANDLE);

Simple, short, clean Perl: no forking, no portability worries, and no need to read one big string and then split it; we store only the entries we actually need. It also becomes trivial to change the condition files must pass. Say we accidentally pick up the file test.txt.gz because our regular expression matches it; we can easily change the line to:

  push @temps, $file if $file =~ /\.txt$/; 
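To see what the $ anchor buys us, here is a small sketch (the file names are made up for illustration):

```perl
use strict;
use warnings;

my @entries = qw( a.txt b.txt test.txt.gz notes.md );

# Unanchored /\.txt/ would also match test.txt.gz;
# the $ anchor requires the name to *end* in .txt.
my @temps = grep { /\.txt$/ } @entries;
print "@temps\n";   # a.txt b.txt
```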

We could do the same with grep (I suppose), but why settle for grep's limited regular expressions when Perl has one of the most powerful regular expression engines built right in?

+8
Sep 30 '09 at 18:01

Use readline instead of the <> operator:

 my @blah = readline( $files[$i] );

Otherwise, Perl interprets <$files[$i]> as a fileglob (glob) operation instead of a read-from-filehandle operation, because the expression inside <> is not a simple scalar variable.
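A quick way to convince yourself of the difference, using an in-memory filehandle so the sketch is self-contained (the data is made up for illustration):

```perl
use strict;
use warnings;

my $data = "first line\nsecond line\n";
open my $fh, '<', \$data or die $!;   # in-memory filehandle
my @files = ($fh);

# <$files[0]> would be parsed as a glob() call, not a read, because
# the expression inside <> is not a simple scalar variable.
# readline() accepts any expression that yields a filehandle:
my $line = readline( $files[0] );
print $line;   # "first line"
```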

+1
Sep 30 '09 at 18:03

You already have good answers. Another way to attack the problem is to build a list of array references holding all the lines from the files (@content). Then use the each_arrayref function from List::MoreUtils, which creates an iterator that yields line 1 from all the files, then line 2, and so on.

 use strict;
 use warnings;
 use List::MoreUtils qw(each_arrayref);

 my @content = map {
     open(my $fh, '<', $_) or die $!;
     [<$fh>]
 } grep { -f } glob '*.txt';

 my $iterator = each_arrayref @content;

 while (my @nth_lines = $iterator->()) {
     # Do stuff with @nth_lines
 }
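For example, concatenating the nth lines might look like the sketch below. It assumes List::MoreUtils is installed, and uses in-memory filehandles with made-up contents so it runs without any .txt files on disk:

```perl
use strict;
use warnings;
use List::MoreUtils qw(each_arrayref);

# In-memory "files" standing in for the real *.txt inputs.
my @docs    = ( "a1\na2\n", "b1\nb2\n" );
my @content = map { open my $fh, '<', \$_ or die $!; [<$fh>] } @docs;

# The iterator yields the nth line of every file on its nth call.
my $iterator = each_arrayref(@content);
my @joined;
while ( my @nth_lines = $iterator->() ) {
    chomp @nth_lines;
    push @joined, join( '+', @nth_lines );
}
print "$_\n" for @joined;   # a1+b1, then a2+b2
```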
+1
Sep 30 '09 at 18:54


