How to read lines from the end of a file in Perl?

I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like this:

    One Two
    1.00 44.000
    3.00 55.000

These CSV files are very large, anywhere from 10 MB to 2 GB.

I am currently working with a 700 MB CSV file. I tried to open it in Notepad, but it seems that no program will open it.

I want to read the last 1000 lines of the CSV file and look at the values. How can I do this? I cannot open the file in Notepad or in any other program.

If I write a Perl script, I would have to process the complete file to reach the end and then read the last 1000 lines.

Is there a better way? I am new to Perl, and any suggestions would be appreciated.

I searched the web and found some scripts such as File::Tail, but I don't know whether they will work on Windows.

+6
perl large-files
11 answers

On *nix, you can use the tail command:

 tail -1000 yourfile | perl ... 

This passes only the last 1000 lines to the Perl program.

On Windows, the GnuWin32 and UnxUtils packages both include a tail utility.
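If you would rather call tail from inside a Perl script than pipe into perl on the command line, a piped open can read tail's output directly. This is a minimal sketch that assumes an external tail accepting the -n option is on the PATH; the subroutine name last_lines_via_tail is hypothetical, and the list form of a piped open may not be available on every platform or older Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run an external `tail` and return its last $n lines.
# Assumes a `tail` that accepts `-n N` is on the PATH.
sub last_lines_via_tail {
    my ($file, $n) = @_;
    # List-form pipe open: tail's stdout becomes our input, no shell involved.
    open my $tail, '-|', 'tail', '-n', $n, $file
        or die "Can't run tail: $!\n";
    my @lines = <$tail>;
    # close() on a pipe waits for tail; a non-zero exit status makes it fail.
    close $tail or die "tail failed (status $?)\n";
    return @lines;
}

# Command-line use: perl script.pl FILE [N]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print last_lines_via_tail($file, $n // 1000);
}
```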

+11

The File::ReadBackwards module lets you read a file in reverse order. This makes it easy to get the last N lines, as long as you don't depend on their order. If you do, and the required data is small enough (which it should be in your case), you can read the last 1000 lines into an array and then reverse it.
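A minimal sketch of that approach, assuming File::ReadBackwards has been installed from CPAN; last_lines_backwards is a hypothetical name. Lines are collected last-first and then reversed so they come back in file order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::ReadBackwards;   # from CPAN; not part of core Perl

# Hypothetical helper: return the last $n lines of $file in their original order.
sub last_lines_backwards {
    my ($file, $n) = @_;
    my $bw = File::ReadBackwards->new($file)
        or die "Can't read $file: $!\n";
    my @lines;
    # readline() returns lines starting from the end of the file,
    # and undef once the beginning of the file has been reached.
    while (@lines < $n and defined(my $line = $bw->readline)) {
        push @lines, $line;
    }
    $bw->close;
    return reverse @lines;   # collected last-first, so restore file order
}

# Command-line use: perl script.pl FILE [N]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print last_lines_backwards($file, $n // 1000);
}
```

Only the tail end of the file is read, so this stays fast even on a multi-gigabyte file.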

+25

This is only tangential to your main question, but if you want to check whether a module such as File::Tail works on your platform, check the results from CPAN Testers. The links at the top of the module's page on CPAN Search lead you there:

[Screenshot: File::Tail page header on CPAN Search]

Looking at the matrix, you can see that this module does indeed have problems on Windows with all versions of Perl:

[Screenshot: File::Tail CPAN Testers matrix]

+8

I wrote a quick backward file reader using the following pure Perl code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my ($file, $num_of_lines) = @ARGV;
    my $count    = 0;
    my $filesize = -s $file;  # used to detect reaching the start of the file while reading backward
    my $offset   = -2;        # skip the final "\n" (assumes the file ends with a newline)

    open my $fh, '<', $file or die "Can't read $file: $!\n";

    while (abs($offset) <= $filesize) {
        my $line = "";
        # seek() in mode 2 will not stop us at the start of the file,
        # so check the range ourselves
        while (abs($offset) <= $filesize) {
            seek $fh, $offset, 2;   # negative $offset with mode 2 = position relative to end of file
            $offset -= 1;           # move the counter back one byte
            my $char = getc $fh;
            last if $char eq "\n";  # reached the start of the current line
            $line = $char . $line;  # otherwise prepend the character to the current line
        }
        # got the next line (lines are printed last-to-first)
        print $line, "\n";
        # exit the loop once the requested number of lines has been printed
        $count++;
        last if $count >= $num_of_lines;
    }
    close $fh;

Run the script like this:

 $ get-x-lines-from-end.pl ./myhugefile.log 200 
+5

Without tail, a Perl-only solution is not unreasonable.

One way is to seek to a point near the end of the file and read lines from there. If you don't get enough lines, seek even further back from the end and try again.

    sub last_x_lines {
        my ($filename, $lineswanted) = @_;
        my ($line, $filesize, $seekpos, $numread, @lines);

        open my $fh, '<', $filename or die "Can't read $filename: $!\n";

        $filesize = -s $filename;
        $seekpos  = 50 * $lineswanted;            # initial guess: about 50 bytes per line
        $seekpos  = $filesize if $seekpos > $filesize;
        $numread  = 0;

        while ($numread < $lineswanted) {
            @lines   = ();
            $numread = 0;
            seek($fh, $filesize - $seekpos, 0);
            <$fh> if $seekpos < $filesize;        # discard the probably fragmentary first line
            while (defined($line = <$fh>)) {
                push @lines, $line;
                shift @lines if ++$numread > $lineswanted;
            }
            if ($numread < $lineswanted) {
                # We didn't get enough lines. Double the amount of space to read next time.
                if ($seekpos >= $filesize) {
                    die "There aren't even $lineswanted lines in $filename - I got $numread\n";
                }
                $seekpos *= 2;
                $seekpos = $filesize if $seekpos >= $filesize;
            }
        }
        close $fh;
        return @lines;
    }

P.S. A better title would be something like "Reading lines from the end of a large file in Perl".

+4
 perl -n -e "shift @d if (@d >= 1000); push(@d, $_); END { print @d }" < bigfile.csv 

That said, the fact that Unix systems can simply do tail -n 1000 should convince you to just install Cygwin or coLinux.
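Written out as a script instead of a one-liner, the same sliding-window idea looks roughly like this. It still reads the whole file once, but never holds more than N lines in memory. A sketch; last_lines_buffered is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep a sliding window of the last $n lines
# while reading the file once from start to end.
sub last_lines_buffered {
    my ($file, $n) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!\n";
    my @buf;
    while (my $line = <$fh>) {
        push @buf, $line;
        shift @buf if @buf > $n;   # drop the oldest line once the window is full
    }
    close $fh;
    return @buf;
}

# Command-line use: perl script.pl FILE [N]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print last_lines_buffered($file, $n // 1000);
}
```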

+2

You could use the Tie::File module, I suppose. It presents the lines of the file as an array, so you can get the size of the array and process elements arraySize-1000 through arraySize-1.

Tie::File
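A rough sketch of the Tie::File approach, opening the file read-only. Tie::File does not pull every line into memory at once, but finding the array size still means scanning the whole file. The helper name last_lines_tied is hypothetical; records come back chomped, which is Tie::File's default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;   # ships with Perl

# Hypothetical helper: return the last $n lines via a read-only tied array.
sub last_lines_tied {
    my ($file, $n) = @_;
    tie my @records, 'Tie::File', $file, mode => O_RDONLY
        or die "Can't tie $file: $!\n";
    # Asking for the size makes Tie::File locate every record,
    # so the file is still scanned once from start to end.
    my $start = @records - $n;
    $start = 0 if $start < 0;
    my @last = @records[$start .. $#records];   # records are chomped by default
    untie @records;
    return @last;
}

# Command-line use: perl script.pl FILE [N]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print "$_\n" for last_lines_tied($file, $n // 1000);
}
```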

Another option would be to count the number of lines in the file, then loop through the file again and start reading the values at line numberOfLines-1000.

 $count = `wc -l < $file`; die "wc failed: $?" if $?; chomp($count); 

This gives you the number of lines (on most systems).
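The same two-pass idea can be done in pure Perl, which avoids depending on wc (handy on Windows). A sketch with the hypothetical helper last_lines_two_pass: the first pass only counts lines, and the second keeps the lines after line number total - N.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count lines in one pass, then re-read the file
# and return only the lines after line number ($total - $n).
sub last_lines_two_pass {
    my ($file, $n) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!\n";

    my $total = 0;
    $total++ while <$fh>;                 # pass 1: count lines

    seek($fh, 0, 0) or die "Can't rewind $file: $!\n";
    my $skip   = $total - $n;             # lines to skip before output starts
    my $lineno = 0;
    my @out;
    while (my $line = <$fh>) {            # pass 2: collect the last $n lines
        push @out, $line if ++$lineno > $skip;
    }
    close $fh;
    return @out;
}

# Command-line use: perl script.pl FILE [N]
if (@ARGV) {
    my ($file, $n) = @ARGV;
    print last_lines_two_pass($file, $n // 1000);
}
```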

+1

If you know the number of lines in a file, you can do

 perl -ne "print if ($. > N);" filename.csv 

where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with

 perl -e "while (<>) {} print $.;" filename.csv 
0

You should definitely use File::Tail, or better yet another module. It is not a script; it is a module (a programming library). It probably works on Windows. As someone said, you can check this on CPAN Testers, or often just by reading the module's documentation or simply trying it.

You have accepted the tail utility as your preferred answer, but it will most likely be more of a headache on Windows than File::Tail.

0

Modules are the way to go. However, sometimes you write a piece of code that you want to run on various machines, where the more obscure CPAN modules may not be installed. In that case, why not just run tail from Perl and redirect its output to a temporary file?

    #!/usr/bin/perl
    system("tail --lines=1000 /path/myfile.txt > tempfile.txt") == 0
        or die "tail failed: $?";

That way, you have something that does not depend on a CPAN module.

0

Without relying on tail (which is what I would probably do), if you have more than $FILESIZE [2GB?] of memory, I would just be lazy and do:

    my @lines      = <>;
    my @lastKlines = @lines > 1000 ? @lines[-1000 .. -1] : @lines;

That said, the other answers involving tail or seek() handle this much better.

-1
