How can I parse only part of a file using Perl?

Question

I'm a complete newbie to Perl, but I heard that it is great for parsing files, so I thought I'd give it a spin.

I have a text file that contains the following sample information:

High school is used in some parts of the world, particularly in Scotland, North America and Oceania to describe an institution that provides all or part of secondary education. The term "high school" originated in Scotland with the world oldest being the Royal High School (Edinburgh) in 1505. The Royal High School was used as a model for the first public high school in the United States, the English High School founded in Boston, Massachusetts, in 1821. The precise stage of schooling provided by a high school differs from country to country, and may vary within the same jurisdiction. In all of New Zealand and Malaysia along with parts of Australia and Canada, high school is synonymous with secondary school, and encompasses the entire secondary stage of education.
======================================
Grade1 87.43%
Grade2 84.30%
Grade3 83.00%
=====================================

I want to parse the file and extract only the numerical information. I looked into regular expressions and I think I'll use something like

 if (m/^%/) { do something } else { skip the line }

But what I really want is to capture the name on the left as well and store the numeric value under that name. So after parsing the file, I would like to have the following variables with the % values stored in them. The reason is that I want to create a pie chart / bar chart of the different grades.

    Grade1 = 87.43
    Grade2 = 84.30

...

Could you suggest methods that I should look at?

c0d3rs Oct 18 '10 at 15:58

5 answers

Noufal ibrahim · Answer 1 · 2010-10-18T16:05:20+0000

You will need a regular expression. Something like the following should work

    while (<>) {
        # Store the pair only if this line matched; otherwise $1 and $2
        # would still hold the values from an earlier match.
        $op{$1} = $2 if /(Grade[0-9]+)\s*([0-9]+\.[0-9]+)/;
    }

as a filter. The %op hash will store the names and grades. This is preferable to creating variables dynamically.
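As a rough, self-contained sketch of the same idea: the script below reads the sample from an in-memory filehandle instead of the command line (that substitution, and the shortened prose line in the sample, are assumptions made so the example runs on its own) and then prints the collected pairs in name order.

```perl
use strict;
use warnings;

# Sample input based on the question; a real script would read a file instead.
my $text = <<'END';
Some introductory prose that should be ignored.
======================================
Grade1 87.43%
Grade2 84.30%
Grade3 83.00%
=====================================
END

my %op;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    # Store a pair only when the pattern matched this line.
    if ( $line =~ /(Grade[0-9]+)\s*([0-9]+\.[0-9]+)/ ) {
        $op{$1} = $2;
    }
}
close $fh;

# Print the name/grade pairs sorted by name.
printf "%s = %s\n", $_, $op{$_} for sort keys %op;
```

Because the pattern stops before the % sign, the stored values are plain numbers that can be passed straight to a charting module.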

Zaid · Answer 2 · 2010-10-18T18:42:32+0000

If you can guarantee that your points of interest sit between two lines of = characters (and the file does not contain an odd number of these delimiter lines), then the flip-flop operator is convenient:

    use strict;    # These two pragmas go a long, ...
    use warnings;  # ... long way in helping you code better

    my %scores;    # Create a hash of scores

    while (<>) {   # The diamond operator processes all files ...
                   # ... supplied at command-line, line-by-line

        next unless /^=+$/ ... /^=+$/;  # The flip-flop operator used ...
                                        # ... to filter out only 'grades'.
                                        # Three dots: with two, the closing
                                        # pattern would be tested on the
                                        # opening line and end the range there

        next if /^=+$/;                 # Skip the delimiter lines themselves

        my ( $name, $grade ) = split;   # This usage of split will break ...
                                        # ... the current line into an array

        $scores{$name} = $grade;        # Associate grade with name
    }
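In scalar context the flip-flop returns a sequence number while it is true, and the last number in a range has "E0" appended. The sketch below relies on that to skip both delimiter lines without testing the pattern again. The in-memory @lines array and the stripping of the trailing % are assumptions made for the illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the lines of the input file.
my @lines = (
    "Prose to ignore\n",
    "======\n",
    "Grade1 87.43%\n",
    "Grade2 84.30%\n",
    "=====\n",
    "Trailing prose\n",
);

my %scores;
for (@lines) {
    # Three-dot flip-flop: the closing pattern is first tested on the
    # line after the one that opened the range.
    my $seq = /^=+$/ ... /^=+$/;

    next unless $seq;                        # outside the delimited block
    next if $seq == 1 || $seq =~ /E0\z/;     # opening or closing delimiter

    my ( $name, $grade ) = split;            # split $_ on whitespace
    $grade =~ s/%\z//;                       # keep only the numeric part
    $scores{$name} = $grade;
}

print "$_ => $scores{$_}\n" for sort keys %scores;
```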

Cfreak · Answer 3 · 2010-10-18T16:07:23+0000

You want to use a hash. Something like this should do the trick:

    my %grades = ();    # this is a hash

    open( my $fh, '<', 'grade_file.txt' ) or die $!;
    while ( my $line = <$fh> ) {
        # Capture the name and the percentage when the line looks like a grade
        if ( my ( $name, $grade ) = $line =~ /^(Grade\d+)\s(\d+\.\d+\%)/ ) {
            $grades{$name} = $grade;
        }
    }
    close($fh);

Your %grades hash will contain name and grade pairs. (Access a value with my $value = $grades{'Grade1'};)

Also just a note. The language is called "Perl", not "PERL". Many people in the Perl community are upset about this :-)

Sinan Ünür · Answer 4 · 2010-10-18T20:25:47+0000

See Zaid's answer for an example of using the flip-flop operator (this is what I would recommend). However, if you are having difficulty with it (sometimes DWIMmery can interfere), you can also explicitly maintain state while reading the file line by line:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %grades;
    my $interesting;    # true while we are between the two delimiter lines

    # <DATA> reads from the script's __DATA__ section,
    # which must hold the sample input
    while ( my $line = <DATA> ) {
        if ( not $interesting and $line =~ /^=+\s*\z/ ) {
            $interesting = 1;
            next;
        }
        if ( $interesting ) {
            if ( $line =~ /^=+\s*$/ ) {
                $interesting = 0;
                next;
            }
            elsif ( my ($name, $grade) = $line =~ /^(\w+)\s+(\d+\.\d+%)/ ) {
                # Keep an array in case the same name occurs
                # multiple times
                push @{ $grades{$name} }, $grade;
            }
        }
    }

    use YAML;
    print Dump \%grades;

geoffspear · Answer 5 · 2010-10-18T16:07:19+0000

Creating dynamic variable names will probably not help you when creating a graph; using an array is almost certainly the best idea.
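One possible shape for the array approach, sketched under the assumption that the grade lines have already been isolated: each element holds a name and a numeric value, so the file order is preserved and the labels and values can be split apart for a chart. The names @input, @grades, @labels and @values are invented for this illustration.

```perl
use strict;
use warnings;

# Grade lines from the question's sample, assumed already filtered out
# of the surrounding prose.
my @input = ( "Grade1 87.43%", "Grade2 84.30%", "Grade3 83.00%" );

my @grades;    # each element is [ name, numeric value ], in file order
for my $line (@input) {
    if ( my ( $name, $value ) = $line =~ /^(\w+)\s+(\d+(?:\.\d+)?)%/ ) {
        push @grades, [ $name, $value + 0 ];    # "+ 0" forces a number
    }
}

# Parallel lists of labels and values, the form most chart APIs expect.
my @labels = map { $_->[0] } @grades;
my @values = map { $_->[1] } @grades;

print "$labels[$_]: $values[$_]\n" for 0 .. $#grades;
```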

However, if you really think you want to do this:

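    while ( my $line = <$your_infile_handler> ) {
        if ( $line =~ m/(.*) = ([0-9.]*)/ ) {
            no strict 'refs';    # symbolic references are refused under "use strict"
            ${$1} = $2;          # sets the package variable named by the captured text
        }
    }

should accomplish this.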
