Pimp my Perl code

I am an experienced developer, but not in Perl. I usually learn Perl to crack a script, then I will forget it again until next time. Therefore, I seek advice from professionals.

This time I am building a series of data analysis scripts. In general, the simplified structure of the program is as follows:

01 my $config_var = 999; 03 my $result_var = 0; 05 foreach my $file (@files) { 06 open(my $fh, $file); 07 while (<$fh>) { 08 &analyzeLine($_); 09 } 10 } 12 print "$result_var\n"; 14 sub analyzeLine ($) { 15 my $line = shift(@_); 16 $result_var = $result_var + calculatedStuff; 17 } 

In real life, there are about a dozen different config_var and result_var s.

These scripts differ mainly in the values โ€‹โ€‹assigned to config_var s. The main loop will be the same in each case, and analyzeLine() will be basically the same, but may have slight variations.

I can accomplish my task by making N copies of this code with slight modifications here and there; but this grossly violates all the rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by

 do theCommonStuff; 

Note that config_var (and its siblings) must be available for shared code, as well as result_var and its images, on which analyzeLine() performs some calculations.

Should I pack my "generic" code in a module? Create a class? Use global variables?

While not exactly golf for golf, I am looking for a simple, compact solution that will allow me to DRY and write code just for differences. I think that I would prefer not to output the code from a huge table containing all the configs, and, of course, not adapt it to use the database.

We look forward to your suggestions and thanks!


Update

Since people were asking, here is the real analyzeLine :

 # Update stats with time and call data in one line. sub processLine ($) { my $line = shift(@_); return unless $line =~ m/$log_match/; # print "$1 $2\n"; my ($minute, $function) = ($1, $2); $startMinute = $minute if not $startMinute; $endMinute = $minute; if ($minute eq $currentMinute) { $minuteCount = $minuteCount + 1; } else { if ($minuteCount > $topMinuteCount) { $topMinute = $currentMinute; $topMinuteCount = $minuteCount; printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount); } $totalMinutes = $totalMinutes + 1; $totalCount = $totalCount + $minuteCount; $currentMinute = $minute; $minuteCount = 1; } } 

Since these variables are largely interdependent, I think that a functional solution with separate calculations will not be practical. I apologize for the misleading people.

+4
source share
5 answers

Go ahead and create a class hierarchy. Your task is the ideal platform for OOP programming. Here is an example:

 package Common; sub new{ my $class=shift; my $this=bless{},$class; $this->init(); return $this; } sub init{} sub theCommonStuff(){ my $this=shift; for(1..10){ $this->analyzeLine($_); } } sub analyzeLine(){ my($this,$line) =@ _; $this->{'result'}.=$line; } package Special1; our @ISA=qw/Common/; sub init{ my $this=shift; $this->{'sep'}=','; # special param: separator } sub analyzeLine(){ # modified logic my($this,$line) =@ _; $this->{'result'}.=$line.$this->{'sep'}; } package main; my $c = new Common; my $s = new Special1; $c->theCommonStuff; $s->theCommonStuff; print $c->{'result'}."\n"; print $s->{'result'}."\n"; 
+6
source

Two comments: first, do not send line numbers, as they make copy, paste, and editing difficult. Secondly, do not use &func() to call sub. See perldoc perlsub :

A routine can be called using the explicit & prefix. & is optional in modern Perl, but not only does the & form make the argument list optional, but it also disables prototype validation against the arguments you provide.

In short, using & can be unexpected if you donโ€™t know what you are doing and why you are doing it.

Also, do not use prototypes in Perl. They do not match prototypes in other languages โ€‹โ€‹and, again, can have unexpected effects very much if you do not know what you are doing.

Remember to check the return value of system calls, such as open . Use autodie with modern perl s.

For your specific task, collect all the configuration variables in a hash. Pass this hash to analyzeLine .

 #!/usr/bin/perl use warnings; use strict; use autodie; my %config = ( frobnicate => 'yes', machinate => 'no', ); my $result; $result += analyze_file(\%config, $_) for @ARGV; print "Result = $result\n"; sub analyze_file { my ($config, $file) = @_; my $result; open my $fh, '<', $file; while ( my $line = <$fh> ) { $result += analyze_line($config, $line); } close $fh; return $result; } sub analyze_line { my ($line) = @_; return length $line; } 

Of course, you'll notice that $config is passed all over the place, which means you can include this in your OO solution:

 #!/usr/bin/perl package My::Analyzer; use strict; use warnings; use base 'Class::Accessor::Faster'; __PACKAGE__->follow_best_practice; __PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) ); sub analyze_file { my $self = shift; my ($file) = @_; my $result; open my $fh, '<', $file; while ( my $line = <$fh> ) { $result += $self->analyze_line($line); } close $fh; return $result; } sub analyze_line { my $self = shift; my ($line) = @_; return $self->get_analyzer->($line); } package main; use warnings; use strict; use autodie; my $x = My::Analyzer->new; $x->set_analyzer(sub { my $length; $length += length $_ for @_; return $length; }); $x->set_frobnicate('yes'); $x->set_machinate('no'); my $result; $result += $x->analyze_file($_) for @ARGV; print "Result = $result\n"; 
+10
source

If all the common code is in one function, the function that uses your configuration variables as parameters returns the result variables (either as return values โ€‹โ€‹or as input / output parameters). Otherwise, creating a class ("package") is also a good idea.

 sub common_func { my ($config, $result) = @_; # ... $result->{foo} += do_stuff($config->{bar}); # ... } 

Please note that both the config and the result are hashes (in fact, links to them). You can use any other data structure that you think will fit your purpose.

+2
source

Some thoughts:

  • If there are several $result_var s, I would recommend creating a separate routine to calculate each of them.
  • If a routine uses information outside this function, it should be passed as a parameter to that routine, and not rely on global status.
  • Alternatively wrap it all in a class, $result_var as an attribute of the class.

Practically speaking, there are several ways to implement this:

(1) Restore the function &analyzeLine calculatedStuff and add it to &result_var in a loop outside the function:

  $result_var = 0; foreach my $file (@files) { open(my $fh, $file); while (<$fh>) { $result_var += analyzeLine($_); } } } sub analyzeLine ($) { my $line = shift(@_); return calculatedStuff; } 

(2) Explicitly pass $result_var to analyzeLine and return the modified $result_var .

  $result_var = 0; foreach my $file (@files) { open(my $fh, $file); while (<$fh>) { $result_var = addLineToResult($result_var, $_); } } } sub addLineToResult ($$) { my $running_total = shift(@_); my $line = shift(@_); return $running_total + calculatedStuff; } 

The important part is that if you allocate functions for each of several $ result_vars, you can more readily write clean code. Do not worry about optimization. This may happen later when your code is slow. Improved design will simplify optimization when the time comes.

+2
source

why not create a function and use $ config_var and $ result_var as parameters?

0
source

All Articles