What is better in Perl: an array of hash references or a list of flat hashes?

Question

I can’t decide which approach is more (1) idiomatic Perl, (2) efficient, or (3) “clear”.

Let me explain with code. First, I can do

    sub something {
        ...
        $ref->{size}   = 10;
        $ref->{name}   = "Foo";
        $ref->{volume} = 100;
        push(@references, $ref);
        ...
        return @references;
    }

or, I can do

    sub something {
        ...
        push(@names, "Foo");
        $sizes{Foo}   = 10;
        $volumes{Foo} = 100;
        ...
        return (\@names, \%sizes, \%volumes);
    }

Both do essentially the same thing. The important thing is that I need an array, because I need to keep order.

I know there are always several ways to do something, but still, which one would you prefer?

+4

data-structures perl

Karel bílek Jul 24 '09 at 19:33


4 answers

I really prefer the first. It keeps each “packet” of data (size, name, volume) together and makes for much more readable code.

+5

Thomas Jul 24 '09 at 19:36


Store related data together. The only reason to create large parallel arrays is because you are forced to.

If you are concerned about speed and memory usage, you can use constant array indexes to access your named fields:

    use constant {
        SIZE   => 0,
        NAME   => 1,
        VOLUME => 2,
    };

    sub something {
        ...
        $ref->[SIZE]   = 10;
        $ref->[NAME]   = "Foo";
        $ref->[VOLUME] = 100;
        push @references, $ref;
        ...
        return @references;
    }

I also added some spaces to make the code more readable.
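To make the constant-index idea concrete, here is a minimal runnable sketch. The make_records helper and its sample data are made up for illustration; only the SIZE, NAME and VOLUME constants come from the snippet above.

```perl
use strict;
use warnings;

# Field positions inside each record (an array reference).
use constant {
    SIZE   => 0,
    NAME   => 1,
    VOLUME => 2,
};

# make_records is a hypothetical helper; the sample values are invented.
sub make_records {
    my @records;
    for my $spec ( [ 10, 'Foo', 100 ], [ 20, 'Bar', 200 ] ) {
        my $ref = [];
        $ref->[SIZE]   = $spec->[0];
        $ref->[NAME]   = $spec->[1];
        $ref->[VOLUME] = $spec->[2];
        push @records, $ref;
    }
    return @records;
}

my @records = make_records();

# Records come back in insertion order; fields are read by constant name.
for my $rec (@records) {
    printf "%s: size=%d volume=%d\n", $rec->[NAME], $rec->[SIZE], $rec->[VOLUME];
}
```

Arrays indexed this way skip the hash lookup and per-key storage, at the cost of losing self-describing keys when you dump the structure.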

If I have many parameters with validation rules and/or deep data structures, I tend to reach for objects, which simplify my code by keeping the logic next to the data it operates on. Of course, OOP carries a speed penalty, but I have rarely seen that become a problem.

For quick and dirty OOP, I use Class::Struct, even though it has many drawbacks. In situations where I need type checking, I use Moose, or Mouse when memory or startup speed is very important.
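For the quick-and-dirty case, a rough sketch with Class::Struct (a core module) might look like this. The class name Item and the sample values are assumptions for illustration, not something from the question.

```perl
use strict;
use warnings;
use Class::Struct;

# Declare a small record class. 'Item' is a hypothetical name;
# '$' marks each field as a plain scalar.
struct( Item => { name => '$', size => '$', volume => '$' } );

my @items = (
    Item->new( name => 'Foo', size => 10, volume => 100 ),
    Item->new( name => 'Bar', size => 20, volume => 200 ),
);

# Generated accessors read the fields.
for my $item (@items) {
    printf "%s: size=%d volume=%d\n", $item->name, $item->size, $item->volume;
}

# Called with an argument, the same accessor sets the field.
$items[0]->size(15);
```

The array still carries the ordering; each element is just an object instead of a bare hash reference.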

+2

daotoad Jul 24 '09 at 10:52


Both methods can be useful for different problems. If you are always going to use all the information together, just keep it together. For example, in your case, you want to track the name, title and size of a web page. You are probably working with all three of these things at the same time, so keep them together as an array of hash references.

In other cases, you can break the data into different things that you use separately and want to search independently of other columns. In these cases, individual hashes may make sense.
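As a sketch of that second case, assuming the same sample data as the question, separate hashes let you look up one attribute without touching the others, while the array keeps the order:

```perl
use strict;
use warnings;

# Order lives in the array; each attribute lives in its own hash keyed by name.
# The values are sample data for illustration.
my @names   = ( 'Foo', 'Bar' );
my %sizes   = ( Foo => 10,  Bar => 20 );
my %volumes = ( Foo => 100, Bar => 200 );

# Look up a single attribute directly by name.
my $bar_size = $sizes{Bar};

# Walking the records in order needs all three structures at once.
my @lines;
for my $name (@names) {
    push @lines, sprintf( '%s: size=%d volume=%d',
        $name, $sizes{$name}, $volumes{$name} );
}
print "$_\n" for @lines;
```

The trade-off is that nothing keeps the three structures in sync: adding or removing a record means updating each one by hand.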

0

brian d foy Jul 25 '09 at 19:33


Sinan Ünür · Accepted Answer · 2009-07-24T19:38:53+0000

Instead of thinking in meaningless terms such as "something", think about and state the problem in concrete terms. In this case, you seem to be returning a list of objects with the attributes name, size and volume. When you think of it this way, there is no reason to even consider the second method.

You may consider optimizing later if you encounter problems, but even if you do, you will probably get more from Memoize than by exploding data structures.
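For reference, a minimal Memoize sketch. Memoize is a core module; compute_volume is a hypothetical stand-in for whatever expensive, side-effect-free function turns out to be the bottleneck, and the call counter exists only to show the cache working.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# compute_volume is a made-up example of an expensive pure function.
sub compute_volume {
    my ($size) = @_;
    $calls++;              # counts how often the real body runs
    return $size * 10;
}

# Replace compute_volume with a caching wrapper of the same name.
memoize('compute_volume');

my $first  = compute_volume(10);
my $second = compute_volume(10);    # same argument, served from the cache

print "result=$second, real calls=$calls\n";
```

Memoizing only helps when the function returns the same result for the same arguments, so it is worth profiling before reaching for it.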

One performance improvement I do recommend is returning a reference from this subroutine:

    sub get_objects {
        my @ret;
        while ( 'some condition' ) {    # placeholder for a real loop condition
            # should I return this one?
            push @ret, {
                name   => 'Foo',
                size   => 10,
                volume => 100,
            };
        }
        return \@ret;
    }
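On the calling side, the returned reference is dereferenced to iterate. Below is a runnable variant as a sketch; the fixed input list replaces the placeholder loop condition and is made up for illustration.

```perl
use strict;
use warnings;

# A runnable variant of get_objects; the input list is invented sample data.
sub get_objects {
    my @ret;
    for my $spec ( [ 'Foo', 10, 100 ], [ 'Bar', 20, 200 ] ) {
        my ( $name, $size, $volume ) = @$spec;
        push @ret, { name => $name, size => $size, volume => $volume };
    }
    return \@ret;    # hand back one reference instead of copying the list
}

my $objects = get_objects();

# Dereference to iterate; insertion order is preserved.
for my $obj (@$objects) {
    printf "%s: size=%d volume=%d\n", $obj->{name}, $obj->{size}, $obj->{volume};
}

my $count = scalar @$objects;
```

Returning \@ret avoids copying every element onto the return stack, which only starts to matter for large lists.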
