Should Perl hashes always contain values?

I had an earlier question that received the following answer from a famous Perl expert, Perl author and Perl trainer brian d foy:

[If] you are looking for a fixed sequence of characters at the end of each file name, you want to know whether that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:
     my ($extension) = $filename =~ m/\.([^.]+)$/;
     if (exists $hash{$extension}) { ... }
You do not need to build up a regular expression, and you do not need to go through several possible regex alternations to check every extension you have to examine.
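For reference, the quoted advice can be fleshed out into a small runnable sketch. The %wanted hash, its list of extensions, and the has_wanted_extension helper are hypothetical names chosen for illustration; they are not part of the quoted answer.

```perl
use strict;
use warnings;

# Hypothetical list of extensions we care about; any value would do here.
my %wanted = map { $_ => 1 } qw(txt csv log);

# Return 1 if the file name ends in one of the wanted extensions, else 0.
sub has_wanted_extension {
    my ($filename) = @_;
    my ($extension) = $filename =~ m/\.([^.]+)$/;

    # File names without a dot leave $extension undefined.
    return defined $extension && exists $wanted{$extension} ? 1 : 0;
}

for my $file (qw(report.csv archive.tar.gz README)) {
    printf "%-15s %s\n", $file,
        has_wanted_extension($file) ? 'wanted' : 'skipped';
}
```

The defined check guards against file names that contain no dot at all, where the match fails and $extension stays undefined.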

Thanks for the advice brian.

What I want to know now is what works best for a case like the one above. Is it enough for the keys to exist to achieve what is described above, or should I always assign each key a defined value?

+7
perl hash
6 answers

It is usually preferable to set a defined value for every key. The idiomatic value (when you don't care about the value) is 1.

my %hash = map { $_ => 1 } @array; 

Doing it this way makes the code that uses the hash slightly simpler, because you can use $hash{key} as a Boolean. If the value can be undefined, you need to use the more verbose exists $hash{key} instead.
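The difference between the two tests can be seen in a minimal sketch. The hash names %with_one and %with_undef and the sample keys are made up for illustration.

```perl
use strict;
use warnings;

my @array = qw(apple pear);

my %with_one   = map { $_ => 1 } @array;       # every value is 1
my %with_undef = map { $_ => undef } @array;   # keys exist, values are undef

# Truth test: only reliable when the stored values are true.
my $truthy_one   = $with_one{apple}   ? 1 : 0;
my $truthy_undef = $with_undef{apple} ? 1 : 0;   # false although the key exists

# Existence test: works regardless of the stored value.
my $exists_undef = exists $with_undef{apple} ? 1 : 0;

print "truthy_one=$truthy_one truthy_undef=$truthy_undef exists_undef=$exists_undef\n";
```

With a value of 1 both tests agree; with undef only exists gives the right answer.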

However, there are situations where an undef value is desired. For example, imagine that you are parsing C header files to extract preprocessor symbols. It would be logical to store these in a hash of name => value pairs.

 #define FOO 1
 #define BAR

In Perl, this will map to:

 my %symbols = ( FOO => 1, BAR => undef); 

In C, a #define defines a symbol, not a value; "defined" in C maps to "exists" in Perl.
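A short sketch of how the three possible states of a symbol could be told apart. The symbol_state helper and the extra name BAZ are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

my %symbols = ( FOO => 1, BAR => undef );

# Classify a symbol: absent, present without a value, or present with a value.
sub symbol_state {
    my ($name) = @_;
    return 'not defined'           if !exists $symbols{$name};
    return 'defined with no value' if !defined $symbols{$name};
    return "defined as $symbols{$name}";
}

print "$_: ", symbol_state($_), "\n" for qw(FOO BAR BAZ);
```

Here exists answers the question C's #ifdef asks, while defined distinguishes a symbol that carries a value from one that does not.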

+5

You cannot create a hash key without a value. The value may be undef, but it will be there. How else would you construct the hash? Or was your question whether the value can be undef? In that case, I would say that the value you store there (undef, 1, 0, ...) is entirely up to you. If many people are going to use the hash, though, you probably want to store some true value, in case someone else writes if ($hash{$extension}) {...} instead of using exists because they were not paying attention.

+4

undef is a value.

Of course, things like this always depend on what you are currently doing. But $foo{bar} is just a variable like $bar, and I see no reason why either of them should not be undef from time to time.

PS: That's why exists exists.

+3

As others have said, the idiomatic solution for a hash set (a hash that contains only keys, not values) is to use 1 as the value, because this makes testing for existence easy. However, there is something to be said for using undef as the value. It forces users to test for existence with exists, which is slightly faster. Of course, you can test for existence with exists even when the value is 1, and so avoid the inevitable bugs from users who forget to use exists.
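One common way to build such a set with undef values is a hash slice assignment. This is a minimal sketch; the %is_wanted hash and its keys are hypothetical.

```perl
use strict;
use warnings;

my @extensions = qw(txt csv log);

# Hash slice: creates one key per element, every value is undef.
my %is_wanted;
@is_wanted{@extensions} = ();

# Membership must be tested with exists, because every value is false.
for my $ext (qw(csv exe)) {
    print "$ext: ", ( exists $is_wanted{$ext} ? 'in set' : 'not in set' ), "\n";
}
```

Because no list of values has to be built, the slice form is also a little shorter than the map version.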

+3

Storing "1" in a set-hash considered harmful

I know that using "considered harmful" is itself considered harmful, but this is bad; almost as bad as unrestrained use of goto.

OK, I have harped on this in a few comments, but I think I need a full answer to demonstrate the problem.

Suppose we have a daemon process that provides inventory management for a store that sells widgets.

    # REORDER, ILLEGAL_REQUEST, Error_Response() and Reorder_Item()
    # are assumed to be defined elsewhere in the daemon.
    my @items          = qw( widget thingy whozit whatsit );
    my @items_in_stock = qw( widget thingy );

    my %in_stock;
    @in_stock{@items_in_stock} = (1) x @items_in_stock;  # initialize all keys to 1

    sub Process_Request {
        my $request = shift;

        if ( $request eq REORDER ) {
            Reorder_Items(\@items, \%in_stock);
        }
        else {
            Error_Response( ILLEGAL_REQUEST );
        }
    }

    sub Reorder_Items {
        my $items    = shift;
        my $in_stock = shift;

        # Order items we do not have in stock.
        for my $item ( @$items ) {
            Reorder_Item( $item ) if not exists $in_stock->{$item};
        }
    }

The tool is great: it automatically keeps items in stock. Very nice. Now the boss asks for automatically generated catalogs of the items in stock. So we modify Process_Request() and add catalog generation.

    sub Process_Request {
        my $request = shift;

        if ( $request eq REORDER ) {
            Reorder_Items(\@items, \%in_stock);
        }
        elsif ( $request eq CATALOG ) {
            Build_Catalog(\@items, \%in_stock);
        }
        else {
            Error_Response( ILLEGAL_REQUEST );
        }
    }

    sub Build_Catalog {
        my $items    = shift;
        my $in_stock = shift;

        my $catalog_response = '';
        foreach my $item ( @$items ) {
            # Truth test on the value, not an existence test on the key.
            $catalog_response .= Catalog_Item($item) if $in_stock->{$item};
        }

        return $catalog_response;
    }

In testing, Build_Catalog() works fine. Hooray, we go live with the app.

Oops. For some reason nothing is being ordered, and the company runs out of stock of everything.

The Build_Catalog() routine adds keys to %in_stock, so Reorder_Items() now sees everything as in stock and never places an order.
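How easily a lookup ends up adding keys depends on the shape of the access. The following is a minimal sketch, assuming a nested stock table (the %stock hash and its qty field are hypothetical and not part of the example above): a plain truth test on a top-level key leaves the hash unchanged, while reaching through a missing key into a nested level autovivifies that key, which is the kind of silent key creation that breaks later exists checks.

```perl
use strict;
use warnings;

# Hypothetical nested stock table: item name => { qty => count }.
my %stock = ( widget => { qty => 3 } );

# Plain truth test on a top-level key: no key is added.
my $thingy_truthy            = $stock{thingy} ? 1 : 0;
my $thingy_exists_after_test = exists $stock{thingy} ? 1 : 0;

# Nested lookup through a missing key: $stock{whozit} is created
# as an empty hash reference as a side effect of the read.
my $qty                        = $stock{whozit}{qty};
my $whozit_exists_after_lookup = exists $stock{whozit} ? 1 : 0;

print "thingy exists: $thingy_exists_after_test\n";
print "whozit exists: $whozit_exists_after_lookup\n";
```

After the nested read, an exists test reports whozit as present even though nothing was ever stored for it.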

Using lock_hash from Hash::Util can help prevent accidental hash modifications. If we had locked %in_stock before calling Build_Catalog(), we would have gotten a fatal error and would never have gone live with the bug.

In summary, it is best to test for the existence of keys rather than the truth of your set-hash values. If you are using existence as a signifier, do not set your values to "1", because that will mask bugs and make them harder to track down. Using lock_hash can help catch these problems.

If you must check the truth of the values, do so in every case.
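A minimal sketch of that behavior, using the core Hash::Util module. The stock data is a reduced stand-in for the example above, and the eval wrapper is added only so the script can report the error instead of stopping.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

my %in_stock = map { $_ => 1 } qw(widget thingy);
lock_hash(%in_stock);

# Reading a key that is present still works on a locked hash.
my $widget_ok = $in_stock{widget} ? 1 : 0;

# Reading a key that is absent dies instead of quietly returning undef.
my $lookup_survived = eval { my $value = $in_stock{whozit}; 1 } ? 1 : 0;
my $error = $@;

# exists remains a safe way to probe for a key.
my $whozit_exists = exists $in_stock{whozit} ? 1 : 0;

print "widget ok: $widget_ok\n";
print "lookup survived: $lookup_survived\n";
print "error: $error" if !$lookup_survived;
print "whozit exists: $whozit_exists\n";
```

In a real daemon the eval would be left out, so a stray truth test on a missing key fails loudly during testing.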

+1

Using undef as the value in a hash is more memory-efficient than storing 1.

+1