Perl hexadecimal code dump parsing

I have a hexadecimal dump of a message in a file, and I want to read it into an array so I can run my decoding logic on it.
I was wondering whether there is an easier way to parse a message like this one:

37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69

Note that a line holds at most 16 bytes, but any line may contain fewer (minimum: 1).
Is there a nice, elegant way to do this in Perl, other than reading two characters at a time?

3 answers

Perl has a built-in hex function that does the conversion for you.

hex EXPR

hex

Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.

 print hex '0xAf'; # prints '175'
 print hex 'aF';   # same
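As a small illustration of the documentation quoted above (a sketch, not part of the original answer), hex accepts the digits with or without a leading 0x, while oct chooses the base from the prefix. The token list is made up for the example.

```perl
use strict;
use warnings;

# Illustrative tokens only; hex() treats all of them as base 16.
my @tokens = ('3B', '0x3B', '00', 'ff');
print "$_ => ", hex($_), "\n" for @tokens;

# oct() picks the base from the prefix: 0x (hex), 0b (binary), 0 (octal).
print oct('0x3B'),  "\n";   # 59
print oct('0b101'), "\n";   # 5
print oct('017'),   "\n";   # 15
```

The dump in the question has no prefixes, so hex is the natural fit there.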

Remember that split, by default, splits $_ on whitespace, for example:

  $ perl -le '$_ = "a b c"; print for split'
 a
 b
 c 

For each line of input, split it into hex values, convert each value to a number, and push the results onto an array for later processing.

 #! /usr/bin/perl

 use warnings;
 use strict;

 my @values;
 while (<>) {
     # split on whitespace, convert each hex token to a number
     push @values => map hex($_), split;
 }

 # for example
 my $sum = 0;
 $sum += $_ for @values;
 print $sum, "\n";

Run Example:

  $ ./sumhex mtanish-input 
 4196 
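The same split-and-hex idea can be wrapped in a reusable routine. This is a minimal sketch: the name parse_hex_dump is hypothetical, and the in-memory filehandle with the first two lines of the sample dump stands in for the real input file.

```perl
use strict;
use warnings;

# parse_hex_dump is a hypothetical helper name, not from the answer above.
# It reads a filehandle line by line and returns the byte values as a list.
sub parse_hex_dump {
    my ($fh) = @_;
    my @values;
    while (my $line = <$fh>) {
        # split ' ' splits on runs of whitespace, so short lines
        # (fewer than 16 bytes) need no special handling.
        push @values, map { hex($_) } split ' ', $line;
    }
    return @values;
}

# First two lines of the sample dump, read through an in-memory filehandle.
my $dump = <<'END';
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
END

open my $fh, '<', \$dump or die "cannot open in-memory handle: $!";
my @values = parse_hex_dump($fh);
close $fh;

print scalar(@values), " bytes\n";               # 32 bytes
print "first: $values[0], last: $values[-1]\n";  # first: 55, last: 21
```

Returning a flat list keeps the decoding logic independent of how the bytes happened to be laid out across lines.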

I would read one line at a time, strip the whitespace, and use pack 'H*' to convert it. It is difficult to be more specific without knowing what kind of "decoding logic" you are trying to apply. For example, here is a version that converts each byte to decimal:

 while (<>) {
     s/\s+//g;                                  # remove all whitespace
     my @bytes = unpack('C*', pack('H*', $_));  # hex digits -> raw bytes -> numbers
     print "@bytes\n";
 }

Output from sample file:

 55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
 59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
 108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
 0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
 116 101 108 99 111 114 100 105
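If the whole message should end up in a single array rather than being printed line by line, the pack 'H*' approach can be applied to the entire dump at once. The following is a sketch under that assumption; dump_to_bytes is a hypothetical helper name, and the two validity checks are additions for the example, not part of the answer.

```perl
use strict;
use warnings;

# dump_to_bytes is a hypothetical helper: it takes the dump text and
# returns every byte as a number, regardless of line breaks.
sub dump_to_bytes {
    my ($text) = @_;
    (my $hex = $text) =~ s/\s+//g;   # drop spaces and newlines
    die "odd number of hex digits\n"  if length($hex) % 2;
    die "non-hex character in dump\n" if $hex =~ /[^0-9A-Fa-f]/;
    my $raw = pack 'H*', $hex;       # hex digits -> raw byte string
    return unpack 'C*', $raw;        # raw bytes  -> unsigned integers
}

my @bytes = dump_to_bytes("3B 32 31 36\n00 00 01 08\n");
print "@bytes\n";   # 59 50 49 54 0 0 1 8
```

The checks matter because pack 'H*' does not reject malformed input on its own, so a stray character would otherwise produce wrong bytes silently.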

I think reading two characters at a time is a perfectly reasonable way to parse a stream whose logical tokens are two-character units.

Is there any reason why you find this ugly?
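For comparison, reading the dump two characters at a time can be done in a single expression with a global match. This is only a sketch; hex_pairs is a hypothetical name, and the input is one line from the sample dump.

```perl
use strict;
use warnings;

# hex_pairs is a hypothetical helper: it pulls out each two-digit hex
# token with a global regex match and converts it to a number.
sub hex_pairs {
    my ($line) = @_;
    return map { hex($_) } $line =~ /([0-9A-Fa-f]{2})/g;
}

my @bytes = hex_pairs("6C 71 34 34 73 69 6D 31");
print "@bytes\n";   # 108 113 52 52 115 105 109 49
```

In list context the /g match returns every captured pair, so no explicit loop or position tracking is needed.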

If you are trying to extract a specific sequence, you can do this using asymmetric regular expressions.
