How can I parse a C header file using Perl?

I have a header file that has a large structure in it. I need to read this structure using some program, and do some operations on each member of the structure and write them back.

For example, I have a structure like

const BYTE Some_Idx[] = { 4,7,10,15,17,19,24,29, 31,32,35,45,49,51,52,54, 55,58,60,64,65,66,67,69, 70,72,76,77,81,82,83,85, 88,93,94,95,97,99,102,103, 105,106,113,115,122,124,125,126, 129,131,137,139,140,149,151,152, 153,155,158,159,160,163,165,169, 174,175,181,182,183,189,190,193, 197,201,204,206,208,210,211,212, 213,214,215,217,218,219,220,223, 225,228,230,234,236,237,240,241, 242,247,249}; 

Now I need to read this and apply some operation to each member variable and create a new structure with a different order, for example:

 const BYTE Some_Idx_Mod_mul_2[] = { 8,14,20, ... ... 484,494,498}; 

Is there any Perl library for this? If not Perl, then something else like Python is also good.

Can anyone help !!!

+6
c python header-files perl parsing
9 answers

Keeping your data in a header file makes it harder to get at from other programs, such as Perl. Another approach you might consider is to keep the data in a database or another file and regenerate the header file as needed, possibly as part of your build system. The reason is that generating C is much easier than parsing C: it is trivial to write a script that parses a text file and creates a header for you, and such a script can even be invoked from your build system.
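As a rough illustration of the "generate, don't parse" idea, here is a minimal Python sketch. It assumes the values live in a plain text file of whitespace- or comma-separated integers; the function names `parse_values` and `render_header` and the `per_line` parameter are made up for this example.

```python
def parse_values(text):
    """Parse whitespace- or comma-separated integers from a plain data file."""
    return [int(tok) for tok in text.replace(",", " ").split()]


def render_header(name, values, per_line=8):
    """Return C source for a `const BYTE name[]` array holding `values`."""
    rows = [
        ",".join(str(v) for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    body = ",\n    ".join(rows)
    return f"const BYTE {name}[] = {{\n    {body}\n}};\n"


if __name__ == "__main__":
    # Inline sample standing in for the contents of a (hypothetical) data file.
    sample = "4 7 10 15\n17 19 24 29\n31"
    print(render_header("Some_Idx", parse_values(sample), per_line=4), end="")
```

A build step could read the data file, apply whatever transformation is needed to the list of integers, and write the result of `render_header` to the header.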

Assuming you want to keep your data in a C header file, you will need one of two things to solve this problem:

  • A quick one-off script to parse exactly (or close to exactly) the input you describe.
  • A generic, well-written script that can parse arbitrary C and work in general with many different headers.

The first case seems more common than the second to me, but it is hard to tell from your question whether this is better solved with a script that has to parse arbitrary C or with one that only has to parse this specific file. For code that handles your specific case, the following works for me on your input:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'header.h' or die "Cannot open header.h: $!";
    my @file = <$fh>;
    close $fh or die $!;

    my $in_block = 0;
    my $regex    = 'Some_Idx\[\]';
    my @byte_entries;

    foreach my $line (@file) {
        chomp $line;

        if ( $line =~ /$regex.*\{(.*)/ ) {
            # Declaration line: collect any numbers after the opening brace.
            $in_block = 1;
            push @byte_entries, @{ match_digits($1) };
        }
        elsif ($in_block) {
            push @byte_entries, @{ match_digits($line) };
        }

        # A closing brace ends the block, even on the declaration line.
        $in_block = 0 if $line =~ /\}/;
    }

    print "const BYTE Some_Idx_Mod_mul_2[] = {\n";
    print join ",", map { $_ * 2 } @byte_entries;
    print "};\n";

    # Return a reference to the list of integers found in the text.
    sub match_digits {
        my $text = shift;
        my @digits;
        while ( $text =~ /(\d+),*/g ) {
            push @digits, $1;
        }
        return \@digits;
    }

Parsing arbitrary C is a bit more complicated and not worth it for many applications, but maybe you really do need it. One trick is to let GCC do the parsing for you and read in GCC's parse tree using the CPAN module GCC::TranslationUnit. Here's the GCC command to compile the code, assuming you have a single file called test.c:

gcc -fdump-translation-unit -c test.c

Here's the Perl code to read in the parse tree:

    use GCC::TranslationUnit;

    # echo '#include <stdio.h>' > stdio.c
    # gcc -fdump-translation-unit -c stdio.c
    $node = GCC::TranslationUnit::Parser->parsefile('stdio.c.tu')->root;

    # list every function/variable name
    while ($node) {
        if (   $node->isa('GCC::Node::function_decl')
            or $node->isa('GCC::Node::var_decl') )
        {
            printf "%s declared in %s\n",
                $node->name->identifier, $node->source;
        }
    }
    continue {
        $node = $node->chain;
    }
+9

Sorry if this is a stupid question, but why bother parsing the file at all? Why not write a C program that #includes the header, processes it as needed, and then prints out the source for the modified header? I am sure this would be simpler than the Perl/Python solutions, and it would be much more reliable, because the header would be parsed by the C compiler.

+6

You don't really give much information about what should be changed, but to address your specific example:

 $ perl -pi.bak -we'if ( /const BYTE Some_Idx/ .. /;/ ) { s/(\d+)/$1 * 2/ge; s/Some_Idx/Some_Idx_Mod_mul_2/g; }' header.h 

Breaking this down: -p says to loop through the input files, putting each line in $_, running the supplied code, then printing $_. -i.bak enables in-place editing, renaming each original file with a .bak extension and printing to a new file with the original name. -w enables warnings. -e '....' supplies the code to run for each input line. header.h is the only input file.

In the Perl code, if ( /const BYTE Some_Idx/ .. /;/ ) checks that we are in a range of lines starting with a line matching /const BYTE Some_Idx/ and ending with a line matching /;/. s/.../.../g performs the substitution as many times as possible. /(\d+)/ matches a series of digits. The /e flag says that the replacement ( $1 * 2 ) is code to be evaluated to produce the replacement string, rather than a plain replacement string. $1 is the number being replaced.
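For readers more at home in Python, the same range-based idea can be sketched as follows. This is only an approximation of the one-liner, not a translation of Perl's range operator; the function name `double_array` and its parameters are invented for the example.

```python
import re


def double_array(lines, name="Some_Idx", suffix="_Mod_mul_2"):
    """Double every number between the declaration of `name` and the next ';'."""
    out = []
    in_range = False
    for line in lines:
        if not in_range and re.search(rf"const BYTE {re.escape(name)}\b", line):
            in_range = True
        if in_range:
            ends_here = ";" in line
            # Double the numbers before renaming, so that a digit in the
            # new suffix is not doubled as well.
            line = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), line)
            line = line.replace(name, name + suffix)
            if ends_here:
                in_range = False
        out.append(line)
    return out


if __name__ == "__main__":
    header = ["const BYTE Some_Idx[] = { 4,7,\n", " 10 };\n", "int other = 3;\n"]
    print("".join(double_array(header)), end="")
```

Lines outside the declaration are passed through untouched, which is what keeps unrelated numbers in the header from being doubled.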

+4

If you only need to modify this one structure, you can use a regular expression directly to split out the values and apply the change to each of them, looking for the declaration to know where to start and for the closing }; to know when to stop.

If you really need a more general solution, you can use a parser generator such as PyParsing.

+3

There is a Perl module called Parse::RecDescent, which is a very powerful recursive descent parser generator. It comes with tons of examples. One of them is a grammar that can parse C.

Now, I don't think it matters in your case, but recursive descent parsers built with Parse::RecDescent are algorithmically slower (O(n^2), I think) than tools like Parse::Yapp or Parse::Eyapp. I have not checked whether Parse::Eyapp comes with a similar C parser example, but if it does, that is the tool I would recommend learning.

+2

A Python solution (not complete, just a hint ;)). Sorry for any errors; I have not tested it.

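    import re

    with open('your file.c') as f:
        text = f.read()

    # Group 1: everything up to and including the opening brace;
    # group 2: the comma-separated values; group 3: the closing "};".
    patt = r'(?is)(.*?{)(.*?)(}\s*;)'
    m = re.search(patt, text)
    g1, g2, g3 = m.group(1), m.group(2), m.group(3)

    # Double each value, skipping empty fields such as a trailing comma.
    values = [int(i) * 2 for i in g2.split(',') if i.strip()]

    with open('your file 2.c', 'w') as out:
        out.write(g1 + ','.join(str(v) for v in values) + g3)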
+2

There is a really useful Perl module called Convert::Binary::C that parses C header files and converts structures from/to Perl data structures.

+2

You can always use pack/unpack to read and write the data as raw bytes.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use autodie;

    my @data;
    {
        open( my $file, '<', 'Some_Idx.bin' );
        binmode $file;
        local $/ = \1;    # read one byte at a time

        while ( my $byte = <$file> ) {
            push @data, unpack( 'C', $byte );
        }
        close($file);
    }

    print join( ',', @data ), "\n";

    {
        open( my $file, '>', 'Some_Idx_Mod_mul_2.bin' );
        binmode $file;

        # You have two options; use one or the other, not both.
        for my $byte (@data) {
            print $file pack 'C', $byte * 2;
        }

        # or
        # print $file pack 'C*', map { $_ * 2 } @data;

        close($file);
    }
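A rough Python counterpart of the same byte-level idea, as a sketch only: `double_bytes` is a name invented here, and it assumes the data has already been read as raw bytes (for example with `open('Some_Idx.bin', 'rb').read()`).

```python
def double_bytes(data: bytes) -> bytes:
    """Return a copy of `data` with every byte value doubled."""
    doubled = [b * 2 for b in data]  # iterating over bytes yields ints 0..255
    # bytes() raises ValueError if any doubled value no longer fits in one byte.
    return bytes(doubled)


if __name__ == "__main__":
    sample = bytes([4, 7, 10, 15])
    print(",".join(str(b) for b in sample))
    print(",".join(str(b) for b in double_bytes(sample)))
```

Note that any input value of 128 or more cannot be doubled and still stored in a single byte, so this sketch fails loudly in that case instead of wrapping around.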
0

For an example using GCC::TranslationUnit, see hparse.pl from http://gist.github.com/395160, which is intended to make its way into C::DynaLib and also into the not-yet-written Ctypes. It parses functions for FFIs rather than bare structs, unlike Convert::Binary::C. hparse will only add structs if they are used as function arguments.

0