Extending C: Macros or a Preprocessor?

I wrote a library for matching strings against a set of patterns, and now I can use it to easily embed lexical scanners in C programs.

I know there are many well-known tools for creating lexical scanners (lex and re2c, to name just the first two that come to mind). This question is not about lexers, though, but about the best approach to extending C syntax; the lexer is just a concrete case of a more general problem.

I see two possible solutions:

  • Write a preprocessor that converts a source file containing the embedded lexer into a plain C file and, possibly, a set of other files to be used in the compilation.
  • Write a set of C macros that let lexers be expressed in a more readable way.

I have already done both, but the question is: which one would be considered better by the following criteria?

  • Readability. The logic of the lexer should be clear and understandable.
  • Maintainability. Finding and fixing a bug should not be a nightmare!
  • Impact on the build process. The preprocessor requires an additional step in the build, the preprocessor itself must be available on the build machine, and so on.

In other words, if you had to maintain or write a piece of software that uses one of the two approaches, which one would disappoint you less?

As an example, consider a lexer for the following task:

  • Sum all the numbers (which can be decimals, including exponentials, e.g. 1.3E-4.2)
  • Skip strings (double- and single-quoted)
  • Skip lists (LISP-like lists: (3 4 (0 1) () 3))
  • Stop upon encountering the word "end" (case-insensitive) or the end of the buffer

Here it is in the two styles:

/**** SCANNER STYLE 1 (preprocessor) ****/
#include "pmx.h"

t = buffer;
while (*t) {
  switch pmx(t) {         /* the preprocessor will handle this */
    case "&q" :           /* skip strings */
      break;
    case "&f<?=eE>&F" :   /* sum numbers */
      sum += atof(pmx(Start,0));
      break;
    case "&b()" :         /* skip lists */
      break;
    case "&iend" :        /* stop processing */
      t = "";
      break;
    case "<.>" :          /* skip a char and proceed */
      break;
  }
}

/**** SCANNER STYLE 2 (macros) ****/
#include "pmx.h"

/* There can be up to 128 tokens per scanner, with ids 0x80 to 0xFF */
#define TOK_STRING 0x81
#define TOK_NUMBER 0x82
#define TOK_LIST   0x83
#define TOK_END    0x84
#define TOK_CHAR   0x85

pmxScanner(       /* pmxScanner() is a pretty complex macro */
  buffer
,
  pmxTokSet("&q"         , TOK_STRING)
  pmxTokSet("&f<?=eE>&F" , TOK_NUMBER)
  pmxTokSet("&b()"       , TOK_LIST)
  pmxTokSet("&iend"      , TOK_END)
  pmxTokSet("<.>"        , TOK_CHAR)
,
  pmxTokCase(TOK_STRING) :   /* skip strings */
    continue;

  pmxTokCase(TOK_NUMBER) :   /* sum numbers */
    sum += atof(pmxTokStart(0));
    continue;

  pmxTokCase(TOK_LIST) :     /* skip lists */
    continue;

  pmxTokCase(TOK_END) :      /* stop processing */
    break;

  pmxTokCase(TOK_CHAR) :     /* skip a char and proceed */
    continue;
);
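For comparison, here is a rough sketch of what either style ultimately boils down to once expanded: a hand-rolled, plain-C scanner for the same task. Everything here (the sum_numbers name, the use of strtod) is my own illustration and not part of pmx; note that strtod only accepts standard C exponents, so a fractional exponent like 1.3E-4.2 would need the pmx pattern above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hand-rolled scanner for the example task: sums numbers, skips
   quoted strings and nested lists, stops at the word "end".
   All names are illustrative; this is NOT the pmx API. */
double sum_numbers(const char *t)
{
    double sum = 0.0;

    while (*t) {
        if (*t == '"' || *t == '\'') {            /* skip strings */
            char quote = *t++;
            while (*t && *t != quote) t++;
            if (*t) t++;                          /* past closing quote */
        } else if (*t == '(') {                   /* skip (nested) lists */
            int depth = 1;
            t++;
            while (*t && depth > 0) {
                if (*t == '(')      depth++;
                else if (*t == ')') depth--;
                t++;
            }
        } else if (isdigit((unsigned char)*t)) {  /* sum numbers */
            char *end;
            sum += strtod(t, &end);               /* handles 1.3E-4, not 1.3E-4.2 */
            t = end;
        } else if (tolower((unsigned char)t[0]) == 'e' &&
                   tolower((unsigned char)t[1]) == 'n' &&
                   tolower((unsigned char)t[2]) == 'd') {
            break;                                /* stop at "end" (any case) */
        } else {
            t++;                                  /* skip any other char */
        }
    }
    return sum;
}
```

Writing this by hand once makes it easier to judge whether the preprocessor or the macro layer produces something you would rather debug.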

If anyone is interested in the current implementation, the code is here: http://sites.google.com/site/clibutl .

2 answers

The preprocessor will offer a more robust and versatile solution. Macros, on the other hand, can be whipped up quickly, provide a good proof of concept, and are easy to use when the keyword/token space is small. Scaling up or adding new features can become tedious with macros after a point. I would say start with macros, and then convert them into your preprocessor commands.

Also, try using an existing general-purpose preprocessor rather than writing your own, if possible.
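For instance, the extra build step can be hidden in a make rule. This is a hypothetical sketch: the .lx extension, the pmx.m4 macro file, and the file names are all made up for illustration.

```make
# Hypothetical rule: expand scanner sources (*.lx) into plain C
# with m4 before the normal compilation step.
%.c: %.lx pmx.m4
	m4 pmx.m4 $< > $@

scanner.o: scanner.c
	$(CC) $(CFLAGS) -c -o $@ $<
```

Once the rule is in place, the rest of the build does not need to know the C file was generated.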

[...] I would have other dependencies to handle (for example, m4 for Windows).

Yes. But so will any solution you write yourself :) and you will have to maintain it, too. Most of the programs you named have a Windows port available (see m4 for Windows, for example). The advantage of using such a solution is that it saves a lot of time. The disadvantage, of course, is that you may have to get up to speed with the source code if and when an odd error occurs (although the people who maintain these tools are very helpful and will certainly offer help).

And again, yes, I would prefer a packaged solution over rolling my own.


A custom preprocessor is the typical approach in parser/interpreter generators, since macro capabilities are very limited and invite problems during the expansion phase, which can make debugging an enormous effort.

I suggest you use a time-tested tool such as the classic Unix programs Yacc/Lex, or, if you want to "extend" C, use C++ and Boost::Spirit, a parser generator that makes extensive use of templates.
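For the example task above, a rough (f)lex sketch might look like the following. The rules and names are my own approximation, untested; nested lists need a start condition because lex patterns cannot match balanced parentheses on their own.

```lex
%{
#include <stdio.h>
#include <stdlib.h>
static double sum   = 0.0;
static int    depth = 0;
%}
%option noyywrap
%x LIST
%%
\"[^\"]*\"|'[^']*'                    ; /* skip strings */
[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?   { sum += atof(yytext); }
"("                                   { depth = 1; BEGIN(LIST); }
<LIST>"("                             { depth++; }
<LIST>")"                             { if (--depth == 0) BEGIN(INITIAL); }
<LIST>.|\n                            ; /* anything inside a list */
[eE][nN][dD]                          { return 0; } /* stop at "end" */
.|\n                                  ; /* skip any other char */
%%
int main(void) { yylex(); printf("%g\n", sum); return 0; }
```

The specification stays declarative, and flex takes care of generating the scanner code, which is exactly the trade-off the question is about.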

