I wrote a library for matching strings with a set of patterns, and now I can easily insert lexical scanners into C programs.
I know there are many well-known tools for creating lexical scanners (lex and re2c, to name just the first two that come to mind). This question is not about lexers but about the best approach to extending C syntax. The lexer example is just a concrete case of a general problem.
I see two possible solutions:
- Write a preprocessor that converts a source file containing the embedded lexer into a plain C file and, possibly, a set of other files to be used in the compilation.
- Write a set of C macros to represent lexers in a more readable way.
I have already done both, but the question is: which could be considered better by the following criteria?
- Readability. The logic of the lexer should be clear and understandable.
- Maintainability. Finding and fixing a bug should not be a nightmare!
- Intrusion into the build process. The preprocessor requires an additional build step, the preprocessor must be on the path, and so on.
In other words, if you had to maintain or write a piece of software that uses one of the two approaches, which would disappoint you less?
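On the maintainability criterion, one standard C feature is worth keeping in mind for the preprocessor approach: a generator can emit `#line` directives so that compiler diagnostics and debuggers refer to the original source rather than to the generated C file. The sketch below only demonstrates the directive. The file name `scanner.pmx`, the line number 17, and the helper functions are hypothetical and are not taken from the actual implementation.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for generated output: the generator is assumed to emit a #line
   directive before each chunk of code copied from its input file, here a
   hypothetical "scanner.pmx". */
#line 17 "scanner.pmx"
static int reported_line(void) { return __LINE__; }

#line 17 "scanner.pmx"
static const char *reported_file(void) { return __FILE__; }

/* Aborts if the compiler does not attribute the code above to line 17 of
   the original file. */
static void check_mapping(void)
{
    assert(reported_line() == 17);
    assert(strcmp(reported_file(), "scanner.pmx") == 0);
}
```

With such directives in place, an error inside a user-written action would be reported against the file the programmer actually edits, which may narrow the debugging gap between the two approaches.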
As an example, consider a lexer for the following task:
- Sum all numbers (they can be in decimal form, including exponential notation, for example 1.3E-4.2)
- Skip strings (double-quoted and single-quoted)
- Skip lists (similar to LISP lists: (3 4 (0 1) () 3))
- Stop on encountering the word "end" (case does not matter) or at the end of the buffer
Here it is in the two styles, first as input for the preprocessor, then with the macros.
    #include "pmx.h"

    t = buffer
    while (*t) {
      switch pmx(t) {
        case "&q" : break;
        case "&f<?=eE>&F" : sum += atof(pmx(Start,0)); break;
        case "&b()": break;
        case "&iend" : t = ""; break;
        case "<.>": break;
      }
    }

    #include "pmx.h"

    #define TOK_STRING x81
    #define TOK_NUMBER x82
    #define TOK_LIST   x83
    #define TOK_END    x84
    #define TOK_CHAR   x85

    pmxScanner(
      buffer
     ,
      pmxTokSet("&q"         , TOK_STRING)
      pmxTokSet("&f<?=eE>&F" , TOK_NUMBER)
      pmxTokSet("&b()"       , TOK_LIST)
      pmxTokSet("&iend"      , TOK_END)
      pmxTokSet("<.>"        , TOK_CHAR)
     ,
      pmxTokCase(TOK_STRING) : continue;
      pmxTokCase(TOK_NUMBER) : sum += atof(pmxTokStart(0)); continue;
      pmxTokCase(TOK_LIST)   : continue;
      pmxTokCase(TOK_END)    : break;
      pmxTokCase(TOK_CHAR)   : continue;
    );
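For comparison, the same task can also be written by hand in plain C with no pattern library, which gives a baseline for judging the readability of both styles. This is only a sketch and not pmx code: `sum_numbers` and its helpers are hypothetical names. It assumes that quoted strings contain no escape sequences and that "end" stops the scan wherever it appears, not only at word boundaries. It also relies on `strtod`, which accepts some forms (hexadecimal, for instance) that a strictly decimal scanner would reject.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* s points at an opening quote; return the position just past the matching
   closing quote, or the terminating NUL if the string is unterminated.
   Assumption: no escape sequences inside strings. */
static const char *skip_string(const char *s)
{
    char quote = *s++;
    while (*s && *s != quote) s++;
    return *s ? s + 1 : s;
}

/* s points at '('; return the position just past the balancing ')',
   or the terminating NUL if the list is unterminated. */
static const char *skip_list(const char *s)
{
    int depth = 0;
    do {
        if (*s == '(') depth++;
        else if (*s == ')') depth--;
        s++;
    } while (*s && depth > 0);
    return s;
}

/* True if a decimal number starts at s: [sign][.]digit */
static int starts_number(const char *s)
{
    if (*s == '+' || *s == '-') s++;
    if (*s == '.') s++;
    return isdigit((unsigned char)*s);
}

/* True if "end" (any letter case) starts at s. Short-circuit evaluation
   keeps the reads inside the NUL-terminated buffer. */
static int is_end_keyword(const char *s)
{
    return tolower((unsigned char)s[0]) == 'e'
        && tolower((unsigned char)s[1]) == 'n'
        && tolower((unsigned char)s[2]) == 'd';
}

/* Sum the numbers in buffer, skipping strings and lists, until "end"
   or the end of the buffer. */
double sum_numbers(const char *buffer)
{
    const char *t = buffer;
    double sum = 0.0;

    assert(buffer != NULL);
    while (*t) {
        if (*t == '"' || *t == '\'') {
            t = skip_string(t);
        } else if (starts_number(t)) {
            char *stop;
            sum += strtod(t, &stop);  /* always advances: a digit is present */
            t = stop;
        } else if (*t == '(') {
            t = skip_list(t);
        } else if (is_end_keyword(t)) {
            break;
        } else {
            t++;                      /* any other character */
        }
    }
    return sum;
}
```

For example, `sum_numbers("10 '20' (30 (40)) 2.5e1 end 99")` adds only 10 and 25: the quoted string and the list are skipped, and scanning stops at `end`. The hand-written version is several times longer than either pmx variant, which is the cost both approaches try to avoid.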
If anyone is interested in the current implementation, the code is here: http://sites.google.com/site/clibutl .