How do I handle macro replacement in a parser?

I need a parser for an exotic programming language. I wrote a grammar for it and used a parser generator (PEG.js) to generate the parser. This works fine ... except for one thing: macros, which replace a placeholder with predefined text. I do not know how to integrate them into the grammar. Let me illustrate the problem:

A program to be parsed typically looks like this:

    instructionA parameter1, parameter2
    instructionB parameter1
    instructionC parameter1, parameter2, parameter3

There are no problems so far. But the language also supports macros:

    Define MacroX { foo, bar }

    instructionD parameter1, MacroX, parameter4

    Define MacroY(macroParameter1, macroParameter2) {
        instructionE parameter1, macroParameter1
        instructionF macroParameter2, MacroX
    }

    instructionG parameter1, MacroX
    MacroY

Of course, I could extend the grammar to recognize macro definitions and macro references. But then I do not know how to parse the contents of a macro reference, because the parser cannot know what the macro stands for. It might be a single parameter (the simplest case), but it might also be several parameters at once (for example, MacroX above, which stands for two parameters) or a whole block of instructions (for example, MacroY). Macros may also contain other macros. How do I express this in the grammar when what a macro stands for is a question of semantics, not syntax?

The simplest approach would be to run a preprocessor first, replacing all macros, and only then start the parser. But then the line numbers get messed up. I want the parser to produce error messages with a line number when there is a parse error, and if I preprocess the input, the reported line numbers no longer match the original source.

Any help is really appreciated.

+4
3 answers

Macro processors, as a rule, do not respect the boundaries of language elements; in fact, they can (and often do) make arbitrary changes to the apparent input text.

If that is the case for your language, you have little choice: you will need to build a macro processor that keeps track of line numbers.
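A minimal sketch of that idea in JavaScript, under simplifying assumptions (only single-line, parameterless definitions like MacroX are handled, and the function and variable names are made up for illustration): the expander emits one output line per surviving input line and records which original line it came from, so an error position reported by the generated PEG.js parser against the expanded text can be mapped back to the source.

    // Hypothetical helper: expand "Define Name { body }" macros and keep a
    // line map so parse errors can be reported against the original source.
    function expandMacros(source) {
      const macros = Object.create(null);   // macro name -> replacement text
      const outLines = [];
      const lineMap = [];                   // outLines[i] came from source line lineMap[i]

      const defineRe = /^\s*Define\s+(\w+)\s*\{([^}]*)\}\s*$/;

      source.split("\n").forEach((line, idx) => {
        const def = defineRe.exec(line);
        if (def) {                          // remember the macro, emit nothing
          macros[def[1]] = def[2].trim();
          return;
        }
        // replace every known macro name on the line with its body
        outLines.push(line.replace(/\w+/g, w => (w in macros ? macros[w] : w)));
        lineMap.push(idx + 1);              // 1-based original line number
      });

      return { text: outLines.join("\n"), lineMap };
    }

    // Usage with a generated PEG.js parser (recent PEG.js versions attach a
    // location object to syntax errors):
    // const { text, lineMap } = expandMacros(input);
    // try {
    //   parser.parse(text);
    // } catch (e) {
    //   const originalLine = lineMap[e.location.start.line - 1];
    //   console.error("parse error at original line " + originalLine + ": " + e.message);
    // }

Multi-line macro bodies need a more careful mapping (one expanded line can then come from several original lines), but the principle is the same: keep a table from output positions back to input positions.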

If macros always contain well-structured language elements, and they always occur at structured places in the code, then you can add the concepts of macro definition and macro invocation to your grammar. This can make your parse ambiguous: foo(x) in C-like code could be a macro invocation or a function call, and you will have to resolve that ambiguity somehow. C parsers have solved this kind of ambiguity by collecting symbol-table information during the parse; if you record is-foo-a-macro while parsing, you can decide whether foo(x) is a macro call or not.
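A sketch of how that could look in a PEG.js grammar, assuming for simplicity only parameterless, single-line macros like MacroX (all rule names are invented; the key point is the &{...} semantic predicate consulting a macro table filled in while parsing):

    // Sketch: record is-foo-a-macro while parsing and use it at reference sites
    {
      const macros = {};                 // filled in as Define blocks are parsed
    }

    Program
      = _ items:(MacroDefinition / Instruction)* { return items; }

    MacroDefinition
      = "Define" ws name:Identifier ws? "{" body:$([^}]*) "}" _ {
          macros[name] = body.trim();    // remember the definition
          return { type: "define", name: name };
        }

    Instruction
      = name:Identifier args:(ws list:ArgumentList { return list; })? _ {
          return { type: "instruction", name: name, args: args || [] };
        }

    ArgumentList
      = head:Argument tail:(ws? "," ws? a:Argument { return a; })* {
          return [head].concat(tail);
        }

    Argument
      = MacroReference
      / id:Identifier { return { type: "parameter", name: id }; }

    MacroReference                       // succeeds only for names seen in a Define
      = id:Identifier &{ return id in macros; } {
          return { type: "macro", name: id, body: macros[id] };
        }

    Identifier
      = $([A-Za-z_][A-Za-z0-9_]*)

    ws = [ \t]+                          // whitespace within a line
    _  = [ \t\r\n]*                      // any whitespace, including newlines

Because PEG tries alternatives in order, Argument first checks the macro table and only then falls back to treating the name as an ordinary parameter, which is exactly the is-foo-a-macro disambiguation described above. Parameterized macros like MacroY would need additional rules.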

+3

With PEG, you have to decide manually where macros may appear and check for them there. You can add each macro to a hash and check that hash in the PEG rules that allow macros (infix expression, postfix expression, unary operator, binary operator, function call, ...). It is not as simple as in Lisp, but much easier than with YACC and its operator-precedence hacks :)
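If the hash of macros lives outside the grammar (for example because it is filled by an earlier pass), PEG.js lets you pass it in through the options argument of parse() and consult it from a semantic predicate. The fragment below is only an illustration to drop into a grammar like the sketch above; the rule name and the macros option are invented:

    // grammar side: check the externally supplied hash of known macros
    MacroName
      = id:$([A-Za-z_][A-Za-z0-9_]*)
        &{ return options.macros && id in options.macros; } {
          return { type: "macro", name: id, body: options.macros[id] };
        }

    // caller side (plain JavaScript): supply the "hash" of macros
    // const ast = parser.parse(source, { macros: { MacroX: "foo, bar" } });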

Other well-known PEG frameworks that allow macros, such as parrot, perl6, katahdin or PFront, use the trick of doing the parsing at run time, trading performance for flexibility. Or you can combine precompiled and interpreted PEG parsing. A few projects have thought of this, but you need a fast VM, for example luajit, the JVM, the CLR or friends.

I use special syntax-block keywords that load external shared libraries containing a precompiled PEG parser, for instance to parse SQL or FFI blocks into the AST. But you could also require a C compiler and compile the parsers for all macros at run time.

+1

With PEG this is significantly easier than with anything else. First of all, Packrat-based parsers are naturally extensible. Your macro definition can change the syntax, so the next time the macro is used it is parsed natively. See here and here for some extreme examples of this approach.
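PEG.js itself compiles a fixed grammar and does not let a running parse add new rules, so here is only a toy, hand-rolled sketch of the idea in plain JavaScript (all names invented, single-line parameterless definitions only): parsing a Define pushes a new rule onto the rule list, so later lines that use the macro are recognized by the grammar itself rather than by a separate preprocessing pass.

    // Toy illustration of a grammar that grows while parsing: the list of
    // statement rules is consulted for every line, and parsing a Define
    // immediately registers a new rule for that macro.
    function makeExtensibleParser() {
      const statementRules = [];           // tried in order; a rule returns AST nodes or null

      // built-in rule: "Define Name { body }" extends the rule list
      statementRules.push(line => {
        const m = /^Define\s+(\w+)\s*\{([^}]*)\}$/.exec(line);
        if (!m) return null;
        const [, name, body] = m;
        // the grammar just grew: a bare use of the macro now parses as its body
        statementRules.unshift(l =>
          l === name ? parseProgram(body.trim().split("\n")) : null);
        return [];                         // the definition itself emits nothing
      });

      // built-in catch-all rule: "instruction arg, arg, ..." lines
      statementRules.push(line => {
        const [name, ...rest] = line.split(/\s+/);
        const args = rest.join(" ").split(",").map(s => s.trim()).filter(Boolean);
        return [{ type: "instruction", name: name, args: args }];
      });

      function parseProgram(lines) {
        const ast = [];
        for (const raw of lines) {
          const line = raw.trim();
          if (!line) continue;
          // iterate over a snapshot so rules added on this line apply from the next one
          for (const rule of [...statementRules]) {
            const nodes = rule(line);
            if (nodes) { ast.push(...nodes); break; }
          }
        }
        return ast;
      }

      return { parseProgram };
    }

    // makeExtensibleParser().parseProgram([
    //   "Define MacroZ { instructionE a, b }",
    //   "instructionG parameter1, parameter2",
    //   "MacroZ"                            // parsed via the rule added above
    // ]);

Parameterized macros like MacroY and macros used in argument position would need additional rules, but the mechanism is the same: each definition extends the set of rules the parser consults.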

Another possibility is a chain of parsers, which is also trivial with PEG-based approaches.

0
