Development of a simple analyzer

My day job includes creating a compiler like Pascal. I am constantly working on optimizing and generating code.

I would also like to start learning how to create a simple parser for the same language. However, I am not quite sure how to do this. It seems the choice is Flex and Bison. But is it possible to write a parser using C ++ or C #? I'm a little creepy with C.

YacC ++ supports C #, but it is licensed. I am looking for all the help I can find in this regard. Suggestions will be highly appreciated.

+4
source share
9 answers

I believe you can use ANTLR with C #. I have never tried this myself (yet), however there is a tutorial here that can point you in the right direction.

+6
source

Personally, I go through my lexer and parser (LL). Here is a very short example. It is in C ++, but hopefully you can adapt it. It uses the PARSE_HIGHER macro to make it easier to insert statements at different priority levels without significant code changes.

// routine to scan over whitespace/comments void ScanWhite(const char* &pc){ while(true){ if(0); else if (WHITESPACE(*pc)) pc++; else if (pc[0]=='/' && pc[1]=='/'){ while(*pc && *pc++ != '\n'); } else break; } } // routine to lex an identifier bool SeeId(const char* &pc, string sId){ ScanWhite(pc); const char* pc0 = pc; if (alpha(*pc)){ sId = ""; while(alphanum(*pc)) sId += (*pc++); return true; } pc = pc0; return false; } // routine to lex a number bool SeeNum(const char* &pc, double &dNum){ ScanWhite(pc); const char* pc0 = pc; if (digit(*pc)){ dNum = 0; while(digit(*pc)) dNum = dNum * 10 + (*pc++ - '0'); if (*pc == '.'){ double divisor = 1, frac = 0; while(digit(*pc)){ divisor *= 0.1; frac += (*pc++ - '0') * divisor; } dNum += frac; } return true; } pc = pc0; return false; } // routine to lex some constant word bool SeeWord(const char* &pc, const char* sWord){ ScanWhite(pc); const char* pc0 = pc; int len = strlen(sWord); if (strncmp(pc, sWord, len)==0 && !alphanum(pc[len])){ pc += len; return true; } pc = pc0; return false; } // routine to lex a single character like an operator bool SeeChar(const char* &pc, const char c){ ScanWhite(pc); const char* pc0 = pc; if (*pc == c){ pc++; return true; } pc = pc0; return false; } // primitive expression parser void ParsePrimitiveExpr(const char* &pc, CNode* &p){ double dNum; char sId[100]; if (0); else if (SeeNum(pc, dNum)){ p = new CNode(dNum); } else if (SeeId(pc, sId)){ // see if its a function call if (SeeChar(pc, '(')){ p = MakeNewFunctionCallNode(sId); while(!SeeChar(pc, ')')){ CNode* p1 = null; ParseExpression(pc, p1); AddArgumentExprToFunctionCallNode(p, p1); SeeChar(pc, ','); /* optional comma separator */ } } // otherwise its just a variable reference else { p = new CNode(sId); } } // handle embedded expressions else if (SeeChar(pc, '(')){ ParseExpression(pc, p); if (!SeeChar(pc, ')')) /* deal with syntax error */ } } #define PARSE_HIGHER ParsePrimitiveExpr // product parser void ParseProduct(const char* &pc, CNode* &p){ PARSE_HIGHER(pc, p); while(true){ if (0); else if (SeeChar(pc, '*')){ CNode p1 = null; PARSE_HIGHER(pc, p1); p = new CNode('*', p, p1); } else if (SeeChar(pc, '/')){ CNode p1 = null; PARSE_HIGHER(pc, p1); p = new CNode('/', p, p1); } else break; } } #undef PARSE_HIGHER #define PARSE_HIGHER ParseProduct // sum parser void ParseSum(const char* &pc, CNode* &p){ PARSE_HIGHER(pc, p); while(true){ if (0); else if (SeeChar(pc, '+')){ CNode p1 = null; PARSE_HIGHER(pc, p1); p = new CNode('+', p, p1); } else if (SeeChar(pc, '-')){ CNode p1 = null; PARSE_HIGHER(pc, p1); p = new CNode('-', p, p1); } else break; } } #undef PARSE_HIGHER // can insert more routines like the above // to handle move operators #define PARSE_HIGHER ParseSum // overall expression parser void ParseExpression(const char* &pc, CNode* &p){ PARSE_HIGHER(pc, p); } 

Added some Pascal-style instruction syntax:

 void ParseStatement(const char* &pc){ char sId[100]; if(0); else if (SeeWord(pc, "begin")){ while(!SeeWord(pc, "end")){ ParseStatement(pc); SeeChar(pc, ';'); } } else if (SeeWord(pc, "while")){ CNode* p1 = null; ParseExpression(pc, p1); ParseStatement(pc); /* semantics for while statement */ } else if (SeeWord(pc, "if")){ CNode* p1 = null; ParseExpression(pc, p1); ParseStatement(pc); if (SeeWord(pc, "else")){ ParseStatement(pc); } /* semantics for if statement */ } else if (SeeWord(pc, "for")){ /* you do it */ } // handle assignments and subroutine calls else if (SeeId(pc, sId)){ if(0); else if (SeeChar(pc, '=')){ CNode* p1 = null; ParseExpression(pc, p1); /* semantics for assignment statement */ } else if (SeeChar(pc, '(')){ CNode* p = MakeNewFunctionCallNode(sId); while(!SeeChar(pc, ')')){ CNode* p1 = null; ParseExpression(pc, p1); AddArgumentExprToFunctionCallNode(p, p1); SeeChar(pc, ','); /* optional comma separator */ } } else { /* we have a 1-word statement, which is OK in pascal */ } } else { /* syntax error */ } } 

It still needs syntax for indexing the array, declaring variables and defining a function, but I hope it is clear how to do this.

+5
source

In his classic programming text, Algorithms + Data Structures = Programs, Niklaus Wirth develops a complete recursive parser (in Pascal) for the simple language Pascal0.

+1
source

If you are writing it in Java, I would recommend ANTLR. This is a good LL (*) parser generator written in Java. There's an awesome book for Amazon.

0
source

bison and flex are canonical parser generators. If you are interested in C ++, I found boost spirit useful. I never used it for anything complicated, as a compiler. I'm sure others will have interesting suggestions for other languages ​​such as C # ...

0
source

In fact, you can use flex and bison with C ++. In this tutorial , for example, you can see that section 5 is dedicated to this issue. Just go to Google and I'm sure you will find many examples.

0
source

When you use Lex and Yacc, you don't actually write anything in C. Lex is its own language, as it is Yacc. So, you are writing a lexical analyzer in Lex and a parser in Yacc . However, for Pascal, Lex and Yacc input are already available .

As a result, the analyzer and lexer have C-interfaces, this is true. However, most languages, including C ++, have simple ways to call (or port) C-interfaces.

I am not an expert in this, but I am sure that all of the above applies to ANTLR.

If you are asking to do this in "pure C ++" (whatever that means), study the use of boost spirit . I really do not see the point in theoretical purity if it gives a ton more work.

Writing your own lexers and parsers manually is really curious. A lexer is one of the few situations where you can justify using both goto s and a preprocessor. However, I would not suggest it for a full-blown language such as Pascal, if you can avoid it. That would be a great job. I say man-years.

0
source

I wrote an XSLT parser using flex and bison. Recently, I have been doing a project using ANTLR:

Is the JFig syntax efficient and understandable (and better than Spring -Frameworks XML DSL)?

I liked working in ANTLR much more than Flex and Bison. ANTLR puts you at a higher level of abstraction in some ways. Lexical definitions and parser grammar can go in one file. (ANTLR will generate a token file.)

One of the key elements is the ability to define tree grammars. Basically, you do grammar analysis on the input language and perform actions that are written to the very optimal output of the AST tree (which remain as related data structures in memory). Then you can transfer this tree to another parser defined in a separate tree parser file. A parser tree is where you do the real work of the necessary action elements.

This is a good approach, since you can save the AST form and re-process it as needed - to clear certain nodes of the subtree for processing based on recent actions, etc. Think of something like a language interpreter. Instead of going into the for loop and processing the language from scratch over and over, it can simply handle this AST representation.

In my case, I developed a bean factory to do IoC dependency injection. My bean factory stores AST in a bean handle at runtime. When he needs to create (or get) a new bean instance, I simply pass the descriptor subtree of the bean descriptor to my tree parser - the result is the desired bean instance (there are many factors that go into determining how to instantiate the requested bean , including creating any other beans referenced and / or applying other special actions through meta attributes).

Finally, my current bean factory is targeting Java, but I want to target ActionScript3 and C # .NET. ANTLR supports this.

As already mentioned, Terrence Parr has written a book on how to use ANTLR. It is aimed at working programmers who need to do something practical with ANTLR (as opposed to an academic approach to the subject).

0
source

If your wanting C # matches this Question , try Gardens Point GPPG and GPLEX.

0
source

All Articles