How to write a parser in C#?

Question

How do I go about writing a parser (recursive descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?). Later, though, I intend to write XML and HTML parsers (for learning purposes). I am doing this because of the wide range of things in which parsers are useful: web development, programming language translators, in-house tools, game engines, map and tile editors, etc. So what is the basic theory of writing parsers, and how do I implement one in C#? Is C# the right language for parsers (I once wrote a simple arithmetic parser in C++ and it was efficient; will JIT compilation prove equally good)? Any helpful resources and articles are welcome, and best of all, code examples (or links to code examples).

Note: Out of curiosity, has anyone answering this question actually implemented a parser in C#?

+53

c# parsing xml-parsing interpreter

ApprenticeHacker Sep 11

7 answers

Sprache is a powerful yet lightweight framework for writing parsers in .NET. There is also a Sprache NuGet package. To give you an idea of the framework, here is one of the samples, which parses a simple arithmetic expression into a .NET expression tree. Pretty impressive, I would say.

  using System;
  using System.Linq.Expressions;
  using Sprache;

  namespace LinqyCalculator
  {
      static class ExpressionParser
      {
          public static Expression<Func<decimal>> ParseExpression(string text)
          {
              return Lambda.Parse(text);
          }

          static Parser<ExpressionType> Operator(string op, ExpressionType opType)
          {
              return Parse.String(op).Token().Return(opType);
          }

          static readonly Parser<ExpressionType> Add = Operator("+", ExpressionType.AddChecked);
          static readonly Parser<ExpressionType> Subtract = Operator("-", ExpressionType.SubtractChecked);
          static readonly Parser<ExpressionType> Multiply = Operator("*", ExpressionType.MultiplyChecked);
          static readonly Parser<ExpressionType> Divide = Operator("/", ExpressionType.Divide);

          static readonly Parser<Expression> Constant =
              (from d in Parse.Decimal.Token()
               select (Expression)Expression.Constant(decimal.Parse(d))).Named("number");

          static readonly Parser<Expression> Factor =
              ((from lparen in Parse.Char('(')
                from expr in Parse.Ref(() => Expr)
                from rparen in Parse.Char(')')
                select expr).Named("expression")
               .XOr(Constant)).Token();

          static readonly Parser<Expression> Term =
              Parse.ChainOperator(Multiply.Or(Divide), Factor, Expression.MakeBinary);

          static readonly Parser<Expression> Expr =
              Parse.ChainOperator(Add.Or(Subtract), Term, Expression.MakeBinary);

          static readonly Parser<Expression<Func<decimal>>> Lambda =
              Expr.End().Select(body => Expression.Lambda<Func<decimal>>(body));
      }
  }

+15

Martin Liversage Nov 05

C# is almost a decent functional language, so it is not such a big deal to implement something like Parsec in it. Here is one example of how to do it: http://jparsec.codehaus.org/NParsec+Tutorial

It is also possible to implement a combinator-based Packrat parser in a very similar way, but this time keeping a global parsing state somewhere instead of doing purely functional stuff. In my (very basic and ad hoc) implementation it was reasonably fast, but of course a dedicated code generator should perform better.
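To make the Parsec idea a bit more concrete, here is a minimal sketch of hand-rolled parser combinators in C#. Everything in it (`Parser<T>`, `Combinators`, `Satisfy`, `Many1`, `Then`, `Demo`) is a made-up name for illustration and is not the NParsec or Sprache API. A parser is modelled as a function from an input string and a position to either `null` (failure) or a result plus the next position.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not the NParsec or Sprache API.
// A parser returns null on failure, or the parsed value and the next position.
public delegate (T Result, int Next)? Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches one character that satisfies the predicate.
    public static Parser<char> Satisfy(Func<char, bool> pred) => (s, i) =>
    {
        if (i < s.Length && pred(s[i])) return (s[i], i + 1);
        return null;
    };

    // Applies p one or more times and collects the results.
    public static Parser<List<T>> Many1<T>(Parser<T> p) => (s, i) =>
    {
        var items = new List<T>();
        for (var r = p(s, i); r != null; r = p(s, i))
        {
            items.Add(r.Value.Result);
            i = r.Value.Next;
        }
        if (items.Count == 0) return null;
        return (items, i);
    };

    // Transforms the result of a successful parse.
    public static Parser<U> Select<T, U>(this Parser<T> p, Func<T, U> f) => (s, i) =>
    {
        var r = p(s, i);
        if (r == null) return null;
        return (f(r.Value.Result), r.Value.Next);
    };

    // Tries a first; if it fails, tries b at the same position.
    public static Parser<T> Or<T>(this Parser<T> a, Parser<T> b) =>
        (s, i) => a(s, i) ?? b(s, i);

    // Runs a, then b where a stopped, and pairs the two results.
    public static Parser<(A, B)> Then<A, B>(this Parser<A> a, Parser<B> b) => (s, i) =>
    {
        var ra = a(s, i);
        if (ra == null) return null;
        var rb = b(s, ra.Value.Next);
        if (rb == null) return null;
        return ((ra.Value.Result, rb.Value.Result), rb.Value.Next);
    };
}

public static class Demo
{
    // One or more digits, converted to an int.
    public static readonly Parser<int> Number =
        Combinators.Many1(Combinators.Satisfy(char.IsDigit))
            .Select(digits => int.Parse(new string(digits.ToArray())));

    // number '+' number, evaluated on the fly.
    public static readonly Parser<int> Sum =
        Number.Then(Combinators.Satisfy(c => c == '+')).Then(Number)
            .Select(t => t.Item1.Item1 + t.Item2);
}
```

With these definitions, `Demo.Sum("12+30", 0)` succeeds with result 42 and next position 5, while `Demo.Sum("12+", 0)` returns `null`. A Packrat-style variant would, as an assumption about one reasonable design, add a memo table keyed by parser and position so that no parser runs twice at the same offset.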

+3

SK-logic Sep 11 '11 at 18:55

I know I'm a little late, but I just published a parser/grammar/AST generator library named Ve Parser. You can find it at http://veparser.codeplex.com or add it to your project by typing "Install-Package veparser" in the Package Manager Console. The library is a kind of recursive descent parser that is intended to be easy to use and flexible. Since its source is available, you can also learn from its source code. Hope this helps.

+2

Sam Oct. 16

In my opinion, there is a better way to implement parsers than the traditional methods. It results in simpler and easier-to-understand code, and in particular makes it easier to extend whatever language you are parsing by just plugging in a new class in a very object-oriented way. One article of a larger series that I wrote focuses on this parsing method, and full source code is included for a C# 2.0 parser: http://www.codeproject.com/Articles/492466/Object-Oriented-Parsing-Breaking-With-Tradition-Pa

+1

Ken Beckett Aug 17 '13 at 16:11

Well... where to start with this one...

First off, "writing a parser" is a very broad statement, especially given the question you are asking.

Your opening statement was that you want a simple arithmetic "parser". Technically that is not a parser, it is a lexical analyzer, similar to what you might use for creating a new language ( http://en.wikipedia.org/wiki/Lexical_analysis ). I understand exactly where the confusion of them being the same thing may come from. It is important to note that lexical analysis is ALSO what you will want to understand if you are going to write language/script parsers. Strictly speaking this is not parsing, because you are interpreting the instructions as opposed to making use of them.

Back to the parsing question...

Parsing is what you will be doing if you are taking a rigidly defined file structure and extracting information from it.

In general you really do not have to write a parser for XML/HTML, because there are already plenty of them around. Even more so if the XML you are parsing was produced by the .NET runtime: then you do not even need to parse, you just need to "serialize" and "de-serialize".

In the interest of learning, however, parsing XML (or anything similar, like HTML) is very straightforward in most cases.

If we start with the following XML:

  <movies>
    <movie id="1"> <name>Tron</name> </movie>
    <movie id="2"> <name>Tron Legacy</name> </movie>
  </movies>

we can load the data into an XDocument as follows:

  XDocument myXML = XDocument.Load("mymovies.xml");

You can then get at the 'movies' root element using 'myXML.Root'.

More interesting, however, you can easily use LINQ to get the nested tags:

  var myElements = from p in myXML.Root.Elements("movie") select p;

This gives you a sequence of XElements, each containing one <movie>...</movie>, which you can use with something like:

  foreach (var v in myElements)
  {
      // Attribute("id") and Element("name") are looked up on each <movie> element.
      Console.WriteLine(string.Format("ID {0} = {1}", (int)v.Attribute("id"), (string)v.Element("name")));
  }

For anything other than XML-like data structures, I'm afraid you will have to start learning the art of regular expressions. A tool such as "Regex Coach" will help you immensely ( http://weitz.de/regex-coach/ ), as will one of the other commonly used similar tools.

You will also need to become familiar with the .NET regular expression objects; ( http://www.codeproject.com/KB/dotnet/regextutorial.aspx ) should give you a good head start.

Once you know how your regular expressions work, in most cases it is simply a matter of reading the files in one line at a time and making sense of them using whichever method you feel comfortable with.
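As a rough sketch of the "one line at a time plus a regex" approach, the following reads a hypothetical `name = value` text format. The format, the class name `KeyValueReader` and the pattern are assumptions made up for this example; only `Regex`, `Match`, `Groups` and `TextReader.ReadLine` are actual .NET APIs.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class KeyValueReader
{
    // Hypothetical line format: "name = value" with optional surrounding whitespace.
    // Lines that do not match (comments, blank lines) are skipped.
    private static readonly Regex Line =
        new Regex(@"^\s*(?<key>\w+)\s*=\s*(?<value>.*?)\s*$");

    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var result = new Dictionary<string, string>();
        // ReadLine returns null once the end of the input is reached.
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            Match m = Line.Match(line);
            if (m.Success)
            {
                result[m.Groups["key"].Value] = m.Groups["value"].Value;
            }
        }
        return result;
    }
}
```

It can be fed from a file with `File.OpenText(path)` or from memory with a `StringReader`, since both are `TextReader`s.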

A good free source of file format descriptions for just about anything you can imagine can be found at ( http://www.wotsit.org/ ).

0

shawty Sep 11

For the record, I implemented a parser generator in C# just because I could not find one that worked properly or was similar to YACC (see http://sourceforge.net/projects/naivelangtools/ ).

However, after some experience with ANTLR, I decided to go with LALR instead of LL. I know that LL is theoretically easier to implement (generator or parser), but I simply cannot live with a stack of expressions just to express operator priorities (for example, * taking precedence over + in "2 + 5 * 3"). In LL, you say that mult_expr is embedded inside add_expr, which does not seem natural to me.

0

greenoldman Oct 20 '13 at 19:54

Jonathan Dickinson · Accepted Answer

I have implemented several parsers in C# - both hand-written and tool-generated.

A very good introductory tutorial on parsing in general is Let's Build a Compiler. It demonstrates how to build a recursive descent parser, and the concepts are easy for any competent developer to translate from its language (I think it was Pascal) to C#. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.
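For the simple arithmetic case from the question, a recursive descent parser can be sketched in a few dozen lines. This is not taken from the tutorial mentioned above; the grammar, the class name `Calculator` and the choice to evaluate on the fly instead of building a tree are all assumptions made for this illustration. Each grammar rule becomes one method, and operator precedence falls out of which method calls which.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Grammar assumed for this sketch:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | IDENT | '(' expr ')' | '-' factor
public sealed class Calculator
{
    private readonly string _text;
    private readonly IReadOnlyDictionary<string, double> _vars;
    private int _pos;

    private Calculator(string text, IReadOnlyDictionary<string, double> vars)
    {
        _text = text;
        _vars = vars;
    }

    public static double Evaluate(string text) =>
        Evaluate(text, new Dictionary<string, double>());

    public static double Evaluate(string text, IReadOnlyDictionary<string, double> vars)
    {
        var p = new Calculator(text, vars);
        double value = p.Expr();
        p.SkipSpaces();
        if (p._pos < text.Length)
            throw new FormatException($"Unexpected '{text[p._pos]}' at {p._pos}");
        return value;
    }

    private void SkipSpaces()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    // Skips whitespace, then consumes c if it is the next character.
    private bool Accept(char c)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private double Expr()
    {
        double value = Term();
        while (true)
        {
            if (Accept('+')) value += Term();
            else if (Accept('-')) value -= Term();
            else return value;
        }
    }

    private double Term()
    {
        double value = Factor();
        while (true)
        {
            if (Accept('*')) value *= Factor();
            else if (Accept('/')) value /= Factor();
            else return value;
        }
    }

    private double Factor()
    {
        if (Accept('-')) return -Factor();
        if (Accept('('))
        {
            double inner = Expr();
            if (!Accept(')')) throw new FormatException($"Expected ')' at {_pos}");
            return inner;
        }
        int start = _pos; // Accept has already skipped leading whitespace
        if (_pos < _text.Length && char.IsLetter(_text[_pos]))
        {
            // Variable reference: a letter followed by letters or digits.
            while (_pos < _text.Length && char.IsLetterOrDigit(_text[_pos])) _pos++;
            string name = _text.Substring(start, _pos - start);
            if (_vars.TryGetValue(name, out double v)) return v;
            throw new FormatException($"Unknown variable '{name}'");
        }
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }
}
```

For example, `Calculator.Evaluate("2 + 5 * 3")` gives 17 and `Calculator.Evaluate("(2 + 5) * 3")` gives 21. Variables are passed in as a dictionary, so `"x * -2"` with `x = 4` gives -8.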

You should look into some tools that generate the code for you if you are determined to write a classical recursive descent parser (TinyPG, Coco/R, Irony). Keep in mind that there are now other ways to write parsers that usually perform better and have simpler definitions (e.g. TDOP parsing or Monadic Parsing).
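To give a feel for TDOP (top-down operator precedence, also known as Pratt parsing), here is a minimal sketch for the same arithmetic expressions. The class name `PrattCalculator`, the binding-power numbers and the evaluate-as-you-go design are assumptions chosen for this example. Instead of one method per precedence level, a single loop consults a table of binding powers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class PrattCalculator
{
    // Binding power of each binary operator; a higher number binds tighter.
    // The concrete values are arbitrary, only their order matters.
    private static readonly Dictionary<char, int> Power = new Dictionary<char, int>
    {
        ['+'] = 10, ['-'] = 10, ['*'] = 20, ['/'] = 20,
    };

    private readonly string _text;
    private int _pos;

    private PrattCalculator(string text) { _text = text; }

    public static double Evaluate(string text)
    {
        var p = new PrattCalculator(text);
        double value = p.Expression(0);
        if (p.Peek() != '\0')
            throw new FormatException($"Unexpected '{p.Peek()}' at {p._pos}");
        return value;
    }

    // Skips whitespace and returns the next character without consuming it,
    // or '\0' at the end of the input.
    private char Peek()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        return _pos < _text.Length ? _text[_pos] : '\0';
    }

    private double Expression(int minPower)
    {
        double left = Prefix();
        // Keep absorbing operators that bind tighter than the caller's context.
        while (Power.TryGetValue(Peek(), out int power) && power > minPower)
        {
            char op = _text[_pos++];
            // Passing the operator's own power makes it left-associative.
            double right = Expression(power);
            left = Apply(op, left, right);
        }
        return left;
    }

    // Handles everything that can start an expression.
    private double Prefix()
    {
        char c = Peek();
        if (c == '-')
        {
            _pos++;
            return -Expression(30); // unary minus binds tighter than any binary operator
        }
        if (c == '(')
        {
            _pos++;
            double inner = Expression(0);
            if (Peek() != ')') throw new FormatException($"Expected ')' at {_pos}");
            _pos++;
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private static double Apply(char op, double a, double b)
    {
        switch (op)
        {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default: return a / b;
        }
    }
}
```

Adding a new operator here means adding one table entry and one `Apply` case, which is the main reason this style tends to have simpler definitions than a grammar with one rule per precedence level.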

On the question of whether C# is up to the task: C# has some of the best text libraries out there. Many parsers today (in other languages) contain an obscene amount of code for dealing with Unicode and so on. I will not comment too much on JITted code because the topic can get quite religious; however, you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it is written in F#), and its performance is just shy of V8.

Side note: Markup parsers are completely different beasts compared to language parsers. In the majority of cases they are written by hand, and at the scanner/parser level they are very simple. They are not usually recursive descent, and especially in the case of XML it is better if you do not write a recursive descent parser (to avoid stack overflows, and also because a "flat" parser can be used in SAX/push mode).
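The "flat" style can be illustrated with .NET's own `XmlReader`, which walks a document as a linear stream of nodes in one loop, with no recursion per nesting level. Note that `XmlReader` is a pull-mode API rather than SAX-style push, so this is only an approximation of the point made above; the helper name `FlatXml.Events` and the event strings are made up for this sketch.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FlatXml
{
    // Walks the document as a flat stream of nodes and records one entry per event.
    public static List<string> Events(string xml)
    {
        var events = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // A single loop handles any nesting depth; no recursion is involved.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        events.Add("start " + reader.Name);
                        // Self-closing elements produce no separate EndElement node.
                        if (reader.IsEmptyElement) events.Add("end " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        events.Add("text " + reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        events.Add("end " + reader.Name);
                        break;
                }
            }
        }
        return events;
    }
}
```

For the input `<movie id="1"><name>Tron</name></movie>` this yields five events: start movie, start name, text Tron, end name, end movie. A push-mode design would raise callbacks at the points where this sketch appends to the list.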
