When should I use a parser, and when is a regular expression enough?

I have not yet studied formal languages in computer science, so perhaps my question is naive. I am writing a simple NMEA parser in C++ and need to choose an approach:

My first idea was to write a simple state machine by hand, but then I thought I might be able to do it with less work, and perhaps even more efficiently. I have used regular expressions before, but I think a regular expression for NMEA would be very long and would take a long time to match.
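For reference, a hand-written state machine of the kind mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: it requires C++17, the names `parseNmea` and `hexValue` are hypothetical, and it handles only the basic framing (`$`, comma-separated fields, `*`, and a two-digit hexadecimal checksum that is the XOR of every character between `$` and `*`).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical function: splits one sentence into its comma-separated fields
// and checks the checksum. Returns std::nullopt on any framing or checksum error.
std::optional<std::vector<std::string>> parseNmea(std::string_view line) {
    enum class State { WaitDollar, Body, ChecksumHi, ChecksumLo, Done };
    State state = State::WaitDollar;
    std::uint8_t computed = 0;           // running XOR of the body characters
    int expected = 0;                    // checksum read from the sentence
    std::vector<std::string> fields(1);  // start with one empty field

    for (char c : line) {
        switch (state) {
        case State::WaitDollar:
            if (c != '$') return std::nullopt;
            state = State::Body;
            break;
        case State::Body:
            if (c == '*') { state = State::ChecksumHi; break; }
            computed ^= static_cast<std::uint8_t>(c);
            if (c == ',') fields.emplace_back();   // a comma starts a new field
            else fields.back().push_back(c);
            break;
        case State::ChecksumHi: {
            int v = hexValue(c);
            if (v < 0) return std::nullopt;
            expected = v << 4;
            state = State::ChecksumLo;
            break;
        }
        case State::ChecksumLo: {
            int v = hexValue(c);
            if (v < 0) return std::nullopt;
            expected |= v;
            state = State::Done;
            break;
        }
        case State::Done:
            // Only a trailing CR/LF is accepted after the checksum.
            if (c != '\r' && c != '\n') return std::nullopt;
            break;
        }
    }
    if (state != State::Done || computed != expected) return std::nullopt;
    return fields;
}
```

Each input character is inspected exactly once and no backtracking is involved, which is the main appeal of the hand-written approach; the cost is that every new rule has to be added to the `switch` by hand.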

Then I thought about using a parser generator. I think they all use the same method underneath: they generate an FSA. But I do not know which is more efficient. When do you usually use a parser generator instead of regular expressions (I assume you could also write a regular expression in a parser generator)?

Please explain the differences; I am interested in both theory and experience.

+6
regex parser-generator
3 answers

Well, a simple rule of thumb: if the grammar of the data you are trying to parse is regular, use regular expressions. If it is not, regular expressions may still work (since most regex engines also support some non-regular grammars), but it can be painful (complicated, with poor performance).
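The framing of a single NMEA sentence is a regular language, so it can be matched with one fairly short expression. A minimal sketch using `std::regex` from the C++ standard library follows. The `NmeaFrame` type and the `matchFrame` function are hypothetical names, the sketch assumes a five-letter sentence identifier such as `GPGGA` (proprietary sentences differ), and it only captures the checksum without verifying it.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct NmeaFrame {
    std::string type;      // e.g. "GPGGA" (assumed to be five capital letters)
    std::string payload;   // everything between the first comma and '*'
    std::string checksum;  // two hex digits, captured but not verified here
};

// Hypothetical function: matches the whole line against the framing pattern.
std::optional<NmeaFrame> matchFrame(const std::string& line) {
    // '$', identifier, ',', payload without '*', '*', two hex digits,
    // then optional trailing whitespace such as CR/LF.
    static const std::regex frame(
        R"(\$([A-Z]{5}),([^*]*)\*([0-9A-Fa-f]{2})\s*)");
    std::smatch m;
    if (!std::regex_match(line, m, frame)) return std::nullopt;
    return NmeaFrame{m[1].str(), m[2].str(), m[3].str()};
}
```

The expression is built once (`static const`) and reused, since constructing a `std::regex` is usually far more expensive than running a match.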

Another aspect is what you are trying to do with the parsed data. If you are only interested in one field, a regex is probably easier to read. If you need to read deeply nested structures, a parser will most likely be more maintainable.

+7

A regex is a parser.

From Wikipedia:

Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

If you are going through a list that you only need to parse once, save the list to a file and read it from there. If you are checking different things each time, use a regex and store the results in an array or something similar.

It is much faster than you would expect. I have seen expressions longer than this post.

To add to that, you can nest them as deeply as you like in whatever language you decide to code this in. You can even do it in sections for maximum usability.
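One way to read "in sections" is to assemble a long expression from small, named pieces, so that each piece can be read and reused on its own. A rough C++ sketch of that idea follows. The names `kLatitude`, `kLongitude`, `kPosition`, and `findPosition` are hypothetical, and the field layout (`ddmm.mmm,N` followed by `dddmm.mmm,E`) is an assumption about typical NMEA position fields, not a complete treatment of the format.

```cpp
#include <regex>
#include <string>

// Each section is an ordinary string, so it can be named and reused.
// Assumed layouts: latitude "ddmm.mmm,N|S", longitude "dddmm.mmm,E|W".
const std::string kLatitude  = R"((\d{2})(\d{2}\.\d+),([NS]))";
const std::string kLongitude = R"((\d{3})(\d{2}\.\d+),([EW]))";

// The full pattern is just the sections joined by the field separator.
const std::string kPosition = kLatitude + "," + kLongitude;

// Hypothetical function: searches `text` for a latitude/longitude pair.
// On success the capture groups are: 1 = lat degrees, 2 = lat minutes,
// 3 = N/S, 4 = lon degrees, 5 = lon minutes, 6 = E/W.
bool findPosition(const std::string& text, std::smatch& out) {
    static const std::regex re(kPosition);
    return std::regex_search(text, out, re);
}
```

Because the match results refer back into `text`, the string passed in has to outlive the `std::smatch` that receives the result.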

+4

As Sneakyness points out, you can have a large and complex regular expression that is surprisingly powerful. I have seen some examples of this, but none of them could be maintained by mere mortals. Even using Expresso only helped so much; they were still hard to understand and risky to change. So unless you are a savant with a grep fixation, I would not recommend going in this direction.

Instead, consider focusing on the grammar and letting a compiler-compiler do the heavy lifting for you.

+2
