Would you implement a lightweight XML parser with <regex>?

Question

If you needed to implement a simple XML parser, would you choose to use regular expressions?

The XML parsing in my case would be the simplest possible: only tags and textual content. No namespaces, no attributes, no schema support (at first, but maybe ...).

I think it would be a good exercise for learning the new C++0x <regex> library. However, I wonder whether parsing XML is beyond what regular expressions can legitimately do.

+4

c++ xml regex c++11

Stephane rolling Nov 08 '10 at 9:34

4 answers

If I had to do this, I would use a real lexer/parser generator, such as flex/yacc. Yes, it takes more work to get started, but once you have paid that setup cost, adding support for additional features is much easier. In addition, flex and yacc have been optimized for decades, so they will generate much faster code than anything you write by hand.

+4

Chris Nov 08 '10 at 9:48

I wrote a lightweight XML parser and I did not use regexes. It is very easy to do if you only handle a subset of XML. Just read the XML character by character and update the state using boolean flags (e.g. in_a_tag). This is faster than anything you would do with regexes, and you don't have to deal with the line or memory problem (do you match one line at a time? The entire document? What if there are several elements on one line? What if a tag spans two lines?)
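A minimal sketch of the character-by-character approach described above, assuming the tags-and-text-only subset from the question. This is not the answerer's actual code: the `Event` type and the `scan` function are hypothetical names introduced for illustration, and the sketch does no error reporting.

```cpp
#include <string>
#include <vector>

// Hypothetical event type for this sketch: what was seen, plus its payload.
struct Event {
    enum Kind { Open, Close, Text } kind;
    std::string value;  // tag name for Open/Close, raw characters for Text
};

// Scans a tags-and-text-only subset of XML one character at a time.
// Assumption: no attributes, comments, CDATA, or entities.
std::vector<Event> scan(const std::string& input) {
    std::vector<Event> events;
    std::string buffer;   // characters collected since the last '<' or '>'
    bool in_tag = false;  // true while we are between '<' and '>'

    for (char c : input) {
        if (c == '<' && !in_tag) {
            // A tag starts: whatever was buffered so far is text content.
            if (!buffer.empty()) events.push_back({Event::Text, buffer});
            buffer.clear();
            in_tag = true;
        } else if (c == '>' && in_tag) {
            // A tag ends: a leading '/' marks a closing tag.
            if (!buffer.empty() && buffer[0] == '/')
                events.push_back({Event::Close, buffer.substr(1)});
            else
                events.push_back({Event::Open, buffer});
            buffer.clear();
            in_tag = false;
        } else {
            buffer += c;  // newlines are ordinary characters here
        }
    }
    // Text after the last tag, if any.
    if (!in_tag && !buffer.empty()) events.push_back({Event::Text, buffer});
    return events;
}
```

Calling `scan("<a>hi<b>x</b></a>")` yields six events: Open a, Text hi, Open b, Text x, Close b, Close a. Because the input is consumed as one stream of characters, the questions about lines (several elements on one line, a tag split over two lines) do not come up.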

+1

Damien Nov 08 '10 at 12:14

If you really need to parse XML: don't do this; get a real XML parser.

If you just want an exercise to gain some experience with the new C++0x regex library: try to find a better, more useful project. Ideally you want something you can actually use later (which, per the above, a hand-rolled XML parser is not). However, there are worse ways to learn the regex library. :)

+1

Roger Pate Nov 08 '10 at 17:37

Oliver Charlesworth · Accepted Answer · 2010-11-08T09:50:24+0000

In a word: no. XML is not a regular language.

UPDATE (To expand, based on the discussion in the comments below)

XML is not regular, so you cannot hope to use a regular expression to parse or split an entire file or line in a single operation.

While you can write a state-based parser that uses regular expressions to perform the lexing/tokenization, IMHO this will be less efficient and more error-prone than using a tool designed for the job. As others have said, Flex/Bison is one option.
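To make the limitation concrete, here is a small sketch of what goes wrong when a single pattern is asked to pair tags. The function name `first_element_content` and the pattern are assumptions made for illustration; the pattern even uses a backreference, which already goes beyond strictly regular expressions, and it still cannot follow nesting.

```cpp
#include <regex>
#include <string>

// Returns what a naive single-regex "parser" takes to be the content of the
// first element in the input, or an empty string if nothing matches.
std::string first_element_content(const std::string& input) {
    // Lazy match from an opening tag to the nearest closing tag of the same name.
    static const std::regex element(R"(<(\w+)>(.*?)</\1>)");
    std::smatch m;
    if (std::regex_search(input, m, element)) return m[2].str();
    return "";
}
```

For flat input such as `<a>x</a>` this returns `x`. For `<a><a>x</a>y</a>` it returns `<a>x`: the outer opening tag is paired with the inner closing tag. Switching to a greedy `.*` fixes this input but then merges sibling elements such as `<a>x</a><a>y</a>`, so neither variant tracks depth.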
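For the exercise's sake, the split described above (regex for tokens, explicit state for structure) might look like the following sketch. The function name `well_nested` and the token pattern are assumptions, and the input is limited to the tags-and-text subset from the question. It only checks nesting; it does not build a tree.

```cpp
#include <cstddef>
#include <regex>
#include <stack>
#include <string>

// Returns true if every opening tag is closed in the right order.
// The regex only recognises individual tokens; the stack tracks nesting,
// which the regular expression cannot do by itself.
bool well_nested(const std::string& input) {
    // One token: a closing tag, an opening tag, or a run of text.
    static const std::regex token(R"(</(\w+)>|<(\w+)>|[^<]+)");
    std::stack<std::string> open;  // names of elements not yet closed
    std::ptrdiff_t expected = 0;   // where the next token must start

    auto it = std::sregex_iterator(input.begin(), input.end(), token);
    const auto end = std::sregex_iterator();
    for (; it != end; ++it) {
        const std::smatch& m = *it;
        // If the iterator had to skip characters, the input is malformed.
        if (m.position() != expected) return false;
        expected += m.length();

        if (m[1].matched) {         // closing tag: must match the innermost open one
            if (open.empty() || open.top() != m[1].str()) return false;
            open.pop();
        } else if (m[2].matched) {  // opening tag
            open.push(m[2].str());
        }
        // Otherwise the token is text, which does not affect nesting.
    }
    return expected == static_cast<std::ptrdiff_t>(input.size()) && open.empty();
}
```

With this sketch, `well_nested("<a>hi<b>x</b></a>")` is true, while `well_nested("<a><b></a></b>")` and `well_nested("<a>text")` are false. Each added feature (attributes, comments, entities) means another token pattern and more hand-written state, which is where a generated lexer/parser starts to pay off.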
