Recommendations for a regular expression parser in C

I am thinking of adding a regular expression parser to the C library I am developing. Is there any open source code I could use verbatim or with minimal modifications? My requirements for the code:

  • it must be written in C (not C++)
  • it must compile under gcc, MinGW, and MSVC
  • it must not depend on third-party or OS-specific headers/libraries (i.e. everything needed to compile it should be available in a basic installation of gcc, MinGW, or MSVC)
  • it would be nice if it used Perl-compatible regex syntax (like PCRE in PHP)
  • ideally, the code should be as compact as possible

Are there any ready-made solutions you could recommend? I looked at PCRE for C, and it seems to have everything that is available in PHP (which rules), but its size (a 1.4 MB download) is a little intimidating. Do you think it is a solid bet, or are there other options worth considering?

[EDIT]

The library I'm developing is open source under a BSD license.

4 answers

PCRE is pretty much the de facto standard for regular expressions (for good reason). Don't worry about the size: it's big because implementing regular expressions is complicated. Just use it.

PCRE is large because regular expressions are complex. Also, most of that download is documentation and support code; the library is much smaller once compiled into object code.
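
For contrast with PCRE's size, here is how small a matcher for a tiny subset of regex syntax can be. This is an adaptation of Rob Pike's matcher from Kernighan & Pike's "The Practice of Programming"; it supports only literals, `.`, `*`, `^` and `$`, with no character classes, alternation, or captures. It is a sketch for comparison, not a substitute for PCRE, and because it backtracks it offers no linear-time guarantee.

```c
/* Compact backtracking matcher (adapted from Rob Pike's example).
 * Supported syntax:
 *   c   any literal character
 *   .   any single character
 *   ^   anchor at the start of the text
 *   $   anchor at the end of the text
 *   *   zero or more of the preceding character
 */
static int matchhere(const char *re, const char *text);

/* matchstar: search for c*re at the beginning of text */
static int matchstar(int c, const char *re, const char *text)
{
    do {  /* '*' matches zero or more instances, so try re first */
        if (matchhere(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* matchhere: search for re at the beginning of text */
static int matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* empty pattern matches */
    if (re[1] == '*')
        return matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* '$' only matches at end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere(re + 1, text + 1);
    return 0;
}

/* match: return 1 if re matches anywhere in text, else 0 */
int match(const char *re, const char *text)
{
    if (re[0] == '^')
        return matchhere(re + 1, text);
    do {  /* try every starting position, including the empty suffix */
        if (matchhere(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, `match("^ab*c$", "abbbc")` returns 1 and `match("^x", "abc")` returns 0.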

RE2, Google's regexp implementation, matches in linear time (O(n), where n is the length of the string), whereas PCRE and most other backtracking regex engines take exponential time in the worst case. Another noteworthy O(n) matcher is flex, but it needs all possible regular expressions at compile time. If you are looking for something smaller than PCRE, look at the regex matcher in BusyBox or the pattern matching in Lua.
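
The linear-time guarantee comes from simulating all possible matcher states at once instead of backtracking. The sketch below shows that state-set technique (Thompson-style NFA simulation) for a toy subset: literals, `.`, and postfix `*`, matching the whole string only. It is not RE2's actual implementation; `MAX_ITEMS`, `compile`, `closure` and `nfa_fullmatch` are hypothetical names for illustration. Running time is O(n·m) for text length n and pattern length m, i.e. linear in the text for a fixed pattern.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ITEMS 64  /* hypothetical pattern-length limit for this sketch */

/* One pattern item: a literal char (or '.' = any char), optionally starred. */
struct item {
    char c;
    bool star;
};

/* Parse re into items; returns the item count, or -1 on a bad pattern. */
static int compile(const char *re, struct item *items)
{
    int n = 0;
    for (; *re != '\0'; re++) {
        if (*re == '*' || n == MAX_ITEMS)
            return -1;                 /* dangling '*' or pattern too long */
        items[n].c = *re;
        items[n].star = (re[1] == '*');
        if (items[n].star)
            re++;                      /* skip the '*' we just consumed */
        n++;
    }
    return n;
}

/* Epsilon closure: a starred item may match zero times, so if position i
 * is active and item i is starred, position i + 1 is active as well. */
static void closure(bool *set, const struct item *items, int n)
{
    for (int i = 0; i < n; i++)
        if (set[i] && items[i].star)
            set[i + 1] = true;
}

/* Returns 1 if the whole text matches re, 0 if not, -1 on a bad pattern.
 * Each text character is examined once against at most n states. */
int nfa_fullmatch(const char *re, const char *text)
{
    struct item items[MAX_ITEMS];
    bool cur[MAX_ITEMS + 1], next[MAX_ITEMS + 1];
    int n = compile(re, items);
    if (n < 0)
        return -1;

    memset(cur, 0, sizeof cur);
    cur[0] = true;                     /* start before the first item */
    closure(cur, items, n);

    for (; *text != '\0'; text++) {
        memset(next, 0, sizeof next);
        for (int i = 0; i < n; i++) {
            if (!cur[i] || (items[i].c != '.' && items[i].c != *text))
                continue;
            /* a starred item loops on itself; a plain item advances */
            next[items[i].star ? i : i + 1] = true;
        }
        closure(next, items, n);
        memcpy(cur, next, sizeof cur);
    }
    return cur[n] ? 1 : 0;             /* accept if we reached the end */
}
```

A pattern such as `a*a*a*a*a*a*a*a*a*a*b` against a long run of `a`s, which can make a naive backtracker slow, costs this loop at most one pass over the text per state.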

You can try TRE if you are happy with POSIX regex syntax. If you need Perl syntax, Google's new library, RE2, is worth checking out.
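
TRE advertises a POSIX-compatible API, so code written against the standard `regcomp`/`regexec`/`regfree` interface should port with little change. The sketch below uses the system `<regex.h>`, which exists on POSIX systems (glibc, the BSDs) but is typically not available with MSVC or a stock MinGW install; that gap is the reason to bundle a library like TRE. The helper name `posix_search` is hypothetical, and whether TRE needs a different header or prefixed names such as `tre_regcomp` is an assumption to check against TRE's documentation.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: search text for a POSIX extended regex.
 * Returns 1 and stores the match span [start, end) on success,
 * 0 if there is no match, and -1 if compiling or matching fails. */
int posix_search(const char *pattern, const char *text,
                 size_t *match_start, size_t *match_end)
{
    regex_t re;
    regmatch_t m[1];   /* m[0] receives the span of the whole match */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;     /* invalid pattern */

    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);      /* release the compiled pattern in every case */

    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;     /* other regexec error */
    *match_start = (size_t)m[0].rm_so;
    *match_end = (size_t)m[0].rm_eo;
    return 1;
}
```

For instance, searching `"abbbcd"` for `b+c` yields the span [1, 5), the leftmost match `bbbc`.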
