Are there regex engines that provide visibility into what they do?

In every programming language I have worked with, regular expression support (if it exists) is basically a black box: there are some functions, such as match, scan, etc., that take an expression and return a result, often a string or an array, but they don't report what they are doing while they do it.

I am wondering if any reasonably popular programming language has built-in or library support for matching regular expressions and providing some kind of real-time output (for example, to standard output) that indicates what is happening.

Update: I appreciate the comments so far; however, I am not asking about a tool that displays the structure of the regular expression itself, which debuggex.com and regexper.com seem to do (although that is very cool!). I meant to ask about reporting information during the part where the expression is applied to some input.

Here is a hypothetical example: suppose I had the expression "(foo|bar|baz)" and I tested it against the string "baz"; then I picture output that might look like this:

    testing "foo" - nope
    testing "bar" - nope
    testing "baz" - found match

Obviously, real output could look completely different, but you get the idea.

+8
regex
4 answers

Several regular expression libraries are written in such a way that you can get state information out of them while they process the input. In particular, Russ Cox wrote an article on regular expressions that includes bits of code and an API for stepping from state to state:

http://swtch.com/~rsc/regexp/regexp1.html

The code used in the article was expanded into a complete regular expression library, which seems to give step-by-step output similar to what you described:

https://code.google.com/p/re1/

Later, the code was developed further and is now a full-blown regex library that is maintained (and used internally) by Google:

https://code.google.com/p/re2/

EDIT

If you compile re2 with DebugDFA set to true in the source code, it will print state information as it processes the input. However, for many regular expressions the states do not correspond one-to-one with parts of the original expression, and the output is a little esoteric.
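As a rough illustration of what per-step reporting could look like, here is a hand-written toy in Python. It is not code from re1 or re2 and does not use their APIs; the function name `trace_alternation` and its `report` callback are made up for this sketch, and it only handles a list of literal alternatives matched at the start of the input.

```python
from typing import Callable, List, Optional


def trace_alternation(
    alternatives: List[str],
    text: str,
    report: Callable[[str], None] = print,
) -> Optional[str]:
    """Try each literal alternative at the start of text, reporting every step.

    Hypothetical helper for illustration only; real engines work differently.
    """
    for alt in alternatives:
        report(f'testing "{alt}"')
        for i, expected in enumerate(alt):
            found = text[i] if i < len(text) else "end of input"
            if found != expected:
                # This alternative failed; fall through to the next one.
                report(f"  position {i}: expected {expected!r}, found {found!r}")
                break
        else:
            # The inner loop finished without a mismatch.
            report(f'  matched "{alt}"')
            return alt
    return None


if __name__ == "__main__":
    trace_alternation(["foo", "bar", "baz"], "baz")
```

Passing a different `report` callable (for example, `some_list.append`) collects the trace instead of printing it, which is roughly the kind of hook the question is asking an engine to expose.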

+6

The Python regex engine provides some visibility through the re.DEBUG flag. You are asking for something else though (real-time feedback), which I am sure does not exist. I could see it being integrated into an IDE or an extended Python shell such as IPython. In my opinion, it would be fun to write and very useful.
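For reference, a minimal sketch of using that flag. Note that `re.DEBUG` prints information about the compiled pattern at compile time, not a trace of the matching itself, and the exact text of the dump differs between Python versions. The helper name `dump_compiled` is just for this example.

```python
import contextlib
import io
import re


def dump_compiled(pattern: str) -> str:
    """Return the debug dump that re.DEBUG prints while compiling pattern."""
    buffer = io.StringIO()
    # re.DEBUG writes to standard output, so capture it temporarily.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    print(dump_compiled(r"(foo|bar|baz)"))
```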

+4

RegexBuddy

Although it is not a programming language, the JGSoft RegexBuddy utility has a built-in regular expression debugger that shows every step (including every backtrack) the regular expression engine performs when applied to a given target string. I use this tool to measure and compare the efficiency of various expressions. It is also very handy for spotting runaway expressions (i.e. catastrophic backtracking).

+2

This is not an exact answer to what you asked, but it is related.

If you plan on doing arbitrary computation through callbacks while a string is being evaluated (for example, a compiler building an abstract syntax tree while parsing source code), you can use parsing and lexing tools, which exist for almost any popular language. Many of them use regular expressions to define the grammars they accept, and they are better suited to handling complex grammars (definitely overkill for the example you gave, though).
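A small sketch of the idea in Python, using only the standard `re` module rather than a dedicated lexer generator. The token names, the `TOKEN_RE` pattern, and the `tokenize` function with its `handlers` dictionary are all invented for this example.

```python
import re
from typing import Callable, Dict

# One named group per token kind; SKIP covers whitespace between tokens.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<OP>[+\-*/=])"
    r"|(?P<SKIP>\s+)"
)


def tokenize(text: str, handlers: Dict[str, Callable[[str, int], None]]) -> None:
    """Scan text left to right, calling handlers[kind](value, position) per token."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        handler = handlers.get(m.lastgroup)
        if handler is not None:
            handler(m.group(), m.start())
        pos = m.end()


if __name__ == "__main__":
    tokenize(
        "x = 42",
        {
            "NAME": lambda value, at: print(f"name {value!r} at {at}"),
            "NUMBER": lambda value, at: print(f"number {value!r} at {at}"),
            "OP": lambda value, at: print(f"operator {value!r} at {at}"),
        },
    )
```

The callbacks fire once per recognized token, not once per engine step, so this gives visibility at the level of the grammar rather than inside the regex engine.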

+1