Find a suitable bracket with Regex

Input is a string representing a list of items.

The list is defined as open curly { , followed by 0 or more elements separated by spaces, followed by a closed curly } .

An element is either a literal or a list of elements.

A literal is a sequence of characters without spaces. If an element contains curly braces, it must be escaped with a backslash: \{ and \} . (Or you could assume that curls are not allowed inside literals, for simplicity)

Example:

 "{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x\{yz \}foo }" 

There are no curls in literals:

 "{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} xyz foo }" 

(This is a simplified definition of a Tcl list.)

What I want to know: can the input be split into elements of the outermost loop using a regular expression?

Expected Result:

 abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x{yz }foo 

The real question is: can this be done using regular expression?

I'm most interested in the taste of .NET, but it will accept any answers.

I will send my own guess in response and see if it is checked or destroyed.

+4
source share
4 answers

Well, editing removes the curly braces from the tokens and takes the sting out of the question, and now it's easy to do with .NET Regexes using balancing groups. This is just matching braces, which is a prime example.
As with KennyTM, this will only work if you remove the top-level curly braces or match the entire input.
Again, this is best used for recreational purposes:

 (?: # try matching... (?:\\[{}]|[^\s{}])+\s*? # a literal (allow escaped curly braces) | # OR (?<Curly>{)\s* # "{" and push to stack | # OR (?<-Curly>})\s*? # "}", pop from stack and fail if the stack is empty )+? # ...a few times, and stop whenever you can. (?(Curly)(?!)) # Make sure there aren't any extra open curly braces 

See this article for more details: Regex in-depth balancing group

+3
source

Unfortunately, the answer is YES for some regular expression flavor, for example. PCRE and .NET because they support a recursive pattern and operations similar to the stack, respectively.

Grammar can be written as

 ELEMENT -> (?!\{)\S+ | LIST LIST -> '\{\s*' ELEMENT? ('\s+' ELEMENT)* '\s*\}' 

thus in PCRE, this can be converted to a template:

  \{\s*(?0)?(?:\s+(?0))*\s*\}|(?!\{)(?:[^\s}]|\}(?![\s}]))+ # --------------------------- ^^^^^^^^^ # LIST Make sure the } is not closing the group 

See http://www.ideone.com/SnGsU (I just deleted the top level { and } ).

(Of course, don't try this at work :))

(BTW, I don't know how to convert this PCRE to a .NET flavor. If anyone knows, try Converting a recursive PCRE regular expression pattern to a .NET group definition. )

+4
source

The traditional answer to this is the resounding "NO." As we learned in the compiler class, regular grammar cannot be used to describe a language with a recursive definition (i.e. you cannot use a state machine)

Here we need a context-free parser, the implementation of which is reduced to the final destination machine + STACK.
See ANTLR , bison , etc.

+2
source

@Cristi correctly refers to a regular expression: it is theoretically impossible to solve recursive expressions using a non-contact, finite state machine. The solution, however, is simpler: you only need to save the counter of the number of open parentheses and make sure that it does not fall below 0. This saves more memory than saving the stack, and you only need the count - not the contents - of the brackets.

Algorithm:

 counter = 0 // Number of open parens For char c in string: print c if c=='{': // Keep track on number of open parens counter++ if c=='}': counter-- if counter==1: // New line if we're back to the top level print "\n" elif counter<1: // Error if the nesting is malformed print "ERROR: parentheses mismatch" break 
+1
source

All Articles