Regex with a recursive expression to match nested braces?

I am trying to match text such as sp { ...{...}... }, where the braces are allowed to nest. This is what I have so far:

    my $regex = qr/
        (                   #save $1
            sp\s+           #start Soar production
            (               #save $2
                \{          #opening brace
                [^{}]*      #anything but braces
                \}          #closing brace
                |
                (?1)        #or nested braces
            )+              #0 or more
        )
    /x;

I just can't get it to match the following text: sp { { word } } . Can anyone see what is wrong with my regex?

2 answers

There are many problems. The recursive bit should be:

 ( (?: \{ (?-1) \} | [^{}]+ )* ) 

Together:

    my $regex = qr/ sp\s+ \{ ( (?: \{ (?-1) \} | [^{}]++ )* ) \} /x;
    print "$1\n" if 'sp { { word } }' =~ /($regex)/;
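To see how the relative recursion `(?-1)` behaves on balanced and unbalanced input, here is a small sketch that wraps the same pattern in a helper and tries it on a few strings. The helper name `match_sp` and the sample strings other than the first are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Same pattern as in the answer, spread over several lines.
# (?-1) recurses into the capture group that encloses it.
my $regex = qr/
    sp\s+
    \{
    (                    # group holding the brace contents
        (?:
            \{ (?-1) \}  # a nested pair: recurse into the group
          |
            [^{}]++      # or a run of non-brace characters
        )*
    )
    \}
/x;

# Hypothetical helper: returns the matched text, or undef when nothing matches.
sub match_sp {
    my ($text) = @_;
    return $text =~ /($regex)/ ? $1 : undef;
}

for my $text ('sp { { word } }', 'sp { a { b { c } } d }', 'sp { { word }') {
    my $m = match_sp($text);
    print defined $m ? "matched: $m\n" : "no match: $text\n";
}
```

Run as written, the first two strings should match in full and the third, which lacks a closing brace, should not. The `(?1)` in the question recurses into group 1, which begins with `sp\s+`, so every nested level would itself have to start with `sp`. The relative `(?-1)` targets only the brace contents and keeps working when the pattern is embedded in a larger regex that adds groups of its own.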

This is a case for the underused Text::Balanced, a very handy core module for this kind of thing. It relies on the string's pos already being set to the start of the delimited sequence, so I usually call it like this:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Text::Balanced 'extract_bracketed';

    sub get_bracketed {
        my $str = shift;

        # seek to beginning of bracket
        return undef unless $str =~ /(sp\s+)(?=\{)/gc;

        # store the prefix
        my $prefix = $1;

        # get everything from the start brace to the matching end brace
        my ($bracketed) = extract_bracketed( $str, '{}');

        # no closing brace found
        return undef unless $bracketed;

        # return the whole match
        return $prefix . $bracketed;
    }

    my $str = 'sp { { word } }';

    print get_bracketed $str;

Matching with the /gc modifiers records the end position of the match in the string's pos, and extract_bracketed uses that position to know where to start.
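The hand-off through pos can be observed directly. The following is a minimal sketch, assuming the same sample string as above; the variable names `$start`, `$extracted` and `$remainder` are chosen here for illustration.

```perl
use strict;
use warnings;

use Text::Balanced 'extract_bracketed';

my $str = 'sp { { word } }';

# A scalar-context match with the g modifier stores its end offset in pos($str).
# The c modifier keeps that offset if a later match on $str fails.
$str =~ /sp\s+(?=\{)/gc or die "prefix not found";

# Offset just past "sp ", which is where the opening brace sits.
my $start = pos($str);

# In list context extract_bracketed returns the extracted text first,
# then the remainder of the string; it begins scanning at pos($str).
my ($extracted, $remainder) = extract_bracketed($str, '{}');

print "start offset: $start\n";
print "extracted:    $extracted\n";
```

If the regex match were skipped, pos would be unset and extraction would begin at the start of the string, where there is no opening brace to find.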
