Is it possible to rewrite regular expressions containing nongreedy (reluctant) quantifiers to use only greedy ones?

Question

Is it possible to rewrite regular expressions containing nongreedy (reluctant) quantifiers to use only greedy ones?

Suppose I have a regular expression language that supports literals, positive and negative character classes, ordered alternation, and greedy quantifiers ? , * and + . (This is essentially a subset of PCRE without backlinks, a statement about the look, or some other fancier bits.) Does the addition of quants increase nongreedy ?? , *? and +? expressiveness of this formalism?

In other words, if template S contains an indefinite quantifier, can this template be rewritten to an equivalent template T that does not contain uncertainty quanta?

If this issue is considered in the literature, I will be grateful for any links that anyone can provide. I could hardly get any theoretical work on the expressive ability of extended regular expression formalisms (besides the usual things about how backlinks move you from ordinary languages to context-free grammars).

+8

regex theory pcre non-greedy

uckelman Jul 20 '11 at 18:15

source share

1 answer

Eamon nerbonne · Answer 1 · 2011-07-20T19:28:16+0000

When you say "Regex", you refer to several methods - this is not just a problem of the basic theory. Consider the question: "Does this string match my given regular expression?" For such a question, the concept of “greedy” is just a detail of the implementation - if you use one of the common (but inefficient) backtracking implementations, this can affect performance, but not output. Similarly, the question: "does this line contain a match?" it is not affected by greedy or non-greedy quantifiers. This first type of regular expression refers to the abstract concept of set-membership: the definition of the language of matching strings.

So why are there non-greedy quantifiers? Regular expressions are not just used for matching; general implementations can find where the match is and which parts of the regular expression match those parts of the output. By doing this, the user depends on the intricacies of the implementation, which is not trivial. This second type of regular expression is associated with getting a few bits of text into a more practical presentation with a language context that is not relevant to learning.

Typically, when you talk about the power of regular expression formalism, you are talking about the first world in which a computer answers simple yes or no. It’s easy to say because the specification is clear. When you talk about greedy and non-greedy quantifiers, you talk about the second world - used lots in practice, but with a specification that basically grew without much planning to solve real problems and is the standard due to backward compatibility, This second world solves completely different problems, and it’s not clear to me what even “expressive power” means here. Of course, non-living can be practical; and this question ...

Unwanted quantifiers do nothing for the first type of expressiveness, and they do for the second, although it is not entirely clear what “expressive power” means.

Is it possible to rewrite regular expressions containing nongreedy (reluctant) quantifiers to use only greedy ones?

More articles: