Regular expression fix for working with ICU / RegexKitLite error

I use RegexKitLite, which in turn uses ICU as its engine. Despite the documentation, a regex like /x*/, when searching on "xxxxxxxxxxx", will match the empty string. It behaves the way /x*?/ should. I would like to work around this bug when it occurs, and I am considering rewriting every unescaped * as + whenever a match returns a result of length 0. My naive guess is that the regex with +s in place of *s will always return a subset of the correct results. What are the unexpected consequences of this? Am I on the right track?

FWIW, ICU also offers the possessive *+ operator, but it doesn't work either.

EDIT: I should have been clearer: this is for the search field of an interactive application. I have no control over the regex the user types. The broken * support appears to be a bug in ICU. I certainly don't want to include this kludge in my code, but it's the only game in town.
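Here is a minimal Python sketch of the fallback described in the question. It uses Python's `re` module only as a stand-in engine (Python does not have the ICU bug, so for x* on a run of x's the retry never fires). The helper names `star_to_plus` and `search_with_fallback` are hypothetical, and the rewriter is deliberately naive: it skips backslash escapes and character classes, but it does not understand \Q...\E quoting, (?x) comments, or a ] placed first inside a class.

```python
import re
from typing import Optional

def star_to_plus(pattern: str) -> str:
    """Rewrite each unescaped '*' outside a character class as '+'.

    Naive sketch: does not understand \\Q...\\E, (?x) comments, or a ']'
    appearing first inside a class.
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # Copy the backslash and the character it escapes unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
        elif c == "*":
            c = "+"  # so *? becomes +? and *+ becomes ++ automatically
        out.append(c)
        i += 1
    return "".join(out)

def search_with_fallback(pattern: str, text: str) -> Optional[re.Match]:
    """Search normally; if the match is empty, retry with * rewritten as +."""
    m = re.search(pattern, text)
    if m is not None and m.start() == m.end():
        retry = re.search(star_to_plus(pattern), text)
        if retry is not None:
            return retry
    return m

print(star_to_plus(r"a\*b*[*]c*?"))
print(search_with_fallback(r"x*", "axx").group())
```

The second call illustrates the risk the question asks about: on "axx", an empty match at position 0 is the correct result for x*, yet the fallback replaces it with "xx". A zero-length result is not always a symptom of the bug.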

4 answers

If you simply change every * quantifier to +, the regex will fail in cases where the * should match zero occurrences. In other words, the problem will have morphed from always matching zero to never matching zero. If you ask me, that's just as useless.
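A quick check with Python's `re` module (used here as a reference engine, not ICU) shows the trade-off: once * becomes +, the pattern no longer matches where the x's are absent, while inputs that contain x's behave the same.

```python
import re

# With *, zero x's are allowed between a and b...
print(bool(re.fullmatch(r"ax*b", "ab")))
# ...but after rewriting * as +, at least one x is required.
print(bool(re.fullmatch(r"ax+b", "ab")))
# Where x's are present, both versions agree.
print(bool(re.fullmatch(r"ax*b", "axxb")), bool(re.fullmatch(r"ax+b", "axxb")))
```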

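Instead, rewrite x* as (?:(?!x)|x+): the empty alternative is only allowed when the next character is not an x, so a run of x's is always consumed by x+. The same rewrite handles the possessive form (*+); the lazy form (*?) can be left as it is.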

The replacements look like this:

BEFORE       AFTER
x*           (?:(?!x)|x+)
x*+          (?:(?!x)|x++)
x*?          x*?
For a group:
(?:xyz)*     (?:(?!(?:xyz))|(?:xyz)+)
The whole group goes inside the lookahead. For {min,} and {min,max}, the rule applies when min is 0:
x{0,}        same as x*
x{0,n}       (?:(?!x)|x{1,n})
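The rewrites in the table can be checked mechanically. This Python sketch uses hypothetical helpers `zero_or_more` and `zero_to_n`, with Python's `re` standing in for ICU; it builds the rewritten patterns and compares their `re.search` spans with the originals on a few sample inputs.

```python
import re
from typing import Sequence

def zero_or_more(atom: str) -> str:
    """Build the replacement for atom* : (?:(?!atom)|atom+).

    atom must be a single token or a parenthesised group.
    """
    return f"(?:(?!{atom})|{atom}+)"

def zero_to_n(atom: str, n: int) -> str:
    """Build the replacement for atom{0,n} : (?:(?!atom)|atom{1,n})."""
    return f"(?:(?!{atom})|{atom}{{1,{n}}})"

def same_spans(p: str, q: str, inputs: Sequence[str]) -> bool:
    """True if re.search finds the same span (or no match) for p and q on every input."""
    for s in inputs:
        a, b = re.search(p, s), re.search(q, s)
        if (a and a.span()) != (b and b.span()):
            return False
    return True

print(zero_or_more("x"))
print(same_spans(r"x*", zero_or_more("x"), ["", "x", "xxx", "xxxa", "axx", "yyy"]))
print(same_spans(r"(?:xyz)*", zero_or_more("(?:xyz)"), ["xyzxyzab", "xyxyz", ""]))
print(same_spans(r"x{0,2}", zero_to_n("x", 2), ["xxxx", "ax", ""]))

# Caveat: the rewrite cannot give characters back, so x*x and its rewrite differ.
print(bool(re.fullmatch(r"x*x", "x")))
print(bool(re.fullmatch(zero_or_more("x") + "x", "x")))
```

The last two lines point to a limitation worth knowing before relying on the table: when whatever follows x* can itself start with x, as in x*x, the original can backtrack to zero x's but the rewrite cannot, so the two patterns are not interchangeable in every context.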

A conditional, (?(condition)yes-pattern|no-pattern), would express this more directly, but as far as I know ICU doesn't support conditionals.


, - , , , ICU. ( ICU.)

, , , , , . , , .


Don't forget that \* and [*] are literal asterisks, not quantifiers, so they must not be rewritten.
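A short Python check (with `re` standing in for ICU, whose syntax agrees on these two forms) of why an escaped \* and a bracketed [*] have to be skipped by any * to + rewrite: both match a literal asterisk, and rewriting them changes what they match.

```python
import re

# An escaped star and a star inside a character class are both literal.
print(bool(re.fullmatch(r"a\*b", "a*b")))
print(bool(re.fullmatch(r"a[*]b", "a*b")))

# A bare star is a quantifier on the preceding "a" instead.
print(bool(re.fullmatch(r"a*b", "aaab")))
print(bool(re.fullmatch(r"a*b", "a*b")))

# Blindly rewriting the bracketed form would make it match "+" instead of "*".
print(bool(re.fullmatch(r"a[+]b", "a*b")))
```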

, , . .

x* is equivalent to x{0,} and to (?:x+)?, so either of those spellings could be substituted.
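The three spellings x*, x{0,} and (?:x+)? can be compared quickly in Python's `re` (a reference engine, not ICU). This only confirms that a correct engine finds the same matches for all three; whether ICU's bug also affects x{0,} or (?:x+)? is an assumption you would have to test against ICU itself.

```python
import re

patterns = [r"x*", r"x{0,}", r"(?:x+)?"]
for s in ["", "x", "xxx", "axx", "xxa"]:
    spans = [re.search(p, s).span() for p in patterns]
    print(repr(s), spans)
    assert len(set(spans)) == 1  # all three agree on this input
```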


Something like this (Perl):

if ($str =~ /x*/ && $str =~ /(x+)/) { print "'$1'\n"; }

But the real problem is the bug, as you say. How can something as basic as quantifiers be broken? This is not a kludge you should have to put in your code.

