Why do these two regular expressions return different results in Ruby depending on the position of the underscore?

I have the following:

[11] pry(main)> "ab BN123-4.56".scan(/BN([0-9_\.-]+)/)
=> [["123-4.56"]]
[12] pry(main)> "ab BN123-4.56".scan(/BN([0-9\.-_]+)/)
=> [["123"]]

and I am not sure why the second one, with the underscore at the end, behaves differently from the first. How does the regex parser interpret the two patterns differently?

THX

3 answers

This is because you have a hyphen ( - ) placed in the middle of a character class without escaping.

Inside a [] character class, a hyphen ( - ) is taken literally only when it is the first or last character. If you place it anywhere else, you need to escape it ( \- ) for it to match a literal hyphen.

 "ab BN123-4.56".scan(/BN([0-9_\.-]+)/) # => [["123-4.56"]]
 "ab BN123-4.56".scan(/BN([0-9\.\-_]+)/) # => [["123-4.56"]]

Note: you also do not need to escape the period ( . ) inside a character class, so you can rewrite this as:

 "ab BN123-4.56".scan(/BN([0-9_.-]+)/) # => [["123-4.56"]]

Or as the following, if you decide to keep the hyphen in the middle of the character class:

 "ab BN123-4.56".scan(/BN([0-9.\-_]+)/) # => [["123-4.56"]]

The hyphen is the culprit, not the underscore.

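- is a special character within a character class: it indicates a range. One way to avoid this is to put it at the beginning or end of the class: [...-] .

So [_.-] matches a single character that is either _ , . or - .

And [.-_] matches a single character in the range from . to _ . The hyphen itself (code point 45) sits just before . (code point 46), so it falls outside that range, and the match stops at 123.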
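The range reading can be checked directly from the code points. This is only a small sketch; the variable names are made up for illustration and the class [.-_] is tested on its own rather than inside the original pattern.

```ruby
# Code points of the three characters in question.
hyphen_code     = "-".ord   # 45
dot_code        = ".".ord   # 46
underscore_code = "_".ord   # 95

# [.-_] as a standalone class, anchored so it must match the whole string.
range_class = /\A[.-_]\z/

hyphen_in_range = range_class.match?("-")   # false: 45 is below 46
digit_in_range  = range_class.match?("5")   # true: 53 is inside 46..95
upper_in_range  = range_class.match?("A")   # true: 65 is inside 46..95
```

So the range quietly admits digits, uppercase letters and some punctuation, but not the hyphen that created it.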

Illustration

BN([0-9.\-_]+) does what you expect and selects 123-4.56 from ab BN123-4.56 .


A hyphen inside square brackets [] indicates a range. To match a literal hyphen, escape it with a backslash ( \- ), as you would other special characters.
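If the characters come from a variable rather than being typed into the pattern, the escaping can be left to Ruby. The sketch below assumes a hypothetical list named allowed and uses Regexp.escape, which backslash-escapes the hyphen along with other metacharacters; it is one possible approach, not something the original answers propose.

```ruby
# Hypothetical list of extra characters to allow after the digits.
allowed = [".", "-", "_"]

# Escape each character, then splice the result into a character class.
escaped    = allowed.map { |c| Regexp.escape(c) }.join   # "\\.\\-_"
char_class = "[0-9#{escaped}]"
pattern    = /BN(#{char_class}+)/

result = "ab BN123-4.56".scan(pattern)   # [["123-4.56"]]
```

Because the hyphen arrives already escaped, its position in the list no longer matters.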
