What is the difference between /\p{Alpha}/i and /\p{L}/i in Ruby?

Question

I am trying to write a regular expression in Ruby that matches alphabetic characters in UTF-8 text, such as ñ, í, ó, ú and ü. I know that both /\p{Alpha}/i and /\p{L}/i work, but what is the difference between them?

ruby regex

Bishma stornelli Nov 22 '12 at 16:51

1 answer

Martin ender · Accepted Answer · 2012-11-22T17:10:13+0000

They appear to be equivalent. (Edit: see the end of this answer.)

Ruby seems to have supported \p{Alpha} since version 1.9. In POSIX, \p{Alpha} is equivalent to \p{L&} (for regular expressions with Unicode support; see here). That class matches all letters that have both an uppercase and a lowercase variant (see here). Unicase letters would not be matched by it, whereas \p{L} does match them.

This does not seem to hold for Ruby, though. I tested with an arbitrary Arabic character, since the Arabic alphabet is unicase:

\p{L} (any letter) matches.
The case-specific classes \p{Lu}, \p{Ll} and \p{Lt} do not match, as expected.
\p{L&} does not match, as expected.
\p{Alpha} matches.
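
The checks above can be reproduced with a few lines of Ruby. This is only a sketch: the answer does not say which Arabic character was used, so U+0639 (ARABIC LETTER AIN) is an assumed stand-in, and \p{L&} is left out of the list.

```ruby
# Assumed test character: U+0639 ARABIC LETTER AIN, a unicase letter.
ain = "\u0639"

# Pattern sources to try against the character, one at a time.
patterns = ['\p{L}', '\p{Lu}', '\p{Ll}', '\p{Lt}', '\p{Alpha}']

# Map each pattern source to true/false depending on whether it matches.
results = patterns.to_h { |source| [source, Regexp.new(source).match?(ain)] }

results.each { |source, matched| puts "#{source}: #{matched}" }
```

Only \p{L} and \p{Alpha} should report true here, which is the same pattern as in the list above.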

This seems to be a very good indication that \p{Alpha} is just an alias for \p{L} in Ruby. On Rubular, you can also see that \p{Alpha} is not available in Ruby 1.8.7.

Note that the i modifier is irrelevant either way, because both \p{Alpha} and \p{L} already match uppercase as well as lowercase characters.
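
A quick way to convince yourself of this is to compare each pattern with and without the modifier. The sample characters below are my own pick (two ASCII letters plus a few accented ones like those in the question), so treat this as a spot check rather than a proof.

```ruby
# Sample letters in both cases: a, Z, ñ, Ñ, ü (written as escapes).
samples = ["a", "Z", "\u00F1", "\u00D1", "\u00FC"]

# For each pattern, compare the match result with and without the i modifier.
alpha_same = samples.all? { |ch| /\p{Alpha}/.match?(ch) == /\p{Alpha}/i.match?(ch) }
l_same     = samples.all? { |ch| /\p{L}/.match?(ch) == /\p{L}/i.match?(ch) }

puts "\\p{Alpha}: same result with and without i? #{alpha_same}"
puts "\\p{L}: same result with and without i? #{l_same}"
```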

EDIT:

Ah, there is a difference after all! I just found this PDF about Ruby's new regex engine (in use since Ruby 1.9, as mentioned above). \p{Alpha} is available regardless of the encoding (and will probably just match [A-Za-z] if there is no Unicode support), whereas \p{L} is specifically a Unicode property. This means that \p{Alpha} behaves just as it does in POSIX regular expressions, except that in Ruby it corresponds to \p{L}, whereas in POSIX it corresponds to \p{L&}.
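
As a rough illustration of the encoding point, the sketch below compiles both patterns once as US-ASCII and once as UTF-8. The helper name compiles? is made up for this example, and the behavior is what I would expect from Ruby 1.9 and later rather than something quoted from the PDF.

```ruby
# Hypothetical helper: does `source` compile as a regex in the given encoding?
def compiles?(source, encoding)
  Regexp.new(source.encode(encoding))
  true
rescue RegexpError
  false
end

['\p{Alpha}', '\p{L}'].each do |source|
  [Encoding::US_ASCII, Encoding::UTF_8].each do |encoding|
    puts "#{source} as #{encoding}: compiles? #{compiles?(source, encoding)}"
  end
end

# With a US-ASCII pattern, \p{Alpha} is still usable on plain ASCII input.
ascii_alpha = Regexp.new('\p{Alpha}'.encode(Encoding::US_ASCII))
puts ascii_alpha.match?("a".encode(Encoding::US_ASCII)) # an ASCII letter
puts ascii_alpha.match?("1".encode(Encoding::US_ASCII)) # an ASCII digit
```

If the Unicode property is rejected for the US-ASCII pattern while \p{Alpha} compiles in both encodings, that is the difference described above.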
