
How to specify Regexp for Cyrillic Unicode characters in Ruby 1.9

    #coding: utf-8
    str2 = "asdf"
    p str2.encoding             #<Encoding:UTF-8>
    p str2.scan /\p{Cyrillic}/  # found all cyrillic characters
    str2.gsub!(/\w/u,'')        # removes only latin characters
    puts str2

The question is: why does \w ignore Cyrillic characters?

I installed the latest Ruby package from http://rubyinstaller.org/. Here is the output of ruby -v:

 ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32] 

As far as I know, the Oniguruma regex library used by Ruby 1.9 has full Unicode character support.

ruby regex encoding unicode character-properties
1 answer

This is stated in the Ruby documentation: \w is equivalent to [a-zA-Z0-9_] and thus does not match non-ASCII characters such as Cyrillic letters.

You probably want to use [[:alnum:]], which matches all Unicode alphanumeric characters. See also [[:word:]] and [[:alpha:]].
