How to specify Regexp for Cyrillic Unicode characters in Ruby 1.9
    # coding: utf-8
    str2 = "asdf"
    p str2.encoding               # => #<Encoding:UTF-8>
    p str2.scan /\p{Cyrillic}/    # finds all Cyrillic characters
    str2.gsub!(/\w/u,'')          # removes only Latin characters
    puts str2

The question is: why does \w ignore Cyrillic characters?
I installed the latest Ruby package from http://rubyinstaller.org/ . Here is the output of ruby -v:

    ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-mingw32]

As far as I know, the Oniguruma regex library in 1.9 has full Unicode character support.
This is stated in the Ruby documentation: \w is equivalent to [a-zA-Z0-9_] and thus does not match any non-ASCII character, Cyrillic included.
You probably want to use [[:alnum:]], which matches all Unicode alphabetic and numeric characters. See also [[:word:]] and [[:alpha:]].
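To make the difference concrete, here is a minimal sketch comparing \w with the bracket classes mentioned above. The sample string and variable names are made up for illustration and are not from the question.

```ruby
# coding: utf-8

# Hypothetical sample mixing Latin letters, Cyrillic letters, digits and punctuation.
sample = "abc привет 123_!"

# \w only covers [a-zA-Z0-9_], so the Cyrillic letters are skipped.
ascii_word_chars = sample.scan(/\w/).join

# POSIX bracket classes are Unicode-aware on UTF-8 strings.
alnum_chars    = sample.scan(/[[:alnum:]]/).join   # letters and digits, no underscore
word_chars     = sample.scan(/[[:word:]]/).join    # letters, digits and underscore
cyrillic_chars = sample.scan(/\p{Cyrillic}/).join  # Cyrillic script only

# Removing every letter and digit, whatever the script.
stripped = sample.gsub(/[[:alnum:]]/, '')

p ascii_word_chars
p alnum_chars
p word_chars
p cyrillic_chars
p stripped
```

In the original snippet, replacing /\w/u with /[[:word:]]/ in the gsub! call should strip the Cyrillic characters as well; use [[:alnum:]] instead if underscores should be kept.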