You may need to tell Perl that your source file contains UTF-8 characters. Try:
    #!/usr/bin/perl
    use utf8;  # **** Add this line
    $str = 'mısır';
    $str =~ m/m[ıi]s[ıi]r/ && print "match double undotted ı\n";
That does not help you with PHP directly, but PHP may have a similar directive. Otherwise, try some form of escape sequence so that no literal non-ASCII character appears in the source code. I don't know PHP, so I can't help much further.
Edit
I have read that PHP does not support Unicode natively. The Unicode input that you pass in is therefore most likely treated as the string of bytes it is encoded as.
If you can be sure that your input arrives as UTF-8, then you can match the UTF-8 byte sequence for ı, which is \xc4\xb1, as in:
$str = 'mısır';
Does that work?
Edit again:
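A minimal sketch of that byte-sequence approach in PHP, assuming the input really is UTF-8. The variable names and test strings are mine, for illustration only. The key point is that the two bytes go into an alternation rather than a character class, so they are matched as a unit:

```php
<?php
// Sketch only: assumes the input strings are UTF-8 encoded.
// "\xc4\xb1" is the two-byte UTF-8 encoding of the dotless ı.
$dotless = "\xc4\xb1";

// Alternation keeps the two bytes together as one unit.
// Single quotes pass \xc4\xb1 through to PCRE, which reads them as byte escapes.
$pattern = '!^m(?:\xc4\xb1|i)s(?:\xc4\xb1|i)r$!';

$inputs = ['misir', "m{$dotless}s{$dotless}r", "mis{$dotless}r", 'masar'];
$results = [];
foreach ($inputs as $str) {
    $results[$str] = (preg_match($pattern, $str) === 1);
    echo $str, ': ', ($results[$str] ? "ok\n" : "fail\n");
}
```

The first three inputs should report ok and 'masar' should report fail, whichever mix of i and ı is used.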
I can explain why the first three tests pass. Suppose that in your encoding ı is encoded as ABCDE. Then PHP sees the following:
    echo 'match single normal i: ';
    $str = 'mi';
    echo (preg_match('!m[ABCDEi]!', $str)) ? "ok\n" : "fail\n";

    echo 'match single undotted ABCDE: ';
    $str = 'mABCDE';
    echo (preg_match('!m[ABCDEi]!', $str)) ? "ok\n" : "fail\n";

    echo 'match double normal i: ';
    $str = 'misir';
    echo (preg_match('!m[ABCDEi]s[ABCDEi]r!', $str)) ? "ok\n" : "fail\n";

    echo 'match double undotted ABCDE: ';
    $str = 'mABCDEsABCDEr';
    echo (preg_match('!m[ABCDEi]s[ABCDEi]r!', $str)) ? "ok\n" : "fail\n";
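which makes it obvious why the first three tests pass and the last one fails: a character class matches exactly one unit, so [ABCDEi] consumes only the A of ABCDE. If you add the start/end anchors ^...$, I think you will find that only the plain-i tests (the first and third) pass.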
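To check this with the real bytes instead of the ABCDE stand-in, here is a rough sketch. It assumes UTF-8 input, and it also tries PCRE's `u` pattern modifier, which is not something the question used; I am assuming your PCRE build has UTF-8 support, which is the usual case. All variable names are illustrative.

```php
<?php
// Sketch only: assumes UTF-8 input and a PCRE build with UTF-8 support.
$dotless = "\xc4\xb1";                 // ı as UTF-8: two bytes
$word    = "m{$dotless}s{$dotless}r";  // 'mısır'
$class   = "[{$dotless}i]";            // what PHP sees for [ıi]

echo strlen($dotless), "\n";           // byte length, not character count

// Unanchored, byte-wise: "matches", but only 'm' plus the first byte of ı.
preg_match('!m' . $class . '!', "m{$dotless}", $m);
$matchedBytes = strlen($m[0]);

// Anchored, byte-wise: each class consumes a single byte, so this fails.
$bytewise = preg_match('!^m' . $class . 's' . $class . 'r$!', $word);

// Anchored with /u: pattern and subject are read as UTF-8 characters.
$unicode = preg_match('!^m' . $class . 's' . $class . 'r$!u', $word);

echo "bytes matched: $matchedBytes, bytewise: $bytewise, with /u: $unicode\n";
```

If the byte-wise explanation is right, the unanchored match covers only two bytes (m and half of ı), the anchored byte-wise match returns 0, and the `u` version returns 1.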