How can I match a Russian word using preg_replace in PHP?

Question


How can I find a match for a Russian word in a string (also in Russian) in PHP?

So, for example, something like this:

 $pattern = '//';
 preg_replace($pattern, $replacement, $string_in_russian);

I tried utf8_encode and htmlentities with the UTF-8 flag on $pattern, but that didn't work. Should I also encode $string_in_russian?

Update: The suggested /u flag does not work, so here is the actual code. It comes from a glossary plugin for WordPress (my site is correctly configured for Russian, and that works everywhere except in this case). Here is the code:

 $glossary_title = $glossary_item->post_title;
 $glossary_search = '/\b'.$glossary_title.'s*?\b(?=([^"]*"[^"]*")*[^"]*$)/iu';
 $glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
 $content_temp = preg_replace($glossary_search, $glossary_replace, $content, 1);

When I do a quick echo in an HTML comment, this is the kind of string I get for the pattern:

 /\bs*?\b(?=([^"]*"[^"]*")*[^"]*$)/iu

And that still doesn't work. I thought that maybe the "s" was tripping me up (this level of regex is a bit beyond me, but I guess it is there for possible plurals), but removing it did not help.

Update #2: Okay, so I decided to do a full "clean slate" test: a simple PHP file with a few lines of content in English and Russian and target words to replace. Here is the code:

 $content_en = 'Nulla volutpat pretium nunc, ac feugiat neque lobortis vitae. In eu sapien sit amet eros tincidunt viverra. <b style="color:purple">Proin</b> congue hendrerit felis, et consequat neque ultrices lobortis. <b style="color:purple">Proin</b> luctus bibendum libero et molestie. Sed tristique lacus a urna semper eget feugiat lacus varius. Donec vel sodales diam. <b style="color:purple">Proin</b> fringilla laoreet purus, a facilisis nisi porttitor vel. Nullam ac justo ac elit laoreet ullamcorper vel a magna. Suspendisse in arcu sapien.';
 $find_en = 'proin';
 $replace_with_en = '<em style="color:red">REPLACEMENT</em>';
 $glossary_search = '/\b'.$find_en.'s*?\b(?=([^"]*"[^"]*")*[^"]*$)/iu';
 $content_en_replaced = preg_replace($glossary_search, $replace_with_en, $content_en);

 $content_ru = 'Lorem Ipsum  ,         ,         ,       " <b style="color:purple"></b> ..  <b style="color:purple"></b> ..  <b style="color:purple"></b> .."       HTML  Lorem Ipsum     .';
 $find_ru = '';
 $replace_with_ru = '<em style="color:red"></em>';
 $glossary_search = '/\b'.$find_ru.'s*?\b(?=([^"]*"[^"]*")*[^"]*$)/iu';
 $content_ru_replaced = preg_replace($glossary_search, $replace_with_ru, $content_ru);

And here is a screenshot of the output: http://www.flickr.com/photos/iliadraznin/5372578707/

As you can see, the target word is replaced in the English text but not in the Russian text, even though the code is identical and I use the /u flag. The file is also encoded as UTF-8. Any suggestions? (Again, I tried removing the "s"; still nothing.)

+7

php regex preg-replace internationalization utf-8

ilia Jan 19 '11 at 19:43


3 answers

First, make sure your PHP file is encoded as UTF-8. Even if the file itself contains no UTF-8 characters (they may come from another file), it must be UTF-8 for the functions inside it to work with UTF-8.
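One way to check the encoding at runtime is sketched below. The helper name `is_valid_utf8()` and the sample strings are assumptions made for illustration, not part of this answer. It relies on the fact that, with the `u` modifier, the preg functions reject a subject that is not valid UTF-8: `preg_match()` returns `false` and `preg_replace()` returns `null`.

```php
<?php
// Hypothetical helper (not from the answer): true when $s is valid UTF-8.
// An empty pattern with the u modifier matches any valid UTF-8 string;
// for invalid input preg_match() returns false instead of 1.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Assumed sample data: this file itself must be saved as UTF-8.
$utf8   = 'слово';                 // UTF-8 literal
$cp1251 = "\xF1\xEB\xEE\xE2\xEE";  // the same word as Windows-1251 bytes

var_dump(is_valid_utf8($utf8));    // bool(true)
var_dump(is_valid_utf8($cp1251));  // bool(false)

// With invalid input, preg_replace() returns null and records the reason.
$result = preg_replace('/x/u', 'y', $cp1251);
var_dump($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

If the check fails for text coming from a database or another file, the problem is in the data's encoding rather than in the pattern.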

+1

Joel Jan 20 '11 at 3:36


The "u" modifier in a PCRE pattern enables Unicode (UTF-8) handling, so:

 <?php
 $str = '   ';
 if (preg_match("''isu", $str, $match)) {
     echo $match[0]; // $match is an array; index 0 holds the full match
 }
 ?>

Also an example for preg_replace:

 <?php
 $str = '   ';
 echo preg_replace("''isu", '', $str);
 ?>
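To make the effect of the modifier visible, here is a minimal sketch; the sample letter and word are assumptions chosen for illustration, and the file must be saved as UTF-8. Without `u` the pattern is applied byte by byte, so a two-byte Cyrillic letter does not count as one character and case-insensitive matching does not fold Cyrillic letters.

```php
<?php
$letter = 'я'; // one Cyrillic letter, two bytes in UTF-8

// Without u, "." matches a single byte, so the two-byte letter does not fit.
var_dump(preg_match('/^.$/', $letter));   // int(0)

// With u, "." matches a whole code point.
var_dump(preg_match('/^.$/u', $letter));  // int(1)

// Case-insensitive matching of Cyrillic letters also depends on u.
var_dump(preg_match('/СЛОВО/i', 'слово'));   // int(0)
var_dump(preg_match('/СЛОВО/iu', 'слово'));  // int(1)
```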

0

Mark pegasov Feb 25 '11 at 14:53


tomwalsham · Accepted Answer · 2011-01-20T17:34:16+0000

If you do a real clean-slate test, you will find that there is nothing wrong with the Russian; it is actually the word boundary part that breaks the regular expression.

 $glossary_search = '/'.$find_ru.'/iu'; // Works fine
 $glossary_search = '/\b'.$find_ru.'\b/iu'; // Breaks

The word boundary shorthand \b is not UTF-8 aware, so, as per this question ("php regex word boundary matching in utf-8"), you can try the following:

 $glossary_search = '/(?<!\pL)'.$find_ru.'(?!\pL)/iu';

This works well in my testing here.
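For reference, the lookaround approach above can be wrapped into a small self-contained sketch. The function name `highlight_term()`, the `<em>` wrapper, and the sample sentence are assumptions made for illustration and are not taken from the glossary plugin; the `preg_quote()` call is an addition that guards against regex metacharacters in the term.

```php
<?php
// Hypothetical helper (not from the plugin): wraps the first whole-word
// occurrence of $term in $content with <em> tags.
function highlight_term(string $term, string $content): string
{
    // Escape regex metacharacters and the "/" delimiter in the term.
    $quoted = preg_quote($term, '/');

    // (?<!\pL) and (?!\pL): no Unicode letter directly before or after the term.
    $pattern = '/(?<!\pL)' . $quoted . '(?!\pL)/iu';

    // Limit 1: replace only the first match, as the plugin code does.
    $result = preg_replace($pattern, '<em>$0</em>', $content, 1);

    // preg_replace() returns null on error (for example, invalid UTF-8 input).
    return $result ?? $content;
}

// Assumed sample text; the file must be saved as UTF-8.
echo highlight_term('слово', 'Это слово, а не словосочетание.'), "\n";
// Это <em>слово</em>, а не словосочетание.
```

The term inside the longer word is left alone because a letter follows it, which is the behaviour `\b` was meant to provide.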
