mb_str_replace() is slow. Any alternatives?

I want to make sure that some string replacements I run are multibyte safe. I found several mb_str_replace functions around the web, but they are slow. I am talking about a 20% increase after passing maybe 500-900 bytes through one.

Any recommendations? I am thinking about using preg_replace, since it is native and compiled and so might be faster. Any thoughts would be appreciated.

+6
php multibyte
3 answers

As indicated there, str_replace is safe to use in UTF-8 contexts as long as all parameters are valid UTF-8, because there can be no ambiguous match between multibyte-encoded strings. If you validate your input, you do not need to look for another function.
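
A minimal sketch of that approach, assuming the mbstring extension is available: validate every argument with mb_check_encoding, then call the native str_replace. The helper name utf8_safe_replace is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a built-in): reject invalid UTF-8, then use str_replace.
function utf8_safe_replace(string $search, string $replace, string $subject): string
{
    foreach ([$search, $replace, $subject] as $value) {
        // mb_check_encoding returns false for byte sequences that are not valid UTF-8.
        if (!mb_check_encoding($value, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
    }
    // With valid UTF-8 on all sides, a byte-wise match cannot start mid-character.
    return str_replace($search, $replace, $subject);
}

echo utf8_safe_replace('é', 'e', 'café crème'), "\n"; // prints "cafe crème"
```

Whether to throw or to silently return the subject on invalid input is a design choice; throwing is used here only to make bad input visible.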

+10

Since encoding is a real problem when input comes from outside (UTF-8 or otherwise), I prefer to use only multibyte functions. For str_replace, I use this one, which is fast enough.

    if (!function_exists('mb_str_replace')) {
        function mb_str_replace($search, $replace, $subject, &$count = 0)
        {
            if (!is_array($subject)) {
                $searches = is_array($search) ? array_values($search) : array($search);
                $replacements = is_array($replace) ? array_values($replace) : array($replace);
                $replacements = array_pad($replacements, count($searches), '');
                foreach ($searches as $key => $search) {
                    $parts = mb_split(preg_quote($search), $subject);
                    $count += count($parts) - 1;
                    $subject = implode($replacements[$key], $parts);
                }
            } else {
                foreach ($subject as $key => $value) {
                    $subject[$key] = mb_str_replace($search, $replace, $value, $count);
                }
            }
            return $subject;
        }
    }
+3

Here is my implementation, based on Alain's answer:

    /**
     * Replace all occurrences of the search string with the replacement string. Multibyte safe.
     *
     * @param string|array $search   The value being searched for, otherwise known as the needle.
     *                               An array may be used to designate multiple needles.
     * @param string|array $replace  The replacement value that replaces found search values.
     *                               An array may be used to designate multiple replacements.
     * @param string|array $subject  The string or array being searched and replaced on, otherwise known as the haystack.
     *                               If subject is an array, then the search and replace is performed with every entry
     *                               of subject, and the return value is an array as well.
     * @param string       $encoding The encoding parameter is the character encoding. If it is omitted, the internal
     *                               character encoding value will be used.
     * @param int          $count    If passed, this will be set to the number of replacements performed.
     * @return array|string
     */
    public static function mbReplace($search, $replace, $subject, $encoding = 'auto', &$count=0) {
        if(!is_array($subject)) {
            $searches = is_array($search) ? array_values($search) : [$search];
            $replacements = is_array($replace) ? array_values($replace) : [$replace];
            $replacements = array_pad($replacements, count($searches), '');
            foreach($searches as $key => $search) {
                $replace = $replacements[$key];
                $search_len = mb_strlen($search, $encoding);

                $sb = [];
                while(($offset = mb_strpos($subject, $search, 0, $encoding)) !== false) {
                    $sb[] = mb_substr($subject, 0, $offset, $encoding);
                    $subject = mb_substr($subject, $offset + $search_len, null, $encoding);
                    ++$count;
                }
                $sb[] = $subject;
                $subject = implode($replace, $sb);
            }
        } else {
            foreach($subject as $key => $value) {
                $subject[$key] = self::mbReplace($search, $replace, $value, $encoding, $count);
            }
        }
        return $subject;
    }

Alain's mb_split-based version does not accept a character encoding, although I assume you can set one through mb_regex_encoding.
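
As a rough sketch of that idea, the wrapper below sets mb_regex_encoding before calling mb_split and restores the previous value afterwards. The name mb_split_replace and the 'UTF-8' default are hypothetical. Reusing preg_quote follows the earlier answer and is an assumption, since preg_quote targets PCRE rather than the mb regex engine.

```php
<?php
// Hypothetical wrapper, not from the original answers. Requires the mbstring extension.
function mb_split_replace(string $search, string $replace, string $subject, string $encoding = 'UTF-8'): string
{
    if ($search === '') {
        return $subject; // nothing to search for
    }
    $previous = mb_regex_encoding();   // remember the current regex encoding
    mb_regex_encoding($encoding);      // applies to mb_split and the other mb_ereg* functions
    try {
        // Assumption: preg_quote escapes enough for the mb regex engine as well.
        $parts = mb_split(preg_quote($search), $subject);
    } finally {
        mb_regex_encoding($previous);  // restore the global setting
    }
    return $parts === false ? $subject : implode($replace, $parts);
}

echo mb_split_replace('.', '!', 'α.β.γ'), "\n"; // prints "α!β!γ"
```

Because mb_regex_encoding is a global setting, restoring it in the finally block keeps the call from affecting other code that relies on the mb regex functions.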

My unit tests pass:

    function testMbReplace() {
        $this->assertSame('bbb',Str::mbReplace('a','b','aaa','auto',$count1));
        $this->assertSame(3,$count1);
        $this->assertSame('ccc',Str::mbReplace(['a','b'],['b','c'],'aaa','auto',$count2));
        $this->assertSame(6,$count2);
        $this->assertSame("\xbf\x5c\x27",Str::mbReplace("\x27","\x5c\x27","\xbf\x27",'iso-8859-1'));
        $this->assertSame("\xbf\x27",Str::mbReplace("\x27","\x5c\x27","\xbf\x27",'gbk'));
    }
+2
