The grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. Mbstring and PCRE work on code points, so they can break a character that is made of a base letter followed by combining marks. You can see the difference by running the following code.
    function str_to_array($string)
    {
        $length = grapheme_strlen($string);
        $ret = [];

        for ($i = 0; $i < $length; $i += 1) {
            $ret[] = grapheme_substr($string, $i, 1);
        }

        return $ret;
    }

    function str_to_array2($string)
    {
        $length = mb_strlen($string, "UTF-8");
        $ret = [];

        for ($i = 0; $i < $length; $i += 1) {
            $ret[] = mb_substr($string, $i, 1, "UTF-8");
        }

        return $ret;
    }

    function str_to_array3($string)
    {
        return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }

    function utf8_strrev($string)
    {
        return implode(array_reverse(str_to_array($string)));
    }

    function utf8_strrev2($string)
    {
        return implode(array_reverse(str_to_array2($string)));
    }

    function utf8_strrev3($string)
    {
        return implode(array_reverse(str_to_array3($string)));
    }

    // http://www.php.net/manual/en/function.grapheme-strlen.php
    $string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
             ."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6)

    var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },
    [
        'should be' => "o\xCC\x88"."a\xCC\x8A",
        'grapheme'  => utf8_strrev($string),
        'mbstring'  => utf8_strrev2($string),
        'pcre'      => utf8_strrev3($string)
    ]));
The result is as follows.
    array(4) {
      ["should be"]=>
      string(12) "6FCC8861CC8A"
      ["grapheme"]=>
      string(12) "6FCC8861CC8A"
      ["mbstring"]=>
      string(12) "CC886FCC8A61"
      ["pcre"]=>
      string(12) "CC886FCC8A61"
    }
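In the mbstring and PCRE results the combining marks (CC88, CC8A) end up in front of their base letters, because the split happened between code points. As a side note, PCRE can also match whole extended grapheme clusters with the `\X` escape, so a PCRE-based split does not have to break characters. The following is a minimal sketch, assuming PCRE was built with Unicode support; the names `str_to_array4` and `utf8_strrev4` are hypothetical and only follow the naming used above.

```php
<?php
// Hypothetical helper (not part of the code above): split a UTF-8 string
// into extended grapheme clusters using PCRE's \X escape with the /u modifier.
function str_to_array4($string)
{
    // preg_match_all() returns false on failure, e.g. for invalid UTF-8 input.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        return [];
    }

    return $matches[0];
}

// Hypothetical helper: reverse the string cluster by cluster.
function utf8_strrev4($string)
{
    return implode('', array_reverse(str_to_array4($string)));
}

$string = "a\xCC\x8A" . "o\xCC\x88";
echo strtoupper(bin2hex(utf8_strrev4($string))), "\n"; // prints 6FCC8861CC8A
```

Each base letter stays together with its combining mark, so the output matches the "should be" value.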
IntlBreakIterator can be used since PHP 5.5 (intl 3.0):
    function utf8_strrev($str)
    {
        $it = IntlBreakIterator::createCodePointInstance();
        $it->setText($str);

        $ret = '';
        $pos = 0;
        $prev = 0;

        foreach ($it as $pos) {
            $ret = substr($str, $prev, $pos - $prev) . $ret;
            $prev = $pos;
        }

        return $ret;
    }
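Note that `createCodePointInstance()` iterates over code point boundaries, so the function above reverses code point by code point, much like the mbstring version. A grapheme-aware variant could swap in `IntlBreakIterator::createCharacterInstance()`, which breaks on character (combining sequence) boundaries. This is only a sketch, assuming the intl extension is loaded; the name `grapheme_strrev` is hypothetical.

```php
<?php
// Hypothetical variant of utf8_strrev(): reverse by grapheme cluster
// instead of by code point. Assumes the intl extension (PHP 5.5+) is loaded.
function grapheme_strrev($str)
{
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);

    $ret = '';
    $prev = 0;

    foreach ($it as $pos) {
        // $pos is the byte offset of a boundary; prepend the slice that ends there.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;
}
```

For the `$string` used earlier, `strtoupper(bin2hex(grapheme_strrev($string)))` is expected to match the "should be" value.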
masakielastic