How to emulate MySQL's utf8_general_ci collation in PHP string comparisons

Basically, if two strings are evaluated as equal by my database, I would like to be able to check this at the application level too. For example, if someone enters “bjork” in the search field, I want PHP to be able to match this with the string “Björk”, just as MySQL does.

I assume that PHP has no direct equivalent of MySQL's collation options, and that the easiest approach would be to write a simple function that converts the strings, using strtolower() to make them uniformly lowercase and strtr() to replace multi-byte characters with their corresponding ASCII equivalents.

Is this premise accurate? Does anyone have a ready-made array that can be used as the second strtr() parameter to match strings the way the various MySQL collations do (in particular, for my current needs, utf8_general_ci)? Or, failing that, where can I find documentation on how the different MySQL collations treat each character? (I saw somewhere that some collations treat ß as S and others as Ss, for example, but it did not list how every character is evaluated.)

3 answers

Here is what I used, but I have yet to test it for full compatibility with MySQL.

function collation_conform($string, $collation = 'utf8_general_ci')
{
    if ($collation !== 'utf8_general_ci') {
        throw new InvalidArgumentException('Unsupported collation: ' . $collation);
    }

    // Non-string values are passed through unchanged.
    if (!is_string($string)) {
        return $string;
    }

    // Fold accented Latin letters to their base letters. utf8_general_ci
    // treats ß as a single s. Entries such as Æ, Ð, Ø and Þ are
    // approximations and have not been checked against MySQL.
    $string = strtr($string, array(
        'Š'=>'S', 'š'=>'s', 'Ð'=>'D', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
        'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
        'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
        'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'s', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
        'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
        'ï'=>'i', 'ð'=>'d', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
        'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f'));

    // Multibyte-safe lowercasing (needs the mbstring extension).
    return mb_strtolower($string, 'UTF-8');
}
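
A lookup table like the one above has to be maintained by hand. As a rough alternative, the sketch below derives most of the same foldings from Unicode decomposition instead. It assumes the intl and mbstring extensions are installed, and fold_for_compare() is a hypothetical name, not part of MySQL or PHP. It only approximates utf8_general_ci: letters with no canonical decomposition (Æ, Ø, Þ and so on) are left as they are, and ß is mapped to a single s explicitly, which is how utf8_general_ci treats it.

```php
<?php
// Sketch only: approximates accent- and case-insensitive matching.
// Requires the intl (Normalizer) and mbstring extensions.
function fold_for_compare(string $s): string
{
    // Decompose "ö" into "o" followed by a combining diaeresis (NFD).
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Input was not valid UTF-8; fall back to the raw string.
        $decomposed = $s;
    }

    // Drop the combining marks (Unicode category Mn).
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    if ($stripped === null) {
        $stripped = $decomposed;
    }

    // ß has no decomposition; utf8_general_ci compares it equal to "s".
    $stripped = str_replace('ß', 's', $stripped);

    return mb_strtolower($stripped, 'UTF-8');
}

var_dump(fold_for_compare('Björk') === fold_for_compare('bjork')); // bool(true)
```

Both sides of a comparison must go through the same function, and the result should still be checked against the database for any characters that matter to the application.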

Try the following code, which uses the Collator functions from PHP's intl extension.

function is_same_string($str, $str2, $locale = 'en_US')
{
    $coll = collator_create($locale);
    // Primary strength compares base letters only, ignoring case and accents.
    collator_set_strength($coll, Collator::PRIMARY);
    return 0 === collator_compare($coll, $str, $str2);
}

$s1 = 'Björk';
$s2 = 'bjork';

var_dump(
    is_same_string($s1, $s2)
); // bool(true)
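
The strength setting is what makes this match: at primary strength ICU ignores both case and accents, while at tertiary strength it treats them as differences. The sketch below shows the contrast using the object-oriented form of the same intl API. compare_at() is a hypothetical helper name. ICU follows the Unicode Collation Algorithm, so its results are likely closer to utf8_unicode_ci than to utf8_general_ci.

```php
<?php
// Sketch only; requires the intl extension.
function compare_at(int $strength, string $a, string $b, string $locale = 'en_US'): int
{
    $coll = new Collator($locale);
    $coll->setStrength($strength);

    $result = $coll->compare($a, $b);
    if ($result === false) {
        // compare() returns false when the comparison itself fails.
        throw new RuntimeException($coll->getErrorMessage());
    }
    return $result;
}

var_dump(compare_at(Collator::PRIMARY, 'Björk', 'bjork') === 0);  // bool(true)
var_dump(compare_at(Collator::TERTIARY, 'Björk', 'bjork') === 0); // bool(false)
```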