Normalize Arabic text in MySQL

I am having trouble searching for Arabic text in MySQL. I have a row in the database with this value:

display_name أحمد 

But when I run a query like this, the row is not found:

 SELECT * FROM wp_users WHERE display_name LIKE '%احمد%' 

I tried appending this to the end of the query:

 collate utf8_bin 

But that didn't work either. How can I make these two compare as equal?

 احمد == أحمد 
php mysql wordpress diacritics arabic
1 answer

I don't have an exact solution, but I can explain why this does not work. If you want these two strings to be treated as equal, you need a different collation: utf8_bin compares exact code points, and the two strings are clearly not identical on that basis. MySQL's utf8_general_ci collation normally provides some case and accent folding; for example, all of these compare as equal:

 SELECT 'a'='A' COLLATE utf8_general_ci;
 SELECT 'ü'='u' COLLATE utf8_general_ci;
 SELECT 'ß'='s' COLLATE utf8_general_ci; -- utf8_general_ci folds ß to 's'; utf8_unicode_ci treats it as 'ss'

but in your case this does not work, and neither does the more accurate utf8_unicode_ci collation:

 SELECT 'احمد'='أحمد' COLLATE utf8_general_ci;
 SELECT 'احمد'='أحمد' COLLATE utf8_unicode_ci;

The character mapping chart for Middle Eastern languages in MySQL's utf8_unicode_ci shows that the characters أ and ا are not treated as equal, so MySQL's default collations will not solve this problem.
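To make the mismatch concrete, here is a minimal PHP sketch that prints the UTF-8 bytes of the two letters. It assumes the strings are UTF-8 encoded and PHP 7 or later (for the `\u{...}` escapes); the variable names are only illustrative.

```php
<?php
// The two letters that look alike but are distinct code points.
$alefHamza = "\u{0623}"; // أ  ARABIC LETTER ALEF WITH HAMZA ABOVE
$alef      = "\u{0627}"; // ا  ARABIC LETTER ALEF

echo bin2hex($alefHamza), "\n"; // d8a3
echo bin2hex($alef), "\n";      // d8a7

// A byte-for-byte comparison, which is what utf8_bin performs, cannot match.
var_dump($alefHamza === $alef); // bool(false)
```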

To get around this, you have two options: normalize your strings before they reach MySQL (i.e. in PHP), or extend MySQL with a custom collation that does what you need.

The Ar-PHP project can help with the first option, as sєsє suggested. You should store the real display name and the normalized form separately, so that you can search on one and display the other. Another project also lets you rewrite Arabic strings so that they work better in MySQL.
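As a rough illustration of the normalize-in-PHP option, the sketch below folds the hamza and madda forms of alef onto bare alef and strips diacritics. The function name `normalize_arabic` and the exact set of folded characters are my own assumptions, not the Ar-PHP API; adjust the mapping to whatever equivalences you actually need.

```php
<?php
// Hypothetical helper, NOT part of Ar-PHP: folds common alef variants
// and removes marks so that visually similar names compare as equal.
function normalize_arabic(string $text): string
{
    // Strip tashkeel (U+064B..U+065F), superscript alef (U+0670)
    // and tatweel (U+0640). The /u flag treats input as UTF-8.
    $stripped = preg_replace('/[\x{064B}-\x{065F}\x{0670}\x{0640}]/u', '', $text);
    if ($stripped === null) {
        // preg_replace returns null on invalid UTF-8; fall back to the input.
        $stripped = $text;
    }

    // Fold precomposed alef variants onto bare alef (U+0627).
    return strtr($stripped, [
        "\u{0623}" => "\u{0627}", // أ -> ا
        "\u{0625}" => "\u{0627}", // إ -> ا
        "\u{0622}" => "\u{0627}", // آ -> ا
        "\u{0671}" => "\u{0627}", // ٱ -> ا
    ]);
}

// Both spellings from the question normalize to the same string.
var_dump(normalize_arabic('أحمد') === normalize_arabic('احمد')); // bool(true)
```

One way to use it: save the result in a separate column (say, a hypothetical `display_name_normalized`) whenever the name is written, run the search term through the same function, and query with LIKE against the normalized column while still displaying the original `display_name`.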

The MySQL docs show how to create a custom collation. In essence, this involves editing an LDML XML file (there is at least a BBEdit plugin for this) and supplying it to MySQL. This lets you create a mapping that treats certain characters as equivalent. The advantage of this approach is that it is transparent to PHP and needs no extra columns in your database. If you build such a mapping, it would be useful to other Arabic-language users in many programming languages, not just PHP.
