PHP mysql encoding and character comparison in Turkish

I am trying to filter Turkish names from MySql database through AJAX POST, the words of the English letter list everything well, however, if I send Ö (which is the letter O with dots), the results come for both O and ... not only Ö

I also noticed that the AJAX message is being sent Ö to% C3% 96, can anyone help?

+3
source share
2 answers

Please tell me a little long answer.
Let's start with your second question. %C3%96 means that bytes 0xC3 and 0x96 are transmitted. These two bytes encode the Ö character in utf-8 .
From this (and that your query yields the described results) I assume that you are using utf-8 all the way .

lexicographic character order; this encoding is determined by collation .
This is a more or less ordered list of characters. For instance. A, B, C, D, .... means A<B<C ....
But these lists contain several characters in the same "place", for example,
[A, Ä], B, C, D .... means that A==Ä->true

___ excursion not directly related to your question ____
Let's take a look at the "name" of the symbol Ö , it LATIN CAPITAL LETTER O WITH DIAERESIS .
So, the base character is O, it just has some jewelry.
Some systems / libraries allow you to specify "grain" / level / degree of comparison, see, for example, Collator :: setStrength of the php-intl extension.

 <?php // utf8 characters define('SMALL_O_WITH_DIAERESIS', chr(0xC3) . chr(0xB6)); define('CAP_O_WITH_DIAERESIS', chr(0xC3) . chr(0x96)); $coll = collator_create( 'utf-8' ); foreach( array('PRIMARY', 'SECONDARY', 'TERTIARY') as $strength) { echo $strength, "\r\n"; $coll->setStrength( constant('Collator::'.$strength) ); echo ' o ~ ö = ', $coll->compare('o', SMALL_O_WITH_DIAERESIS), "\r\n"; echo ' Ö ~ ö = ', $coll->compare(CAP_O_WITH_DIAERESIS, SMALL_O_WITH_DIAERESIS), "\r\n"; } 

prints

 PRIMARY o ~ ö = 0 Ö ~ ö = 0 SECONDARY o ~ ö = -1 Ö ~ ö = 0 TERTIARY o ~ ö = -1 Ö ~ ö = 1 

At the initial level, all involved characters (o, O, ö, ...) are just some irrelevant variations of the character O, so they are all considered equal. At the secondary level, the additional “feature” WITH DIAERESIS taken into account, and at the third level, is it also a small or large letter.
But ... MySQL doesn't work that way ... so sorry again ;-)
___ end of excursion ____

MySQL has sort tables that determine the order. When you select an encoding, you also mean the default choice for this encoding, unless you specify it explicitly. In your case, the implied sorting is probably utf8_general_ci , and it refers to ö == o.
This applies to both the definition of the table and the set / match of the connection (the latter is almost irrelevant in your situation).
Utf8_turkish_ci, on the other hand, considers ö! = O. This is probably a sort.

When you have a table layout like

 CREATE TABLE soFoo ( x varchar(32) ) CHARACTER SET utf8 

standard sorting is selected for utf8 → general_ci → = ö
You can specify a default sort for the table when defining it

 CREATE TABLE soFoo ( x varchar(32) ) CHARACTER SET utf8 COLLATE utf8_turkish_ci 

Since you already have a table plus data, you can change the sorting of the table ... but if you do it at the table level, you should use ALTER TABLE ... CONVERT (in case you use MODIFY, the column retains its "original" sorting).

 ALTER TABLE soFoo CONVERT TO CHARACTER SET utf8 COLLATE utf8_turkish_ci 

This pretty much takes care of your problem.


As a side note, there is (as mentioned) a mapping assigned to your connection . Choosing an encoding means choosing a sort. I mainly use PDO when (directly) connects to MySQL, and my default connection code is as follows

 $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array( PDO::ATTR_EMULATE_PREPARES=>false, PDO::MYSQL_ATTR_DIRECT_QUERY=>false, PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION )); 

mark charset=utf8 ; no matching, so general_ci is assigned to the connection. And that's why

 <?php $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array( PDO::ATTR_EMULATE_PREPARES=>false, PDO::MYSQL_ATTR_DIRECT_QUERY=>false, PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION )); $smallodiaresis_utf8 = chr(0xC3) . chr(0xB6); foreach( $pdo->query("SELECT 'o'='$smallodiaresis_utf8'") as $row ) { echo $row[0]; } 

prints 1 Value o == ö. The string literals used in the instruction are treated as utf8 / utf8_general_ci.

I could either specify the sort for the string literal explicitly in the expression

 SELECT 'o' COLLATE utf8_turkish_ci ='ö' 

(only setting it for one of two literals / operands, why and how it works, see Collation of Expressions )
or I can establish connection mapping through

 $pdo->exec("SET collation_connection='utf8_turkish_ci'"); 

both results are in

 foreach( $pdo->query("SELECT 'o'[...]='$smallodiaresis_utf8'") as $row ) { echo $row[0]; } 

print 0 .

change: and complicate things a bit more:
The utf8 encoding cannot represent all possible characters. There's an even wider utf8mb4 character utf8mb4 .

+2
source

PHP code should get %C3%96 properly decoded back to Ö . But if not, then apply the urldecode() PHP function to the string.

You will still have the symbol Ö , not O ; this is normal?

If you get Ö , then there is a mixture of utf8 and latin1. This is another problem.

0
source

All Articles