Please tell me a little long answer.
Let's start with your second question. %C3%96 means that bytes 0xC3 and 0x96 are transmitted. These two bytes encode the Ö character in utf-8 .
From this (and that your query yields the described results) I assume that you are using utf-8 all the way .
lexicographic character order; this encoding is determined by collation .
This is a more or less ordered list of characters. For instance. A, B, C, D, .... means A<B<C ....
But these lists contain several characters in the same "place", for example,
[A, Ä], B, C, D .... means that A==Ä->true
___ excursion not directly related to your question ____
Let's take a look at the "name" of the symbol Ö , it LATIN CAPITAL LETTER O WITH DIAERESIS .
So, the base character is O, it just has some jewelry.
Some systems / libraries allow you to specify "grain" / level / degree of comparison, see, for example, Collator :: setStrength of the php-intl extension.
<?php // utf8 characters define('SMALL_O_WITH_DIAERESIS', chr(0xC3) . chr(0xB6)); define('CAP_O_WITH_DIAERESIS', chr(0xC3) . chr(0x96)); $coll = collator_create( 'utf-8' ); foreach( array('PRIMARY', 'SECONDARY', 'TERTIARY') as $strength) { echo $strength, "\r\n"; $coll->setStrength( constant('Collator::'.$strength) ); echo ' o ~ ö = ', $coll->compare('o', SMALL_O_WITH_DIAERESIS), "\r\n"; echo ' Ö ~ ö = ', $coll->compare(CAP_O_WITH_DIAERESIS, SMALL_O_WITH_DIAERESIS), "\r\n"; }
prints
PRIMARY o ~ ö = 0 Ö ~ ö = 0 SECONDARY o ~ ö = -1 Ö ~ ö = 0 TERTIARY o ~ ö = -1 Ö ~ ö = 1
At the initial level, all involved characters (o, O, ö, ...) are just some irrelevant variations of the character O, so they are all considered equal. At the secondary level, the additional “feature” WITH DIAERESIS taken into account, and at the third level, is it also a small or large letter.
But ... MySQL doesn't work that way ... so sorry again ;-)
___ end of excursion ____
MySQL has sort tables that determine the order. When you select an encoding, you also mean the default choice for this encoding, unless you specify it explicitly. In your case, the implied sorting is probably utf8_general_ci , and it refers to ö == o.
This applies to both the definition of the table and the set / match of the connection (the latter is almost irrelevant in your situation).
Utf8_turkish_ci, on the other hand, considers ö! = O. This is probably a sort.
When you have a table layout like
CREATE TABLE soFoo ( x varchar(32) ) CHARACTER SET utf8
standard sorting is selected for utf8 → general_ci → = ö
You can specify a default sort for the table when defining it
CREATE TABLE soFoo ( x varchar(32) ) CHARACTER SET utf8 COLLATE utf8_turkish_ci
Since you already have a table plus data, you can change the sorting of the table ... but if you do it at the table level, you should use ALTER TABLE ... CONVERT (in case you use MODIFY, the column retains its "original" sorting).
ALTER TABLE soFoo CONVERT TO CHARACTER SET utf8 COLLATE utf8_turkish_ci
This pretty much takes care of your problem.
As a side note, there is (as mentioned) a mapping assigned to your connection . Choosing an encoding means choosing a sort. I mainly use PDO when (directly) connects to MySQL, and my default connection code is as follows
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array( PDO::ATTR_EMULATE_PREPARES=>false, PDO::MYSQL_ATTR_DIRECT_QUERY=>false, PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION ));
mark charset=utf8 ; no matching, so general_ci is assigned to the connection. And that's why
<?php $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array( PDO::ATTR_EMULATE_PREPARES=>false, PDO::MYSQL_ATTR_DIRECT_QUERY=>false, PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION )); $smallodiaresis_utf8 = chr(0xC3) . chr(0xB6); foreach( $pdo->query("SELECT 'o'='$smallodiaresis_utf8'") as $row ) { echo $row[0]; }
prints 1 Value o == ö. The string literals used in the instruction are treated as utf8 / utf8_general_ci.
I could either specify the sort for the string literal explicitly in the expression
SELECT 'o' COLLATE utf8_turkish_ci ='ö'
(only setting it for one of two literals / operands, why and how it works, see Collation of Expressions )
or I can establish connection mapping through
$pdo->exec("SET collation_connection='utf8_turkish_ci'");
both results are in
foreach( $pdo->query("SELECT 'o'[...]='$smallodiaresis_utf8'") as $row ) { echo $row[0]; }
print 0 .
change: and complicate things a bit more:
The utf8 encoding cannot represent all possible characters. There's an even wider utf8mb4 character utf8mb4 .