German umlauts and UTF8, revised

Question


As I am sure many people here know, dealing with German umlauts and UTF8 collations can be problematic, to say the least. Equivalences like a = ä, o = ö, u = ü affect not only the sort order of the results but also the results themselves. Here is an example that clearly shows how things can go wrong when simply trying to distinguish between the singular and plural forms of a noun (Bademantel: singular, Bademäntel: plural).

CREATE TABLE keywords (
    id INT(11) PRIMARY KEY AUTO_INCREMENT,
    keyword VARCHAR(255) NOT NULL
) ENGINE = MyISAM DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci;

INSERT INTO keywords (keyword) VALUES ('Bademantel'), ('Bademäntel');

SELECT * FROM keywords WHERE keyword LIKE ('%Bademäntel%');

Results should be

+----+------------+
| id | keyword    |
+----+------------+
|  2 | Bademäntel |
+----+------------+

but with utf8_unicode_ci the result is

+----+------------+
| id | keyword    |
+----+------------+
|  1 | Bademantel |
|  2 | Bademäntel |
+----+------------+

which is clearly not the desired result.
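
The effect can be reproduced without the table by comparing the two literals directly. This is only a sketch: the _utf8 introducers and the column aliases are assumptions added for illustration, and it assumes the client sends the literals as UTF-8.

```sql
-- Compare the two spellings under two collations.
-- The _utf8 introducer fixes the character set of each literal so that the
-- COLLATE clause is valid whatever the connection's default character set is.
SELECT
    _utf8'Bademantel' = _utf8'Bademäntel' COLLATE utf8_unicode_ci AS equal_unicode_ci,  -- 1: ä compares equal to a
    _utf8'Bademantel' = _utf8'Bademäntel' COLLATE utf8_bin        AS equal_bin;         -- 0: compared by character code
```

The same equality is what LIKE, GROUP BY and DISTINCT rely on, which is why both rows match the query above.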

The same problem shows up when grouping or de-duplicating the results:

SELECT keyword FROM keywords GROUP BY keyword ORDER BY LENGTH(keyword) DESC;

SELECT DISTINCT keyword FROM keywords ORDER BY LENGTH(keyword) DESC;

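In both cases (above) the rows are grouped without regard to the umlaut, so one of the variants is dropped (e.g. of Bademäntel and Bademantel, only one is returned).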

So far I have come across two possible workarounds.

1) Use the utf8_swedish_ci collation.

SELECT DISTINCT keyword COLLATE utf8_swedish_ci AS keyword FROM keywords ORDER BY LENGTH(keyword) DESC;

Compared with utf8_unicode_ci, however, this raises concerns about a) the "Eszett" (ss and ß), and b) the sort order.

2) Use a binary collation, utf8_bin.

SELECT DISTINCT keyword COLLATE utf8_bin AS keyword FROM keywords ORDER BY LENGTH(keyword) DESC;

This does tell the umlauts apart, but utf8_bin is also case sensitive, so, for example, LIKE('%Mäntel%') would no longer match Bademäntel.
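
The case sensitivity of utf8_bin can be checked on literals. A minimal sketch, again assuming UTF-8 literals; the _utf8 introducers and the aliases are made up for illustration:

```sql
-- utf8_bin tells umlauts apart, but also upper and lower case.
SELECT
    _utf8'Bademäntel' LIKE _utf8'%Mäntel%' COLLATE utf8_bin AS match_capital_m,    -- 0: 'M' differs from 'm'
    _utf8'Bademäntel' LIKE _utf8'%mäntel%' COLLATE utf8_bin AS match_lowercase_m;  -- 1: exact substring
```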

I have searched SO without finding a definitive answer. Is there a reliable way to handle this?


mysql diacritics collation

brezanac 06 . '14 15:43

2 Answers

You can use WHERE BINARY keyword = 'Bademantel', which forces a byte-by-byte comparison.

Tested on sqlfiddle:

SELECT * FROM stackoverflow WHERE BINARY keyword = 'Bademantel';

| id |    keyword |
|----|------------|
|  1 | Bademantel |

SELECT * FROM stackoverflow WHERE keyword = 'Bademantel';

| id |    keyword |
|----|------------|
|  1 | Bademantel |
|  2 | Bademäntel |


With utf8_general_ci both rows match; with utf8_bin only Bademantel is returned.

Also, utf8_general_ci has its own problem with the Eszett: compare Straße and Strasse.
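
For reference, the two collations differ in how they weigh ß. The sketch below assumes UTF-8 literals; the _utf8 introducers and the aliases are illustrative only:

```sql
-- utf8_unicode_ci treats ß like "ss"; utf8_general_ci treats it like a single "s".
SELECT
    _utf8'Straße' = _utf8'Strasse' COLLATE utf8_unicode_ci AS unicode_ci_ss,  -- 1
    _utf8'Straße' = _utf8'Strasse' COLLATE utf8_general_ci AS general_ci_ss,  -- 0
    _utf8'Straße' = _utf8'Strase'  COLLATE utf8_general_ci AS general_ci_s;   -- 1
```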


CodeBrauer 25 . '16 14:16

brezanac · Accepted Answer · 2015-11-16T13:05:37+0000

For anyone else running into this problem: as of MySQL 5.6 there is a utf8_german2_ci collation, which handles this case.
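
A minimal sketch of what utf8_german2_ci changes, assuming a server that ships this collation and UTF-8 literals (the _utf8 introducers and the aliases are made up for illustration). The ALTER TABLE statement is one possible way to apply the collation to the keywords table from the question, not necessarily the only one.

```sql
-- utf8_german2_ci follows the German phone book rules: ä = ae, ö = oe, ü = ue, ß = ss.
SELECT
    _utf8'Bademantel'  = _utf8'Bademäntel' COLLATE utf8_german2_ci AS a_vs_umlaut,   -- 0: singular and plural differ
    _utf8'Bademaentel' = _utf8'Bademäntel' COLLATE utf8_german2_ci AS ae_vs_umlaut,  -- 1: ä compares equal to ae
    _utf8'Straße'      = _utf8'Strasse'    COLLATE utf8_german2_ci AS eszett_vs_ss;  -- 1: ß compares equal to ss

-- Apply the collation to the column so that WHERE, GROUP BY and DISTINCT use it.
ALTER TABLE keywords
    MODIFY keyword VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_german2_ci NOT NULL;
```

The trade-off is that a search for Bademaentel would then also find Bademäntel, which may or may not be desired.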
