Java Unicode Comparison

Question

Possible duplicates:
Java Ignore accents when comparing strings
Accent ignored java string

Hello to all

I need to compare strings in Java that can be as similar as "Chloe" and "Chloé", and I need them to be treated as equal. Does anyone know what the best practice is? Or is there some kind of third-party library?

Roman

+6

java string unicode

Roman Nov 29 '10 at 11:49

3 answers

Tassos bassassos · Answer 1 · 2010-11-29T12:10:52+0000

Check out International Components for Unicode (ICU); it can do what you need.

Edit: here is some sample code to run (from the Collator Javadoc):

    // Get the Collator for US English and set its strength to PRIMARY
    Collator usCollator = Collator.getInstance(Locale.US);
    usCollator.setStrength(Collator.PRIMARY);
    if (usCollator.compare("abc", "ABC") == 0) {
        System.out.println("Strings are equivalent");
    }
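Applied to the strings from the question, a minimal sketch could look like the following. It assumes the JDK's own java.text.Collator rather than ICU (as far as I know, ICU4J's com.ibm.icu.text.Collator offers the same getInstance, setStrength and compare calls), and the class and method names are made up for illustration.

```java
import java.text.Collator;
import java.util.Locale;

public class AccentInsensitiveCompare {

    // Hypothetical helper: true when the two strings differ at most by
    // accents or letter case, according to the US English collation rules.
    static boolean sameBaseLetters(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        // PRIMARY strength compares base letters only.
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "\u00e9" is the precomposed character e-acute.
        System.out.println(sameBaseLetters("Chloe", "Chlo\u00e9")); // true
        System.out.println(sameBaseLetters("Chloe", "Chloa"));      // false
    }
}
```

Note that PRIMARY strength also ignores case, so "chloe" and "Chloé" would compare as equal too. If case should still matter, this approach is probably not the right fit as written.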

Lukas Eder · Answer 2 · 2010-11-29T11:54:31+0000

We translate the string “Chloé” to “Chloe” before the comparison, using hard-coded mappings from accented characters to their ASCII equivalents. This works pretty well, but it is clumsy, and there are probably some special characters that we forgot.

Our solution looks something like this:

    public static String replaceAccents(String string) {
        String result = null;
        if (string != null) {
            result = string;
            result = result.replaceAll("[àáâãåä]", "a");
            result = result.replaceAll("[ç]", "c");
            result = result.replaceAll("[èéêë]", "e");
            result = result.replaceAll("[ìíîï]", "i");
            result = result.replaceAll("[ñ]", "n");
            result = result.replaceAll("[òóôõö]", "o");
            result = result.replaceAll("[ùúûü]", "u");
            result = result.replaceAll("[ÿý]", "y");
            result = result.replaceAll("[ÀÁÂÃÅÄ]", "A");
            result = result.replaceAll("[Ç]", "C");
            result = result.replaceAll("[ÈÉÊË]", "E");
            result = result.replaceAll("[ÌÍÎÏ]", "I");
            result = result.replaceAll("[Ñ]", "N");
            result = result.replaceAll("[ÒÓÔÕÖ]", "O");
            result = result.replaceAll("[ÙÚÛÜ]", "U");
            result = result.replaceAll("[Ý]", "Y");
        }
        return result;
    }

So I'm curious to get a good answer to this question!

Kevin · Answer 3 · 2010-11-29T12:11:07+0000

How about StringUtils.stripAccents from Apache Commons Lang? From its Javadoc:

    Removes the accents from a string.
    NOTE: This is a JDK 1.6 method, it will fail on JDK 1.5.

    StringUtils.stripAccents(null) = null
    StringUtils.stripAccents("") = ""
    StringUtils.stripAccents("control") = "control"
    StringUtils.stripAccents("&ecute;clair") = "eclair"

    Parameters: input - String to be stripped
    Returns: String without accents on the text

The documentation doesn't mention Unicode handling (and gives only an HTML-entity example), but you can try it anyway.
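If pulling in Commons Lang is not an option, a similar effect can be sketched with java.text.Normalizer, which was added in Java 6. This is only an illustration of the general technique (decompose, then drop the combining marks), not a claim about how the library implements stripAccents; the class name is hypothetical.

```java
import java.text.Normalizer;

public class StripAccentsSketch {

    // Hypothetical stand-in for an accent-stripping helper.
    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        // NFD splits a precomposed character such as e-acute into the base
        // letter "e" followed by a combining acute accent (U+0301).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Chlo\u00e9"));                 // Chloe
        System.out.println(stripAccents("Chlo\u00e9").equals("Chloe")); // true
    }
}
```

Letters that have no canonical decomposition, such as ø, pass through unchanged, so this does not cover every "special" character either.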
