Java Unicode Comparison

Possible duplicates:
Java Ignore accents when comparing strings
Accent ignored java string

Hello to all

I need to compare strings in Java that may look like "Chloe" and "Chloé", and I need them to compare as equal. Does anyone know the best practice? Or is there a third-party library for this?

Roman

java string unicode
3 answers

Check out International Components for Unicode (ICU); it can do what you need.

Edit: here is sample code to run (from the Collator Javadoc):

    import java.text.Collator;
    import java.util.Locale;

    // Get the Collator for US English and set its strength to PRIMARY
    Collator usCollator = Collator.getInstance(Locale.US);
    usCollator.setStrength(Collator.PRIMARY);
    if (usCollator.compare("abc", "ABC") == 0) {
        System.out.println("Strings are equivalent");
    }
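Applied to the strings from the question, the same Collator idea might look like the following minimal sketch. It assumes the JDK's own java.text.Collator (ICU4J's com.ibm.icu.text.Collator exposes the same getInstance, setStrength and compare methods). The class AccentInsensitiveCompare and the method equalsIgnoringAccents are hypothetical names chosen for illustration.

```java
import java.text.Collator;
import java.util.Locale;

public class AccentInsensitiveCompare {

    // Hypothetical helper: true if a and b differ only in accents (or case).
    public static boolean equalsIgnoringAccents(String a, String b) {
        // A fresh Collator per call keeps the sketch simple;
        // a real application would probably reuse one instance.
        Collator collator = Collator.getInstance(Locale.US);
        // PRIMARY strength ignores secondary (accent) and tertiary (case) differences.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters such as U+00E9 before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "Chlo\u00e9" is "Chloé"; the escape avoids source-encoding issues.
        System.out.println(equalsIgnoringAccents("Chloe", "Chlo\u00e9")); // true
        System.out.println(equalsIgnoringAccents("Chloe", "Claire"));     // false
    }
}
```

Note that PRIMARY strength also ignores case, so "chloe" and "CHLOÉ" compare as equal too.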

Before comparing, we translate the string “Chloé” to “Chloe” using a hard-coded mapping from special characters to their ASCII equivalents. This works pretty well, but it is clumsy, and there are probably some special characters we forgot.

Our solution looks something like this:

    public static String replaceAccents(String string) {
        String result = null;
        if (string != null) {
            result = string;
            result = result.replaceAll("[àáâãåä]", "a");
            result = result.replaceAll("[ç]", "c");
            result = result.replaceAll("[èéêë]", "e");
            result = result.replaceAll("[ìíîï]", "i");
            result = result.replaceAll("[ñ]", "n");
            result = result.replaceAll("[òóôõö]", "o");
            result = result.replaceAll("[ùúûü]", "u");
            result = result.replaceAll("[ÿý]", "y");
            result = result.replaceAll("[ÀÁÂÃÅÄ]", "A");
            result = result.replaceAll("[Ç]", "C");
            result = result.replaceAll("[ÈÉÊË]", "E");
            result = result.replaceAll("[ÌÍÎÏ]", "I");
            result = result.replaceAll("[Ñ]", "N");
            result = result.replaceAll("[ÒÓÔÕÖ]", "O");
            result = result.replaceAll("[ÙÚÛÜ]", "U");
            result = result.replaceAll("[Ý]", "Y");
        }
        return result;
    }

So I'm curious to get a good answer to this question!
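One way to avoid maintaining the character table above by hand is to decompose the string and drop the accent marks. This minimal sketch assumes Java 6 or later (java.text.Normalizer) and is a swapped-in technique, not the poster's code; the class AccentStripper and its methods are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Objects;
import java.util.regex.Pattern;

public class AccentStripper {

    // Matches one or more Unicode combining marks (category M),
    // e.g. U+0301 COMBINING ACUTE ACCENT.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    // Hypothetical replacement for the hand-written replaceAccents table.
    public static String stripAccents(String s) {
        if (s == null) {
            return null;
        }
        // NFD splits precomposed characters: U+00E9 becomes 'e' + U+0301.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Removing the combining marks leaves only the base letters.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    // Null-safe comparison of the stripped forms.
    public static boolean equalsIgnoringAccents(String a, String b) {
        return Objects.equals(stripAccents(a), stripAccents(b));
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Chlo\u00e9"));                    // Chloe
        System.out.println(equalsIgnoringAccents("Chloe", "Chlo\u00e9")); // true
    }
}
```

This covers every character in the table above, but letters with no canonical decomposition (for example "ø", "ł" or "æ") pass through unchanged, so they would still need explicit handling.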


How about StringUtils.stripAccents from Apache Commons Lang?

    Removes the accents from a string.
    NOTE: This is a JDK 1.6 method, it will fail on JDK 1.5.

    StringUtils.stripAccents(null)             = null
    StringUtils.stripAccents("")               = ""
    StringUtils.stripAccents("control")        = "control"
    StringUtils.stripAccents("&eacute;clair")  = "eclair"

    Parameters:
        input - String to be stripped
    Returns:
        String without accents on the text

They don't mention Unicode handling explicitly (the only accented example is written as an HTML entity), but you can try it anyway.

