Removing diacritics and platform issues

I have this method for removing diacritics from a string in Java:

    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");

I have some simple test cases. They pass when I run them from within my IDE, but fail when I run them from Maven. I call Maven from the command line, and my environment encoding is UTF-8. I am using the latest version of Java 6 that Apple has provided.

I do not know what encoding the IDE uses, but it runs the same Java. Any thoughts on what might cause this problem?

1 answer

I believe this is caused by incorrect handling of the input encoding.
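One way to test this hypothesis is to print the default encoding the JVM sees in each environment (once from the IDE, once from a test run under Maven). The sketch below is only a diagnostic aid; the class name ShowDefaultEncoding is made up for illustration. If the two runs report different values, anything that relies on the platform default is a likely suspect.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class: reports the encoding the JVM treats as default.
public class ShowDefaultEncoding {

    // Builds a one-line summary of the platform default encoding.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this from the IDE and from Maven, then compare the two lines.
        System.out.println(describe());
    }
}
```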

If the input strings are hard-coded in the source, make sure that the actual encoding of the source files matches the encoding the compiler is configured to use. Note that Maven has its own compiler encoding setting, the project.build.sourceEncoding property in pom.xml. If it is not set, Maven falls back to the platform default encoding:

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        ...
    </properties>

As a quick check, you can also replace the non-ASCII characters in string literals with their Unicode escapes ( \uXXXX ). If the problem is caused by the source encoding, it should disappear.
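A minimal sketch of that check, assuming the method from the question is wrapped in a static helper (the class name DiacriticsEscapeCheck and the sample strings are made up for illustration). Because the literals contain only ASCII characters, the result no longer depends on how the compiler decodes the source file.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical test harness around the method from the question.
public class DiacriticsEscapeCheck {

    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Decomposes accented characters (NFD), then strips the combining marks.
    static String removeDiacritics(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return MARKS.matcher(nfd).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00E9 is e with acute, U+00EF is i with diaeresis; both are
        // written as escapes so the source file stays pure ASCII.
        String input = "caf\u00e9 na\u00efve";
        System.out.println(removeDiacritics(input)); // prints "cafe naive"
    }
}
```

If this version passes under Maven while the same test with literal accented characters fails, the source encoding is the culprit.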

If you are reading the input data from a file, make sure that you specify the file encoding explicitly in your code and that you are not using methods that rely on the system default encoding.
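As an illustration, here is a sketch of reading a file with an explicit charset, written to compile on Java 6 (so no StandardCharsets or try-with-resources). The class and helper names are hypothetical, and UTF-8 is an assumption about how the file was saved. The point is that InputStreamReader receives the charset directly, whereas FileReader would use the platform default.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical example class: file I/O with an explicit encoding.
public class ExplicitEncodingRead {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Reads the first line, decoding bytes as UTF-8 regardless of the
    // platform default encoding.
    static String readFirstLine(File file) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), UTF8));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    // Writes text as UTF-8 bytes; only used here to create sample input.
    static void writeUtf8(File file, String text) throws IOException {
        Writer writer = new OutputStreamWriter(new FileOutputStream(file), UTF8);
        try {
            writer.write(text);
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("diacritics", ".txt");
        file.deleteOnExit();
        writeUtf8(file, "caf\u00e9");
        // The round trip preserves the accented character.
        System.out.println("caf\u00e9".equals(readFirstLine(file))); // prints "true"
    }
}
```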
