JSP UTF encoding

It’s hard for me to figure out how to deal with this problem:

I am developing a web tool for an Italian university, and I have to display words with accents (for example, è, ù, ...). Sometimes I get these words from a PostgreSQL table (UTF8 encoding), but mostly I have to read long passages from files. These files are encoded as UTF-8 XML and display perfectly in Smultron or any UTF-8 editor (they were created by parsing old files in Python that contained entities such as &egrave; instead of "è").

I wrote a Java class that extracts the appropriate segments from an XML file. It is used as follows:

String s = parseText(filename, position)

If I write the returned String to a file, everything looks fine; the problem is that if I do

out.write(s)

in the jsp page, I get weird characters. Meanwhile, I also use

String s = getWordFromPostgresql(...)

out.write(s)

in the same jsp, and that displays OK.

Any clues?

Thanks, Nicola


@krosenvold

Thank you for your answer. However, this directive is already in the page, and it does not help (in fact, it "works", but only for the strings that I get from the database). I think the problem has something to do with reading from files, but I cannot understand it: the strings work in "java" but not in "jsp" (I cannot think of a better explanation ...)

Here is a basic example extracted from the real code. The method that reads from files returns a Map from Mark (an object representing a position in the text) to String (containing the text).

This is in the .jsp page (with the UTF-8 directive mentioned in the posts above):

  // ...
  Map<Mark, String> map = TestoMarkParser.parseMarks(...);
  out.write(map.get(m));

and this is the result:

"Fu per√≤ cos√¨ in uso il Genere Enharmonico, che quelli quali vi si esercitavano",

If I put the same code in a Java class and replace out.write with System.out.println, the result is:

"Fu però così in uso il Genere Enharmonico, che quelli quali vi si esercitavano",


I did some analysis with a hex editor; here are the results:

source line: "fu però così"

ò in the xml file: C3 B2

ò as displayed by out.write() in the jsp page: E2 88 9A E2 89 A4

ò written to a file via

 FileWriter w = new FileWriter(new File("out.txt"));
 w.write(s); // s is the parsed string
 w.close();

gives: C3 B2

Printing the value of each character as an int:

 0: 70 = F
 1: 117 = u
 2: 32 =
 3: 112 = p
 4: 101 = e
 5: 114 = r
 6: 8730 = √
 7: 8804 = ≤
 8: 32 =
 9: 99 = c
 10: 111 = o
 11: 115 = s
 12: 8730 = √
 13: 168 = ¨
 14: 10 =
+7
4 answers

In the JSP page directive, try setting the charset of your content type to UTF-8; this also sets pageEncoding to UTF-8.

 <%@page contentType="text/html;charset=UTF-8"%> 

UTF-8 is not the default charset in JSP, and all sorts of interesting problems arise from this. By default, the underlying stream is treated as ISO-8859-1, so any multibyte UTF-8 sequences you write to it will be interpreted as ISO-8859-1 characters. I believe that setting the encoding to UTF-8 is the best solution.
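To make the effect concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded with a single-byte charset. The class and method names (`MojibakeDemo`, `misdecode`, `repair`) are made up for illustration, and ISO-8859-1 is used only because it is the default mentioned above; the charset actually involved on a given server may differ.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "per\u00f2"; // U+00F2 is o with grave accent
        String garbled = misdecode(original);
        // One accented letter has turned into two characters (U+00C3, U+00B2).
        System.out.println(original.length() + " -> " + garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The round trip in `repair` only works because ISO-8859-1 maps every byte value to a character; with other charsets the damage may not be reversible.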

Edit: Also, a String in Java is always Unicode. Therefore, you should always be able to say

 System.out.println(myString) 

and see the correct characters appear in the console window of your web server (or just step into the debugger and inspect the string). I suspect that you will see the wrong characters when you do this, which makes me think that you have an encoding problem when building the string.

+15

I have several international JSPs (pages containing "special" characters, that is, characters not used in English).

Inserting this, and only this, at the top of each page made them save and display correctly (I did not add a contentType directive as well, because that caused a duplicate contentType error):

 <%@page pageEncoding="UTF-8"%> 

This link [http://www.inter-locale.com/codeset1.jsp] helped me discover this.

+2
 String s = parseText(filename, position) 

Where is this method defined? I assume it is your own method that opens the file and extracts a specific piece of data. Somewhere in that process the data is converted from bytes to characters, possibly using the default encoding of your JVM.

If the default encoding of your JVM does not match the actual encoding of the file, you will get incorrect characters in your string. In addition, if you are reading content in a multibyte encoding (for example, UTF-8), your position may point into the middle of a multibyte sequence.
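As a rough sketch of the bytes-to-characters step, the following reads a file while naming the charset explicitly instead of relying on the JVM default. The class `ExplicitCharsetRead` and its methods are hypothetical; `parseText` itself is not shown in the question, so this only illustrates the general idea.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    // Read the whole file as text, decoding bytes as UTF-8 regardless of the JVM default.
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Write text to a temporary file as UTF-8 bytes and read it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Fu per\u00f2 cos\u00ec in uso";
        System.out.println(roundTrip(text).equals(text));
    }
}
```

Because the reader decodes whole characters, any position used to select a segment should be counted in characters of the decoded text rather than in bytes of the file.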

If the source files are well-formed XML, you would be much better off using a real parser (for example, the one built into the JDK) to parse them, since the parser will handle the translation from bytes to characters correctly. Then use an XPath expression to retrieve the values.
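A minimal sketch of that approach with the JDK's built-in parser and XPath support might look like this. The element names (`text`, `seg`), the `id` attribute, and the class `XmlSegmentDemo` are invented for the example; the real files will have their own structure.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XmlSegmentDemo {
    // Parse raw XML bytes and return the string value of the first node matching the XPath.
    // The parser works out the encoding from the XML declaration (UTF-8 when none is given).
    static String extract(byte[] xmlBytes, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return XPathFactory.newInstance().newXPath().evaluate(xpathExpr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document layout, standing in for the real files.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<text><seg id=\"1\">Fu per\u00f2 cos\u00ec in uso</seg></text>";
        String seg = extract(xml.getBytes(StandardCharsets.UTF_8), "/text/seg[@id='1']");
        System.out.println(seg.length());
    }
}
```

The parser is handed bytes rather than a Reader on purpose, so that it, and not the JVM default charset, decides how the bytes become characters.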

If you have not used an XML parser before, here are two documents that I wrote on parsing and XPath.


Edit: one thing that you might find useful is to print the actual values of the characters in the string, using something like the following:

 public static void main(String[] argv) throws Exception {
     String s = "testing\u20ac";
     for (int ii = 0 ; ii < s.length() ; ii++) {
         System.out.println(ii + ": " + (int)s.charAt(ii) + " = " + s.charAt(ii));
     }
 }

You may also want to print your default character set, so that you know how any particular sequence of bytes will be translated to characters:

 // needs: import java.nio.charset.Charset;
 public static void main(String[] argv) throws Exception {
     System.out.println(Charset.defaultCharset());
 }

And finally, you should examine the served page as raw bytes in order to see exactly what is being returned to the client.


Edit #2: the character ò is Unicode code point U+00F2, which UTF-8 encodes as C3 B2. Those two bytes do not match the characters that you listed in your earlier post.

For more information on Unicode characters, see the code charts on Unicode.org.
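The byte values quoted in this thread can be checked with a few lines of Java. This is only a sketch; the helper name `utf8Hex` and the class `Utf8Hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Hex {
    // Return the UTF-8 encoding of the text as space-separated uppercase hex bytes.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F2, o with grave accent
        System.out.println(utf8Hex("\u00f2"));       // prints "C3 B2"
        // U+221A and U+2264, i.e. code points 8730 and 8804 from the question's dump
        System.out.println(utf8Hex("\u221a\u2264")); // prints "E2 88 9A E2 89 A4"
    }
}
```

The second result matches the bytes seen on the served page, which suggests the string already contained U+221A and U+2264 before it was written out. That would be consistent with the bytes C3 B2 having been decoded with a single-byte charset (Mac OS Roman maps them to exactly those two characters), though that is an inference from the dump rather than something the thread confirms.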

0

I had the same problem: everything was "utf-8", yet I still saw meaningless characters. The fix was in the jsp itself; put this at the top of the page:

  <%request.setCharacterEncoding("utf-8");%> 

and everything will be all right.

0