Why does java.lang.StringCoding.encode ignore the given encoding and use the default one?

Question


The default encoding for the application is set to "UTF-8" (using -Dfile.encoding=UTF-8 at startup). When I call the String method getBytes(String charsetName) with charsetName = "ISO-8859-1", it looks like StringCoding.encode ends up using the default encoding (UTF-8) instead of the requested one (ISO-8859-1).

For some unknown reason, I can step through this method in the debugger, but I cannot inspect the values of its local variables (I only see parameters named arg0, arg1, ...).

In Java 1.6.10, StringCoding.encode is written as:

    static byte[] encode(String charsetName, char[] ca, int off, int len)
        throws UnsupportedEncodingException
    {
        StringEncoder se = (StringEncoder)deref(encoder);
        String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
        if ((se == null) || !(csn.equals(se.requestedCharsetName())
                              || csn.equals(se.charsetName()))) {
            se = null;
            try {
                Charset cs = lookupCharset(csn);
                if (cs != null)
                    se = new StringEncoder(cs, csn);
            } catch (IllegalCharsetNameException x) {}
            if (se == null)
                throw new UnsupportedEncodingException(csn);
            set(encoder, se);
        }
        return se.encode(ca, off, len);
    }

When I step through the code, I never enter the if block, so a new StringEncoder for my ISO-8859-1 encoding is never created. In the end, Charset.defaultCharset() is called.

Any clues? Thanks.


java character-encoding

Redmat Jun 20 '11 at 10:22


3 answers

Jon Skeet · Answer 1 · 2011-06-20T10:32:13+0000

If you never enter the if block, this expression must be false:

 (se == null) || !(csn.equals(se.requestedCharsetName()) || csn.equals(se.charsetName()))

It means that:

- se is not null
- The inner expression must be true before the ! is applied, so at least one of these subexpressions must be true:
  - csn.equals(se.requestedCharsetName())
  - csn.equals(se.charsetName())

In other words, se is already suitable for the requested encoding name.

It is not using the default encoding of the virtual machine; it is using the encoder that was last used in this thread.
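The caching check discussed above can be modelled roughly as follows. This is a simplified sketch, not the JDK source: the class CachedEncoderSketch, the nested Encoder type, and the created counter are hypothetical names introduced only for illustration, and the real StringCoding keeps its cached encoder in its own internal structure.

```java
import java.nio.charset.Charset;

// Simplified model of a per-thread encoder cache; NOT the JDK implementation.
class CachedEncoderSketch {
    // Hypothetical stand-in for the JDK's internal StringEncoder.
    static final class Encoder {
        final Charset charset;
        final String requestedName;

        Encoder(Charset charset, String requestedName) {
            this.charset = charset;
            this.requestedName = requestedName;
        }
    }

    // One cached encoder per thread.
    private static final ThreadLocal<Encoder> CACHE = new ThreadLocal<Encoder>();

    // Demo-only counter of how often the "if block" runs (not thread safe).
    static int created = 0;

    static byte[] encode(String charsetName, String s) {
        Encoder se = CACHE.get();
        // Same shape as the condition in the question: rebuild the encoder
        // only if none is cached or the cached one has a different name.
        if (se == null || !(charsetName.equals(se.requestedName)
                || charsetName.equals(se.charset.name()))) {
            se = new Encoder(Charset.forName(charsetName), charsetName);
            CACHE.set(se);
            created++;
        }
        return s.getBytes(se.charset);
    }

    public static void main(String[] args) {
        encode("ISO-8859-1", "a"); // cache miss: encoder created
        encode("ISO-8859-1", "b"); // cache hit: if block skipped
        encode("UTF-8", "c");      // different name: encoder replaced
        System.out.println(created); // prints 2
    }
}
```

In this model, skipping the if block on the second call does not mean the requested name was ignored; it only means the cached encoder already matches it.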

I highly doubt that you have found a JRE bug; this code looks fine to me. So what made you start debugging it in the first place? Can you provide a short but complete program that demonstrates the problem? Is something being encoded to the wrong bytes?
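A short but complete program of that kind might look like the sketch below. The class name GetBytesCheck and its helper methods are hypothetical; the test string is U+00E9 because it encodes to a single byte in ISO-8859-1 and to two bytes in UTF-8, so the output shows at a glance which charset was actually applied.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class GetBytesCheck {
    // Renders a byte array as lowercase hex, e.g. {0xC3, 0xA9} -> "c3a9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Calls String.getBytes(String charsetName), the method from the question.
    static String encodeHex(String s, String charsetName) {
        try {
            return hex(s.getBytes(charsetName));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1: " + encodeHex(s, "ISO-8859-1")); // e9
        System.out.println("UTF-8:      " + encodeHex(s, "UTF-8"));      // c3a9
    }
}
```

Running it with and without -Dfile.encoding=UTF-8 and comparing the two hex lines is one way to see whether the default charset has any influence on the explicit-charset calls.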

stark · Answer 2 · 2011-06-20T10:29:11+0000

You need to change

-Dfile.ecoding=UTF-8

to

-Dfile.encoding=UTF-8
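A misspelled -D option is not rejected by the JVM; it simply defines an unrelated system property. A quick way to see what the JVM actually picked up is to print both spellings next to the effective default charset. This is only a sketch, and the class name DefaultCharsetCheck is hypothetical.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Summarises the correctly spelled property, the misspelled one,
    // and the charset the JVM is really using by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", file.ecoding=" + System.getProperty("file.ecoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the second value is not null, the typo is still present on the command line.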

A.Grandt · Answer 3 · 2013-12-16T13:17:57+0000

encode should never ignore the given encoding and fall back to the one specified in -Dfile.encoding.

Yet it does, and so does decode, even though the source shows that it finds the encoding and sets it in the line:

 set(encoder, se);

Neither encode nor decode is thread safe, so the value may end up bound to the system default override; possibly the set value is used before or after the decode.

IMHO, this is a bug in the JRE. Granted, the OP had a typo, but that does not change the fact that if you ask String to decode a byte array as UTF-8, it should always return UTF-8, and not silently fall back on something else.
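The decode side can be checked in the same spirit. The sketch below (class name DecodeCheck is hypothetical) decodes the two bytes C3 A9 with an explicit charset name; a UTF-8 decode gives one character, while an ISO-8859-1 decode of the same bytes gives two, so the length shows which charset was applied.

```java
import java.io.UnsupportedEncodingException;

class DecodeCheck {
    // Calls the String(byte[], String charsetName) constructor.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 }; // UTF-8 encoding of U+00E9
        System.out.println(decode(utf8, "UTF-8").length());      // 1
        System.out.println(decode(utf8, "ISO-8859-1").length()); // 2
    }
}
```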
