Convert Latin-1 content in InputStream to UTF-8 string

I need to convert the contents of an InputStream to a String. The difficulty is the encoding of the input, namely Latin-1. I tried several approaches and code snippets with String, getBytes, char[], etc., to get the encoding right, but nothing worked.

Finally, I came up with the working solution below. However, this code seems a bit verbose to me, even for Java. So the question is:

Is there a simpler and more elegant approach to achieving what is being done here?

    private String convertStreamToStringLatin1(java.io.InputStream is) throws IOException {
        String text = "";
        // setup readers with Latin-1 (ISO 8859-1) encoding
        BufferedReader i = new BufferedReader(new InputStreamReader(is, "8859_1"));
        int numBytes;
        CharBuffer buf = CharBuffer.allocate(512);
        while ((numBytes = i.read(buf)) != -1) {
            text += String.copyValueOf(buf.array(), 0, numBytes);
            buf.clear();
        }
        return text;
    }
+7
5 answers

First, a few criticisms of the approach you have already taken. There is no need for an NIO CharBuffer when a plain char[512] will do. With an array you also do not need to clear the buffer on each iteration.

    int numBytes;
    final char[] buf = new char[512];
    while ((numBytes = i.read(buf)) != -1) {
        text += String.copyValueOf(buf, 0, numBytes);
    }

You should also know that simply constructing a String with these arguments, new String(buf, 0, numBytes), has the same effect, since the constructor also copies the data. From the Javadoc:

The contents of the subarray are copied; subsequent modification of the character array does not affect the newly created string.
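
For illustration, here is a minimal sketch of the char[] loop described above. It swaps the repeated String concatenation for a StringBuilder, which is my own substitution and not part of the original code; the class and method names (Latin1ReaderSketch, readLatin1) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this sketch.
final class Latin1ReaderSketch {

    // Reads the whole stream and decodes it as ISO-8859-1 (Latin-1).
    static String readLatin1(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        char[] buf = new char[512];
        int numChars;
        // read() fills buf from index 0 each time, so nothing has to be cleared
        while ((numChars = reader.read(buf)) != -1) {
            text.append(buf, 0, numChars);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'c', 'a', 'f', and 0xE9 (e with acute accent in Latin-1)
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        System.out.println(readLatin1(new ByteArrayInputStream(latin1)));
    }
}
```

Appending to a StringBuilder avoids creating a new String on every pass through the loop, which matters once the input is more than a few buffers long.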


You can use a ByteArrayOutputStream, which grows its internal buffer dynamically to accommodate all the data. Then you can take the whole byte[] from toByteArray and decode it to a String.

The advantage is that deferring the decoding to the end avoids decoding byte fragments individually; while per-fragment decoding may work for single-byte encodings such as ASCII or ISO-8859-1, it will not work for multibyte schemes such as UTF-8 and UTF-16. This also makes it easier to change the character encoding in the future, since the code does not require changes.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    private static final String DEFAULT_ENCODING = "ISO-8859-1";

    public static final String convert(final InputStream in) throws IOException {
        return convert(in, DEFAULT_ENCODING);
    }

    public static final String convert(final InputStream in, final String encoding) throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final byte[] buf = new byte[2048];
        int rd;
        // read() returns -1 at end of stream
        while ((rd = in.read(buf, 0, buf.length)) >= 0) {
            out.write(buf, 0, rd);
        }
        // decode all bytes in one step, after the whole stream has been read
        return new String(out.toByteArray(), encoding);
    }
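
To make the point about multibyte encodings concrete, here is a small sketch that decodes the same UTF-8 bytes once chunk by chunk and once as a whole. The class and method names (ChunkDecodingSketch, decodePerChunk, decodeWhole) are made up for this example, and the chunk size is chosen deliberately so that a two-byte character gets split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, named only for this sketch.
final class ChunkDecodingSketch {

    // Decodes each chunk of at most chunkSize bytes on its own and joins the results.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += chunkSize) {
            byte[] chunk = Arrays.copyOfRange(data, i, Math.min(data.length, i + chunkSize));
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the complete byte array in a single step.
    static String decodeWhole(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by e with acute accent, which takes two bytes in UTF-8
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // A chunk size of 4 cuts the two-byte character in half
        System.out.println(decodePerChunk(utf8, 4).equals("caf\u00e9"));
        System.out.println(decodeWhole(utf8).equals("caf\u00e9"));
    }
}
```

With Latin-1 every byte maps to exactly one character, so the per-chunk variant happens to work there; the whole-array variant keeps working when the encoding changes.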
+7

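I do not see how this could be made much simpler. I did it a little differently. If you already have a String, you can re-decode its bytes like this (note that getBytes() without an argument encodes with the platform default charset):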

 new String(originalString.getBytes(), "ISO-8859-1"); 

And something like this might work too:

    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
    StringBuilder sb = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        sb.append(line + "\n");
    }
    is.close();
    return new String(sb.toString().getBytes(), "ISO-8859-1");

EDIT: I have to add that this is really just an alternative to your already working solution. When it comes to converting streams in Java, it won't get much simpler, so go for it. :)
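
One thing to watch with the decode-then-re-decode route: it only round-trips if the first decoding step loses nothing. The sketch below is an illustration under an assumption, namely that UTF-8 stands in for the platform default charset; the class and method names (RedecodeSketch, decodeThenRepair, decodeDirectly) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named only for this sketch.
final class RedecodeSketch {

    // Decodes with the "wrong" charset first (UTF-8 here, standing in for a
    // platform default), then tries to repair the text by re-decoding as Latin-1.
    static String decodeThenRepair(byte[] latin1Bytes) {
        String wrong = new String(latin1Bytes, StandardCharsets.UTF_8);
        return new String(wrong.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Decodes the raw bytes as Latin-1 right away.
    static String decodeDirectly(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', and 0xE9 (e with acute accent in Latin-1)
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        System.out.println(decodeDirectly(latin1).equals("caf\u00e9"));
        System.out.println(decodeThenRepair(latin1).equals("caf\u00e9"));
    }
}
```

The byte 0xE9 on its own is not valid UTF-8, so the first decode replaces it and the original value cannot be recovered afterwards. Passing the charset to the InputStreamReader up front sidesteps the problem.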

+1

If you don't want to write it yourself, have a look at the Apache Commons IO project: IOUtils.toString(InputStream input, String encoding) seems to do what you want. I have not tried this method myself, but the Javadoc says: "Get the contents of an InputStream as a String using the specified character encoding."

0

Guava's IO package is really good for this. From a file:

 Files.toString(yourFile, Charsets.ISO_8859_1) 

or from a stream:

 new String(ByteStreams.toByteArray(stream), Charsets.ISO_8859_1) 
0

I just found out that an answer to the question "Reading/converting an InputStream to a String" can be applied to my problem; see the code below. In any case, I really appreciate the answers you have given so far.

    private String convertStreamToString(InputStream is, String charsetName) {
        try {
            return new java.util.Scanner(is, charsetName).useDelimiter("\\A").next();
        } catch (java.util.NoSuchElementException e) {
            return "";
        }
    }

So, for Latin-1 encoding, call it like this:

 String message = convertStreamToString(is, "8859_1"); 
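
As a possible variation, the empty-stream case can be handled with hasNext() instead of catching the exception. This is only a sketch; the class and method names (ScannerReadSketch, readAll) are hypothetical, and the sample bytes are made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper class, named only for this sketch.
final class ScannerReadSketch {

    // Reads the whole stream as one token; "\\A" matches only at the start of the input,
    // so the delimiter never splits the content.
    static String readAll(InputStream is, String charsetName) {
        Scanner scanner = new Scanner(is, charsetName).useDelimiter("\\A");
        // hasNext() is false for an empty stream, so no exception handling is needed
        return scanner.hasNext() ? scanner.next() : "";
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', and 0xE9 (e with acute accent in Latin-1)
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        System.out.println(readAll(new ByteArrayInputStream(latin1), "ISO-8859-1"));
    }
}
```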
0
