How to convert between ISO-8859-1 and UTF-8 in Java?

Question

Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and vice versa in Java?

I receive a string from the Internet and save it in RMS (J2ME). I want to preserve special characters, and when I read the string back from RMS I want it in ISO-8859-1 encoding. How can I do this?

java utf-8 character-encoding java-me iso-8859-1

c4r1o5 Mar 16 '09 at 21:09

8 answers

What worked for me ("üzüm bağları" is the correct Turkish spelling):

Convert ISO-8859-1 to UTF-8:

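    // The ISO-8859-1 mojibake contains an invisible U+009F after "Ä" (from the UTF-8 bytes of "ğ")
    String encodedWithISO88591 = "Ã¼zÃ¼m baÄ\u009flarÄ±";
    String decodedToUTF8 = new String(encodedWithISO88591.getBytes("ISO-8859-1"), "UTF-8");
    // Result: decodedToUTF8 --> "üzüm bağları"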

Convert UTF-8 to ISO-8859-1:

    String encodedWithUTF8 = "üzüm bağları";
    String decodedToISO88591 = new String(encodedWithUTF8.getBytes("UTF-8"), "ISO-8859-1");
    // Result: decodedToISO88591 --> "Ã¼zÃ¼m baÄ\u009flarÄ±" (the U+009F is invisible when printed)
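
The repair above only works if the mojibake string contains every character, including invisible C1 control characters such as U+009F, so typing it by hand is error-prone. A minimal sketch, assuming Java 7+ for StandardCharsets, that produces the mojibake from the correct text and then repairs it; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the two conversions shown above.
class MojibakeDemo {

    // Simulates the bug: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String breakText(String correct) {
        return new String(correct.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undoes it: ISO-8859-1 maps each char (<= U+00FF) back to its original byte,
    // and those bytes are then decoded as UTF-8.
    static String repairText(String mojibake) {
        return new String(mojibake.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = breakText("üzüm bağları");
        System.out.println(broken);             // mojibake; the U+009F after 'Ä' is invisible
        System.out.println(repairText(broken)); // üzüm bağları
    }
}
```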

Bahadir Tasdemir Aug 12 '16 at 8:45

If you have a String, you can do this:

    String s = "test";
    try {
        byte[] utf8 = s.getBytes("UTF-8"); // getBytes returns a new array; keep the result
    } catch (UnsupportedEncodingException uee) {
        uee.printStackTrace();
    }

If you have a “broken” String, you already did something wrong: converting a String into a String in another encoding is definitely not the way to go! You can convert a String to a byte[] and back (given an encoding). A Java String is, AFAIK, encoded in UTF-16, but that is an implementation detail.
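
To illustrate that an encoding only matters when converting between String and byte[], here is a small sketch (class and method names are hypothetical) that encodes the same one-character String with two charsets:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: one String, two different byte representations.
class EncodingBytesDemo {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // "é", a single char in the String
        System.out.println(Arrays.toString(utf8(s)));   // [-61, -87]  i.e. 0xC3 0xA9
        System.out.println(Arrays.toString(latin1(s))); // [-23]       i.e. 0xE9
        // Decoding each array with the charset that produced it restores the same String
        System.out.println(new String(utf8(s), StandardCharsets.UTF_8).equals(s));        // true
        System.out.println(new String(latin1(s), StandardCharsets.ISO_8859_1).equals(s)); // true
    }
}
```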

Say you have an InputStream: you can read it into a byte[] and then convert that to a String using

    byte[] bs = ...;
    String s;
    try {
        s = new String(bs, encoding);
    } catch (UnsupportedEncodingException uee) {
        uee.printStackTrace();
    }

or even better (thanks erickson) use InputStreamReader as follows:

    InputStreamReader isr;
    try {
        isr = new InputStreamReader(inputStream, encoding);
    } catch (UnsupportedEncodingException uee) {
        uee.printStackTrace();
    }
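
A fuller sketch of the InputStreamReader approach, assuming Java 7+ (try-with-resources, StandardCharsets); the class and method names are hypothetical. The reader decodes the bytes with the charset you pass, so no intermediate byte[] is needed:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: read a whole stream as text in an explicit charset.
class ReadWithCharset {

    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The reader (and the underlying stream) is closed automatically
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated network data: "café" encoded as ISO-8859-1 (0xE9 for é)
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1)); // café
    }
}
```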

Johannes Weiss Mar 16 '09 at 21:30

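Here is a simple way that returns the result as a String (I wrote one method per direction):

    // From ISO-8859-1 to UTF-8: repairs text whose UTF-8 bytes were decoded as ISO-8859-1
    public static String isoToUtf8(String input) {
        try {
            return new String(input.getBytes("ISO-8859-1"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return "";
        }
    }

    // From UTF-8 to ISO-8859-1: reinterprets the UTF-8 bytes as ISO-8859-1 characters
    public static String utf8ToIso(String input) {
        try {
            return new String(input.getBytes("UTF-8"), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return "";
        }
    }

    // Example: utf8ToIso("Música") returns "MÃºsica"; isoToUtf8("MÃºsica") returns "Música"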

JLeon90 Jun 13 '16 at 17:24

A regex can also be used effectively (it replaces every character outside the printable ISO-8859-1 ranges with a space):

    String input = "€Tes¶ti©ng [§] al€lo€fi¶t _ - À ÆÑ with some 9umbers as"
            + " w2921**#$%!@# well Ü, or ü, is a chaŒracte⚽";
    // Keep printable ASCII (U+0020 to U+007E) and printable Latin-1 (U+00A0 to U+00FF)
    String output = input.replaceAll("[^\\u0020-\\u007e\\u00a0-\\u00ff]", " ");
    System.out.println("Input = " + input);
    System.out.println("Output = " + output);
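
As an alternative to hard-coding the ranges in a regex, the ISO-8859-1 encoder can be asked directly whether each character is encodable, using CharsetEncoder.canEncode(char). This is a different technique from the regex above: it keeps control characters and replaces only what ISO-8859-1 truly cannot represent. A sketch with hypothetical names:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: replace every char that ISO-8859-1 cannot encode.
class Latin1Filter {

    static String keepLatin1(String input, char replacement) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // A supplementary character (e.g. most emoji) is two surrogate chars,
            // so it yields two replacements.
            sb.append(encoder.canEncode(c) ? c : replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepLatin1("€ Tes¶ti©ng ÆÑ chaŒracte⚽", ' '));
    }
}
```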

Pritam Banerjee Nov 21 '18 at 17:43

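The Charsets class from Apache Commons IO may come in handy (its constants are the same charsets as java.nio.charset.StandardCharsets in Java 7+):

    // Encode the text as ISO-8859-1 bytes, then decode those bytes explicitly as UTF-8
    String utf8String = new String(latinString.getBytes(org.apache.commons.io.Charsets.ISO_8859_1),
            org.apache.commons.io.Charsets.UTF_8);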
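
If you do want to go through Charset.encode, note that it returns a ByteBuffer whose array() is the whole backing array, which may be longer than the encoded data, and that new String(byte[]) without a charset uses the platform default. A dependency-free sketch (Java 7+, hypothetical names) that copies only the encoded bytes and decodes explicitly:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: get exactly the encoded bytes out of Charset.encode.
class EncodeBufferDemo {

    static byte[] encodeLatin1(String text) {
        ByteBuffer buffer = StandardCharsets.ISO_8859_1.encode(text);
        byte[] bytes = new byte[buffer.remaining()]; // only the bytes between position and limit
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeLatin1("Música");
        // Decode with an explicit charset instead of the platform default
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // Música
    }
}
```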

Alberto Segura Apr 6 '17 at 13:03

Here is a function that converts an ISO-8859-1 (mojibake) string back to UTF-8:

    // Requires: import java.nio.charset.StandardCharsets;
    public static String String_ISO_8859_1To_UTF_8(String strISO_8859_1) {
        // Write every char as two hex digits (assumes all chars are <= U+00FF)
        final StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < strISO_8859_1.length(); i++) {
            final char ch = strISO_8859_1.charAt(i);
            stringBuilder.append(String.format("%02x", (int) ch));
        }
        // Parse the hex pairs back into raw bytes
        String s = stringBuilder.toString();
        int len = s.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                    + Character.digit(s.charAt(i + 1), 16));
        }
        // Decode the raw bytes as UTF-8
        return new String(data, StandardCharsets.UTF_8);
    }

TEST

    String strA_ISO_8859_1_i = new String("الغلاف".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    System.out.println("ISO_8859_1 strA est = " + strA_ISO_8859_1_i
            + "\n String_ISO_8859_1To_UTF_8 = " + String_ISO_8859_1To_UTF_8(strA_ISO_8859_1_i));

RESULT

    ISO_8859_1 strA est = Ø§ÙØºÙØ§Ù
     String_ISO_8859_1To_UTF_8 = الغلاف

che.moor Oct 30 '18 at 14:52

Converting from Latin-1 to UTF-8 is pretty simple, as shown above. The reverse direction is probably also simple, but the problem is that a UTF-8 character can take up to 4 bytes, while Latin-1 supports only 1 byte per character. We have to map each UTF-8 character above 127 to its Latin-1 equivalent. I don't think an algorithm for this has been implemented yet, but I will work on it this week and come back with a solution next week (without incorrectly replaced characters).

Update: I just realized that the only characters encoded identically in Latin-1 (ISO-8859-1) and UTF-8 are ASCII (0 to 127). The other cases should also be covered by my method, which I will post next week. Lol...
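
Because the 256 characters of ISO-8859-1 are exactly the Unicode code points U+0000 to U+00FF, the mapping described above can be done numerically. A minimal sketch, assuming Java 8+ (for String.codePoints) and '?' as the replacement for anything above U+00FF; this is one possible approach, not the method promised in the answer, and the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical converter: UTF-8 bytes in, ISO-8859-1 bytes out.
class Utf8ToLatin1 {

    static byte[] convert(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        // Iterate by code point so a 4-byte UTF-8 character becomes one replacement
        int[] codePoints = text.codePoints().toArray();
        byte[] latin1 = new byte[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            // U+0000..U+00FF keep their numeric value; anything higher has no Latin-1 equivalent
            latin1[i] = (byte) (cp <= 0xFF ? cp : '?');
        }
        return latin1;
    }

    public static void main(String[] args) {
        byte[] out = convert("café €".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.ISO_8859_1)); // café ?
    }
}
```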

Moisés Ferreira Jul 24 '19 at 10:45

erickson · Accepted Answer · 2009-03-16 22:21

In general, you cannot do this. UTF-8 is capable of encoding any Unicode code point. ISO-8859-1 can only handle a small portion. Thus, transcoding from ISO-8859-1 to UTF-8 is not a problem. Switching from UTF-8 to ISO-8859-1 will result in the appearance of "replacement characters" (& # xFFFD;) in your text when unsupported characters are detected.

To transcode text:

    byte[] latin1 = ...
    byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

or

    byte[] utf8 = ...
    byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");
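
The same two recodings written with java.nio.charset.StandardCharsets (Java 7+, so there is no UnsupportedEncodingException to catch), as a sketch with hypothetical names. The usage in main shows getBytes substituting '?' for a character ISO-8859-1 lacks:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for byte-level transcoding.
class Transcode {

    static byte[] latin1ToUtf8(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] utf8ToLatin1(byte[] utf8) {
        // Characters that ISO-8859-1 cannot represent are replaced with '?' by getBytes
        return new String(utf8, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1 = utf8ToLatin1("Price: 5€".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // Price: 5?
    }
}
```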

You can get more control by using the lower-level Charset API. For example, you can raise an exception when an unencodable character is found, or use a different character as the replacement.
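
A minimal sketch of that lower-level control with CharsetEncoder and CodingErrorAction: one method reports unencodable characters as an exception, the other substitutes a caller-chosen replacement. Class and method names are hypothetical; the replacement must be a legal byte sequence for the charset (for ISO-8859-1 any single byte is):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers around the java.nio.charset encoder API.
class StrictLatin1 {

    // Throws CharacterCodingException (UnmappableCharacterException) on characters ISO-8859-1 lacks.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Writes the given replacement bytes for each character that cannot be encoded.
    static byte[] encodeWithReplacement(String text, byte[] replacement) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(replacement);
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Copies only the encoded bytes (position..limit) out of the buffer.
    private static byte[] toArray(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        try {
            encodeStrict("5€");
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode as ISO-8859-1: " + e);
        }
    }
}
```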
