How to convert a custom-encoded file to UTF-8 (in Java or with a dedicated tool)

The legacy software I am rewriting in Java uses a custom (Win-1252-like) encoding for its data store. For the new system I am building, I would like to replace it with UTF-8.

Therefore, I need to convert these files to UTF-8 to feed my database. I know that a character map is used, but it is not one of the widely known ones. For example, "A" sits at position 0x41 (as in Win-1252), but the byte 0x42 maps to the character at Unicode code point U+0102, and so on. Is there an easy way to decode and convert these files with Java?

I have already read a lot of posts, but they all relate to standard, industry-defined encodings, not custom ones. I expect that you can create a custom java.nio.charset.CharsetDecoder or java.nio.charset.Charset and pass it to a java.io.InputStreamReader, as described in the first answer here?
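
Something along those lines is what I have in mind. Here is a minimal sketch, assuming a hypothetical class name LegacyCharset, a made-up canonical name "X-LEGACY", and a 256-entry map that would still need to be filled with the real legacy table:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;

    // Hypothetical decode-only charset built around a 256-entry lookup table.
    public class LegacyCharset extends Charset {
        private static final char[] map = new char[256]; // fill with the legacy table

        public LegacyCharset() {
            super("X-LEGACY", null); // made-up canonical name, no aliases
        }

        @Override
        public boolean contains(Charset cs) {
            return cs.equals(this);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) { // 1 byte decodes to 1 char
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW; // caller drains the output buffer
                        }
                        out.put(map[in.get() & 0xff]);   // unsigned byte -> mapped char
                    }
                    return CoderResult.UNDERFLOW;        // all input consumed
                }
            };
        }

        @Override
        public CharsetEncoder newEncoder() {
            throw new UnsupportedOperationException("decoding only");
        }
    }

With that, new InputStreamReader(in, new LegacyCharset()) should decode the files transparently, but maybe there is a simpler way?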

Any suggestions are welcome.

java encoding character-encoding
1 answer

No need to make this complicated. Just create an array of 256 characters:

    static char[] map = { ..., 'A', '\u0102', ... };

then

    int b;
    while ((b = source.read()) != -1) {  // read each byte from the source
        char c = map[b & 0xff];          // 0xff mask treats the byte as unsigned
        target.write(c);
    }
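
For a one-shot migration, a complete sketch might look like this, assuming one byte per character in the legacy files. The file names legacy.dat and converted.txt are placeholders, and the map initialization (identity for most positions, with overrides such as 0x42 to U+0102 from the question) is illustrative, not the real table:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class LegacyToUtf8 {
        static final char[] map = new char[256];
        static {
            for (int i = 0; i < 256; i++) {
                map[i] = (char) i;     // start from the identity mapping
            }
            map[0x42] = '\u0102';      // example override from the question
            // ... fill in the remaining custom positions
        }

        public static void main(String[] args) throws IOException {
            try (InputStream in = new BufferedInputStream(new FileInputStream("legacy.dat"));
                 Writer out = new OutputStreamWriter(
                         new FileOutputStream("converted.txt"), StandardCharsets.UTF_8)) {
                int b;
                while ((b = in.read()) != -1) { // read() already yields 0..255 or -1
                    out.write(map[b]);          // table lookup; writer emits UTF-8 bytes
                }
            }
        }
    }

Writing through an OutputStreamWriter with StandardCharsets.UTF_8 takes care of the UTF-8 byte encoding, so the loop only has to perform the table lookup.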
