Character encoding in an Excel spreadsheet (and what Java charset to use for decoding)

Question

I am using the JExcel library to read Excel spreadsheets. Each cell in the spreadsheet can contain localization strings in any of 44 languages (English, Portuguese, French, Chinese, etc.). Today I am not telling the API anything about the encoding it should use. Its handling of Chinese is fine, but it always mangles Portuguese and German. Somehow, the default encoding (MacRoman on my dev box, UTF-8 in production) cannot correctly interpret the strings that it pulls from the Excel workbook. There must be something wrong with the way JExcel interprets the character encoding of the file.
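The kind of mismatch described above can be reproduced with the JDK alone, without JExcel. This is only a sketch: it assumes the workbook stores Latin-range strings one byte per character and other strings as UTF-16LE, which the BIFF8 format permits but which is not confirmed for this particular file. `MojibakeDemo` and `decode` are hypothetical names, and windows-1252 is assumed to be available in the running JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Assumption: the JRE ships the windows-1252 ("Cp1252") charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes raw bytes with an explicitly chosen charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Portuguese-style text as single-byte data: 'a', U+00E7, U+00E3, 'o'
        byte[] portuguese = {0x61, (byte) 0xE7, (byte) 0xE3, 0x6F};
        // Decoded with the charset the bytes were written in: text is intact.
        System.out.println(decode(portuguese, CP1252));
        // Decoded as UTF-8: the two accented letters are not valid UTF-8 and are lost.
        System.out.println(decode(portuguese, StandardCharsets.UTF_8));

        // Two Chinese characters (U+4E2D, U+6587) as UTF-16LE data.
        // Their decoding does not depend on any single-byte charset setting.
        byte[] chinese = {0x2D, 0x4E, (byte) 0x87, 0x65};
        System.out.println(decode(chinese, StandardCharsets.UTF_16LE));
    }
}
```

If that assumption holds, it would explain why Chinese survives while Portuguese and German do not: only the one-byte-per-character strings are sensitive to the configured encoding.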

That said ...

Are all strings in an Excel workbook encoded with the same character set?

Is there any workbook metadata I can query to find out what that character set is (I have not found it yet)?

If I run all the cells through something like jchardet (http://jchardet.sourceforge.net/), can it guess the character encoding for the whole workbook (this largely depends on the answer to the first question being "yes, all strings in this workbook are encoded with the same character set")?
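For comparison, a much cruder guess than jchardet can be made with the JDK alone: try a strict UTF-8 decode and fall back to a single-byte charset if it fails. This is not jchardet's algorithm, only a sketch of the idea of guessing from the bytes. `CharsetGuess`, `guess`, and the choice of windows-1252 as the fallback are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetGuess {
    // Assumption: anything that is not valid UTF-8 is treated as windows-1252.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Returns UTF-8 if the bytes are well-formed UTF-8, otherwise the fallback. */
    static Charset guess(byte[] raw) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return FALLBACK;
        }
    }

    /** Decodes the bytes with whichever charset guess() picked. */
    static String decode(byte[] raw) {
        return new String(raw, guess(raw));
    }
}
```

A heuristic like this cannot tell one single-byte charset from another (Cp1252 versus MacRoman, say), which is exactly where a statistical detector or real file metadata would be needed.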

So many questions, so little time.

+4

java excel character-encoding cp1252

Bob kuhar Sep 16 '11 at 19:07

2 answers

I have a problem where, when reading cell values from an Excel file, some values come back with the symbol "?" in place of accented letters ... Can this code solve that problem? Since I am developing under Windows, I cannot test as quickly as I could under Linux (which is the OS of the server I deploy to) ...
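A "?" in place of an accented letter typically enters Java text in one of two ways, and both can be shown with the JDK alone. The sketch below is independent of JExcel and does not establish which of the two is happening in this case; `QuestionMarkDemo` and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {
    /**
     * Encoding into a charset that lacks a character substitutes that
     * charset's replacement byte, which for US-ASCII is a literal '?'.
     */
    static String roundTripThroughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    /**
     * Decoding single-byte data as UTF-8 turns each invalid sequence into
     * U+FFFD, which many consoles and fonts render as '?'.
     */
    static String misreadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // 'c', 'a', 'f', U+00E9
        System.out.println(roundTripThroughAscii(word));
        System.out.println(misreadAsUtf8(word));
    }
}
```

Checking whether the bad character is a real `?` (U+003F) or U+FFFD helps tell the two cases apart: the first points at an encoding step on output, the second at a decoding step on input.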

0

Agustin Jun 19 '13 at 14:34

Bob kuhar · Accepted Answer · 2011-09-17T01:05:52+0000

Well, I didn’t get a direct answer, but Matt found a specification that points the way to the real answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away by simply always setting the encoding to “Cp1252”. I don’t know exactly why, but I am not going to look a gift horse in the mouth, so to speak, and am moving on.

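import jxl.Workbook;
import jxl.WorkbookSettings;

// Decode strings as Cp1252 rather than with the platform default encoding.
WorkbookSettings workbookSettings = new WorkbookSettings();
workbookSettings.setEncoding( "Cp1252" );
Workbook workbook = Workbook.getWorkbook( theFile, workbookSettings );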

I'll take it.
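One likely reason the fix behaves the same on the dev box and in production is that a pinned charset takes the platform default out of the picture. The JDK-only sketch below illustrates that idea; it does not show JExcel's internals, and `PinnedEncoding`, `pinned`, and `decode` are hypothetical names. It also assumes the JRE resolves the legacy name "Cp1252" to its windows-1252 charset.

```java
import java.nio.charset.Charset;

final class PinnedEncoding {
    /** Resolves the legacy name used in the answer to a java.nio Charset. */
    static Charset pinned() {
        return Charset.forName("Cp1252");
    }

    /** Decodes with the pinned charset, ignoring the platform default. */
    static String decode(byte[] raw) {
        return new String(raw, pinned());
    }

    public static void main(String[] args) {
        // The default may differ between machines; the pinned charset does not.
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("pinned charset:   " + pinned().name());

        byte[] umlaut = {(byte) 0xFC}; // U+00FC in Cp1252
        System.out.println(decode(umlaut));
    }
}
```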
