Using Perl to read French characters from an Excel spreadsheet

I am using Spreadsheet::ParseExcel to parse an Excel spreadsheet file as follows:

 my $FileName = "../excel.xls";
 my $parser   = Spreadsheet::ParseExcel->new();
 my $workbook = $parser->parse($FileName);

and reading cell values like this:

 $product = $worksheeto->get_cell( $row, 0 )->value(); 

The problem is that if a cell contains a French character such as à, it is displayed as ò.

To make sure the parsing itself was not at fault, I dumped the value as hex:

 print unpack('H*', $product) . "\n"; 

When I paste that output into any online hex-to-string converter, I get à.

I also tried

 use utf8;
 binmode(STDOUT, ":utf8");

but instead of à I get

Is there a way to get the correct characters?

+8
perl excel
2 answers

Try parsing the file with a formatter, for example Spreadsheet::ParseExcel::FmtUnicode:

 use Spreadsheet::ParseExcel;
 use Spreadsheet::ParseExcel::FmtUnicode;
 #use Spreadsheet::ParseExcel::FmtJapan;

 my $FileName  = '../excel.xls';
 my $parser    = Spreadsheet::ParseExcel->new();
 my $formatter = Spreadsheet::ParseExcel::FmtUnicode->new();
 my $workbook  = $parser->parse($FileName, $formatter);

Also try FmtJapan. As the documentation puts it: the Spreadsheet::ParseExcel::FmtJapan formatter also supports Unicode, so if you run into encoding problems with the default formatter, try it instead.

UPDATE: I tried this myself on an xls file containing Greek characters, and it did not work with either FmtUnicode or FmtJapan. I then found a perlmonks post, used the My::Excel::FmtUTF8 module provided there, and it worked when printing cell values with $cell->value().

+7

I tried what you described and it works correctly here as soon as I turn on UTF-8 output. I suspect that either you have an unusual Excel file (you would have to post an example somewhere) or your terminal is misconfigured.
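As a minimal sketch of what "turning on UTF-8 output" changes, the snippet below uses a hypothetical stand-in for a cell value (a decoded Perl string holding à, U+00E0) rather than a real spreadsheet. It compares the bytes Perl would write with and without a UTF-8 output layer. Note that `use utf8` only tells Perl that the source file itself is UTF-8; the output encoding is controlled by `binmode`.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for $cell->value(): a decoded string containing U+00E0 (à).
my $product = "\x{E0}";

# Without an encoding layer, code points below 256 are written as single Latin-1 bytes.
my $latin1_hex = unpack 'H*', encode('ISO-8859-1', $product);

# With a UTF-8 layer, the same character is written as two bytes.
my $utf8_hex = unpack 'H*', encode('UTF-8', $product);

binmode STDOUT, ':encoding(UTF-8)';
print "Latin-1 bytes: $latin1_hex\n";
print "UTF-8 bytes:   $utf8_hex\n";
print "Value: $product\n";
```

If the terminal expects UTF-8 and receives the single Latin-1 byte, it will show a replacement or unrelated character, which would match the symptom in the question.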

Troubleshooting character set problems is difficult because your terminal may confuse matters. It is therefore always useful to pipe the output through "od -c" to see exactly which bytes you get. In my script, I get this text from a spreadsheet I had lying around:

 Value = Descripción 

And when I pass it through od:

 0000000   V   a   l   u   e       =       D   e
 0000020   s   c   r   i   p   c   i 303 263   n  \n

I see that ó is two bytes long (303 263), which suggests UTF-8. To make sure, you can ask iconv to convert from the expected output encoding to whatever your terminal uses:

 iconv -f utf-8 

If the input is not valid UTF-8, iconv will complain and/or output even stranger garbage.
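The same two checks can be sketched from inside Perl using only the core Encode module. The helper names `octal_dump` and `is_valid_utf8` are made up for this illustration, and the two sample byte strings are assumptions standing in for real program output.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Render each byte as three octal digits, similar to how od -c shows non-ASCII bytes.
sub octal_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%03o', ord } split //, $bytes;
}

# True if the byte string decodes as strict UTF-8, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $good = "Descripci\xC3\xB3n";    # ó encoded as two UTF-8 bytes
my $bad  = "Descripci\xF3n";        # ó as a single Latin-1 byte

for my $sample ($good, $bad) {
    printf "%s => %s\n", octal_dump($sample),
        is_valid_utf8($sample) ? 'valid UTF-8' : 'not valid UTF-8';
}
```

A string that fails this check is probably in a single-byte encoding such as Latin-1, in which case it needs to be decoded from that encoding before being printed through a UTF-8 layer.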

+2