Using Spreadsheet::ParseExcel in Perl, but need help

I have a Perl program that uses Spreadsheet::ParseExcel. However, I have run into two difficulties that I can't figure out how to solve. The script is as follows:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Spreadsheet::ParseExcel;
    use WordNet::Similarity::lesk;
    use WordNet::QueryData;

    my $wn   = WordNet::QueryData->new();
    my $lesk = WordNet::Similarity::lesk->new($wn);

    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse('input.xls');

    if ( !defined $workbook ) {
        die $parser->error(), ".\n";
    }

    WORKSHEET:
    for my $worksheet ( $workbook->worksheets() ) {

        my $sheetname = $worksheet->get_name();
        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();
        my $target_col;
        my $response_col;

        # Skip worksheet if it doesn't contain data
        if ( $row_min > $row_max ) {
            warn "\tWorksheet $sheetname doesn't contain data. \n";
            next WORKSHEET;
        }

        # Check for column headers
        COLUMN:
        for my $col ( $col_min .. $col_max ) {

            my $cell = $worksheet->get_cell( $row_min, $col );
            next COLUMN unless $cell;

            $target_col   = $col if $cell->value() eq 'Target';
            $response_col = $col if $cell->value() eq 'Response';
        }

        if ( defined $target_col && defined $response_col ) {

            ROW:
            for my $row ( $row_min + 1 .. $row_max ) {

                my $target_cell   = $worksheet->get_cell( $row, $target_col );
                my $response_cell = $worksheet->get_cell( $row, $response_col );

                if ( defined $target_cell && defined $response_cell ) {

                    my $target   = $target_cell->value();
                    my $response = $response_cell->value();
                    my $value    = $lesk->getRelatedness( $target, $response );

                    print "Worksheet = $sheetname\n";
                    print "Row = $row\n";
                    print "Target = $target\n";
                    print "Response = $response\n";
                    print "Relatedness = $value\n";
                }
                else {
                    warn "\tWorksheet $sheetname, Row = $row doesn't contain target and response data.\n";
                    next ROW;
                }
            }
        }
        else {
            warn "\tWorksheet $sheetname: Didn't find Target and Response headings.\n";
            next WORKSHEET;
        }
    }

So my two problems:

First, the program sometimes returns the error "No Excel data found in the file" even though the file contains data. Every Excel file is formatted the same way: a single sheet with columns A and B labeled "Target" and "Response" respectively, and a list of words below each heading. The error does NOT always occur. The script works for one Excel file but not for another, although both are formatted in exactly the same way (and yes, they are both the same file type). I cannot find any reason why it fails to read the second file, since it is identical to the first. The only difference is that the second file was created by an Excel macro; but why would that matter? The file types and formats are exactly the same.

Second, the variables '$target' and '$response' must be strings for the 'my $value' statement to work. How do I convert them to strings? The value assigned to each variable is a word from the corresponding cell in the Excel spreadsheet. I don't know what format the values are in (and Perl has no obvious way to check).

Any suggestions?

+4
3 answers

Regarding your first question, the "No Excel data found" error indicates a problem with the file format. I have seen this error with pseudo-Excel files, such as HTML or CSV files that have an xls extension. I have also seen it with malformed files generated by third-party applications.

You can do an initial check by taking a hexdump / xxd of the working and non-working files and seeing whether their overall structure is roughly the same (for example, whether they have similar magic numbers at the start and whether the bad file is actually HTML).

It could also be an issue with Spreadsheet::ParseExcel. I am the maintainer of that module. If you like, you can send me a "good" and a "bad" file (my email address is in the docs) and I will take a look at them.
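The same comparison can be done from Perl. The following is only a minimal sketch: the helper names `leading_bytes_hex` and `guess_format` are made up for illustration, and the HTML test is a deliberately crude "starts with `<`" check. It relies on two well-known signatures: binary .xls files are OLE2 containers that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and ZIP archives (the container used by .xlsx) begin with 50 4B 03 04.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first $count bytes of a file
# as an uppercase hex string.
sub leading_bytes_hex {
    my ( $path, $count ) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    binmode($fh);
    my $buffer = '';
    my $got = read( $fh, $buffer, $count );
    close($fh);
    die "Cannot read $path: $!" unless defined $got;
    return uc( unpack( 'H*', $buffer ) );
}

# Hypothetical helper: make a rough guess at the real file type
# from its leading bytes, regardless of the file extension.
sub guess_format {
    my ($path) = @_;
    my $hex = leading_bytes_hex( $path, 8 );
    return 'OLE2 (binary xls)'    if $hex eq 'D0CF11E0A1B11AE1';
    return 'ZIP (possibly xlsx)'  if $hex =~ /^504B0304/;
    return 'markup (HTML or XML)' if pack( 'H*', $hex ) =~ /^\s*</;
    return 'unknown (possibly CSV or plain text)';
}

# Print the leading bytes and the guess for each file on the command line.
for my $file (@ARGV) {
    printf "%s: %s => %s\n", $file, leading_bytes_hex( $file, 8 ),
        guess_format($file);
}
```

Running it with the working and the non-working file as arguments shows at a glance whether both really start with the same signature.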

+3

First of all, if you get "No Excel data found", you can thank Excel's proprietary data file formats and the inability of even a good Perl library to extract information from them.

I strongly recommend exporting the Excel data to something that is easy to parse, such as CSV, especially given the simple data layout you described. There may be a way to get Excel to do the conversion in batch, but I don't know of one. A quick search turned up a tool that uses OpenOffice for batch conversion.

The rest of your question is pretty much moot once you accept that Excel data files will not play nicely.
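Once the data is in CSV, reading the two columns needs only core Perl. The sketch below is an illustration under stated assumptions, not a general CSV parser: the `read_pairs` name is hypothetical, and it assumes one word per cell with no commas, quotes, or embedded newlines. For anything more complicated, a dedicated module such as Text::CSV is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read Target/Response pairs from a simple CSV export.
# Assumes plain words with no quoted fields or embedded commas.
# Returns a list of [ target, response ] array references.
sub read_pairs {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";

    # Find the two columns by name in the header row.
    my $header = <$fh>;
    die "$path is empty\n" unless defined $header;
    $header =~ s/\r?\n\z//;
    my @names = split /,/, $header, -1;
    my %index;
    for my $i ( 0 .. $#names ) {
        ( my $name = $names[$i] ) =~ s/^\s+|\s+$//g;
        $index{$name} = $i;
    }
    for my $wanted (qw(Target Response)) {
        die "$path: no '$wanted' heading\n" unless exists $index{$wanted};
    }

    my @pairs;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;
        my @fields = split /,/, $line, -1;
        my ( $target, $response ) = @fields[ @index{qw(Target Response)} ];

        # Skip rows where either cell is missing or empty.
        next unless defined $target   && length $target;
        next unless defined $response && length $response;
        push @pairs, [ $target, $response ];
    }
    close($fh);
    return @pairs;
}

# Usage: perl script.pl exported.csv
if (@ARGV) {
    printf "Target = %s, Response = %s\n", @$_ for read_pairs( $ARGV[0] );
}
```

Each pair can then be passed to the relatedness call in place of the cell values read through Spreadsheet::ParseExcel.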

0

I wrote this code after a client could not decide whether the XLS file he sent every week was really in XLS format or just CSV. HTH!

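    sub testForXLS {
        my ($FileName) = @_;
        my $signature  = '';

        # First 10 bytes of a binary XLS (OLE2) file, as uppercase hex
        my $XLSsignature = 'D0CF11E0A1B11AE10000';

        # Read the first 10 bytes in binary mode
        open( my $fh, '<', $FileName ) || die "Cannot open $FileName: $!";
        binmode($fh);
        my $buffer = '';
        read( $fh, $buffer, 10, 0 );
        close($fh);

        # Convert the bytes to an uppercase hex string
        foreach ( split( //, $buffer ) ) {
            $signature .= sprintf( "%02x", ord($_) );
        }
        $signature =~ tr/a-z/A-Z/;

        # Return 1 if the signature matches, 0 otherwise
        if ( $signature eq $XLSsignature ) {
            return 1;
        }
        else {
            return 0;
        }
    }
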
0