How can my Perl script determine whether an Excel file is in XLS or XLSX format?

I have a Perl script that reads data from an Excel binary (.xls) file. But the client who sends us these files has started sending us XLSX files from time to time, and I updated the script to read them. However, the client sometimes names XLSX files with the .xls extension, which currently confuses the heck out of my script, because it uses the file name to determine what type of file it is.

An XLSX file is a zip file containing XML content. Is there an easy way for my script to look at the file and tell whether or not it is a zip file? If so, I can make my script rely on the file's contents rather than just its name.

.xlsx files have "PK" as their first two bytes, so simply opening the file and examining its first two characters will do.
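A minimal sketch of that check, reading the first two bytes in binary mode. The sub name `starts_with_pk` is hypothetical. Note that "PK" only says the file looks like a zip archive (self-extracting archives are an exception and don't start with it); it does not prove the archive is an Excel workbook.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file's first two bytes are "PK",
# which is how zip archives (and therefore .xlsx files) normally begin.
sub starts_with_pk {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $buf, 2);
    close $fh;
    return defined $got && $got == 2 && $buf eq 'PK';
}

# Usage: perl check_pk.pl report1.xls report2.xls ...
for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        starts_with_pk($path) ? 'starts with PK (likely XLSX)' : 'no PK (likely XLS)';
}
```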


Yes, it's possible by checking the file's magic number.

Perl has several modules for checking a file's magic number.

An example using File::LibMagic:

    use strict;
    use warnings;
    use File::LibMagic;

    my $filename = shift @ARGV;    # path of the file to check
    my $lm       = File::LibMagic->new();

    # The exact strings depend on the installed libmagic database.
    my $type = $lm->checktype_filename($filename);

    if ( $type eq 'application/zip; charset=binary' ) {
        # XLSX format
    }
    elsif ( $type eq 'application/vnd.ms-office; charset=binary' ) {
        # XLS format
    }

Another example, using File::Type:

    use strict;
    use warnings;
    use File::Type;

    my $file = shift @ARGV;    # path of the file to check
    my $ft   = File::Type->new();

    if ( $ft->mime_type($file) eq 'application/zip' ) {
        # XLSX format
    }
    else {
        # probably XLS format
    }

Edit: Archive::Zip is better:

    use strict;
    use warnings;
    use Archive::Zip qw( :ERROR_CODES );    # exports AZ_OK

    # Read a Zip file
    my $somezip = Archive::Zip->new();
    unless ( $somezip->read('someZip.zip') == AZ_OK ) {
        die 'read error';
    }
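Successfully reading the file as a zip only proves it is a zip archive. A sketch of a stricter check is below. It swaps in the IO::Uncompress::Unzip module that ships with Perl instead of Archive::Zip, and looks for a member named `xl/workbook.xml`. That name is an assumption: it is where Excel conventionally stores the workbook part of an .xlsx package, but strictly the location is defined by the package's relationship files. The sub name `looks_like_xlsx` is hypothetical.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip;

# Hypothetical helper: true if $path is a zip archive containing the
# conventional XLSX workbook part "xl/workbook.xml".
sub looks_like_xlsx {
    my ($path) = @_;

    # Transparent => 0 makes new() fail on non-zip input instead of
    # passing the raw bytes through unchanged.
    my $u = IO::Uncompress::Unzip->new($path, Transparent => 0)
        or return 0;

    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};
        if (defined $name && $name eq 'xl/workbook.xml') {
            $u->close();
            return 1;
        }
        # Drain the current member before moving on to the next one.
        my $buf;
        while (($status = $u->read($buf)) > 0) { }
        last if $status < 0;    # corrupt member: give up
    }
    $u->close();
    return 0;
}

# Usage: perl check_xlsx.pl report1.xls report2.xls ...
for my $path (@ARGV) {
    printf "%s: %s\n", $path, looks_like_xlsx($path) ? 'XLSX' : 'not XLSX';
}
```

Checking for a member name rather than just the "PK" signature avoids treating any zip archive, such as a .docx, as a spreadsheet.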

Use File::Type:

    use strict;
    use warnings;
    use File::Type;

    my $file     = "foo.zip";
    my $filetype = File::Type->new();
    if ( $filetype->mime_type($file) eq 'application/zip' ) {
        # File is a zip archive.
        ...
    }

I just tested it with an .xlsx file, and mime_type() returned application/zip. Similarly, for an .xls file, mime_type() returned application/octet-stream.


You can detect an .xls file by checking its first bytes for an Excel header.

You can find a list of valid headers for older Excel formats here (if you don't know exactly which Excel version produced the file, check all the applicable ones):

http://toorcon.techpathways.com/uploads/headersig.txt


Zip file headers are described here: http://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers but I'm not sure whether .xlsx files use the same headers.

File::Type looks for "PK\003\004" as the file header to identify zip files, but without a file to test I'm not sure this logic will work for .xlsx.
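Putting the header checks together, here is a sketch that reads the first eight bytes and compares them against two well-known signatures: the zip local file header `PK\003\004` (XLSX) and the OLE2 compound document signature `D0 CF 11 E0 A1 B1 1A E1`, which .xls files from Excel 97 onward use. Both signatures are shared with other formats (.docx is also a zip, .doc is also OLE2), and very old Excel formats may not use the OLE2 container, so treat the result as a hint about the container rather than proof. The sub name `excel_format` is hypothetical.

```perl
use strict;
use warnings;

use constant {
    ZIP_SIG  => "PK\x03\x04",                         # zip local file header
    OLE2_SIG => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",   # OLE2 compound document
};

# Hypothetical helper: returns 'xlsx', 'xls' or 'unknown' based only on
# the first bytes of the file, ignoring its name entirely.
sub excel_format {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 8);
    close $fh;
    return 'unknown' unless $got;    # read error or empty file
    return 'xlsx' if substr($head, 0, 4) eq ZIP_SIG;
    return 'xls'  if $head eq OLE2_SIG;
    return 'unknown';
}

# Usage: perl excel_format.pl report1.xls report2.xls ...
for my $path (@ARGV) {
    print "$path: ", excel_format($path), "\n";
}
```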

    The-Evil-MacBook:~ ivucica$ file --mime-type --brief file.zip
    application/zip

So comparing the output of

    `file --mime-type --brief $filename`

against application/zip should detect zip files. Of course, this requires the file utility to be installed, which is normal on Unix systems. I'm afraid I can't provide a Perl example, since all my Perl knowledge has evaporated and I have no examples at hand.
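That answer stops short of Perl code; a sketch of the same idea follows. It runs `file` through a list-form pipe open, so the file name never passes through a shell. One assumption to flag: depending on the version of `file` and its magic database, an .xlsx file may be reported as `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` rather than `application/zip`, so the sketch accepts both. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the MIME type reported by file(1),
# or undef if the command could not be run.
sub file_mime_type {
    my ($path) = @_;
    # List form: no shell is involved, so odd characters in $path are safe.
    open(my $pipe, '-|', 'file', '--mime-type', '--brief', '--', $path)
        or return undef;
    my $mime = <$pipe>;
    close $pipe;
    return undef unless defined $mime;
    chomp $mime;
    return $mime;
}

# Hypothetical helper: true if file(1) reports a zip archive or, with
# newer magic databases, an OOXML spreadsheet.
sub is_zip_per_file_cmd {
    my ($path) = @_;
    my $mime = file_mime_type($path);
    return 0 unless defined $mime;
    return $mime eq 'application/zip'
        || $mime eq 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';
}

# Usage: perl check_file_cmd.pl report1.xls report2.xls ...
for my $path (@ARGV) {
    printf "%s: %s\n", $path, is_zip_per_file_cmd($path) ? 'zip/XLSX' : 'not zip';
}
```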


I can't speak to Perl, but on the .NET Framework there are several libraries for manipulating zip files that you could use.

Another approach I've seen is using the WinZip command line: it returns 0 when the file unpacks successfully and a nonzero value when an error occurs.

This may not be the best way to do it, but it's a start.

