How do you programmatically compare the contents of two archive files?

I am doing some testing to ensure that one zip file, which I created using a script, has the same contents as several zip files that I have to create manually by clicking through the web interface. The zip files will therefore have different folder structures.

Of course, I can manually extract them and use my powerful eyeball technique to scan them, or, even lazier, I can write a script to do this. But before I invest more time and get accused of robbing my boss of company time, I ask: is there a better way to do this?

I am using a LAMP stack with Perl. Thanks.

+4
4 answers

You can use Perl's Archive::Zip or Python's zipfile to extract the file names, sizes and CRC checksums of the files in each archive. Create a file containing the results sorted by file name (ignore the path).

For your small ZIP files, merge the results of the script ( cat list1 list2 list3 | sort ).

Now you can use diff to compare the results.
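
As a rough sketch of this approach in Python (the function names archive_entries and same_contents are made up for illustration, not part of any library), the listing can be built from the metadata already stored in each archive, with nothing extracted:

```python
import posixpath
import zipfile


def archive_entries(zip_path):
    """Return sorted (file name, size, CRC) tuples, ignoring folder paths."""
    entries = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file data
            # Zip member names always use forward slashes.
            name = posixpath.basename(info.filename)
            entries.append((name, info.file_size, format(info.CRC, "08x")))
    return sorted(entries)


def same_contents(big_zip, small_zips):
    """Compare one archive against the merged listing of several archives."""
    merged = sorted(e for path in small_zips for e in archive_entries(path))
    return archive_entries(big_zip) == merged
```

Calling same_contents("big.zip", ["part1.zip", "part2.zip"]) (placeholder file names) would play the role of the final diff. Writing each listing to a text file and running diff on the files works just as well and shows which entries differ.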

+3

I can fully recommend Beyond Compare. Unless you are seriously underpaid, it is the best bang for your (boss's) buck.

[Edit] It seems I overlooked the different folder structure, sorry about that. Beyond Compare can compare all files in folders that share the same folder structure. It does not (I think) have the intelligence to look for matching files in different folders.

Yours faithfully,
Livny

+1

Taking a cue from Carra's answer: if A.zip is your single large archive and B.zip is an archive generated through the web interface, use the following algorithm:

  • Extract all files from A.zip, then recursively (with respect to folders) calculate the checksum of every file in the folder where the contents were extracted (using cksum, md5sum, etc.). Sort this information (pass it through sort) and save it to a file (e.g. A.txt).

  • Do the same for B.zip and generate B.txt.

  • Compare A.txt with B.txt; they should be exactly the same.
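
The steps above could be sketched in Python roughly as follows. This is an illustration under a few assumptions: checksum_listing is a made-up helper name, SHA-256 from hashlib stands in for cksum or md5sum, and only the digests are kept (paths are dropped) so that differing folder structures do not affect the comparison.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def checksum_listing(zip_path):
    """Extract an archive to a temp folder and return the sorted digests of its files."""
    digests = []
    with tempfile.TemporaryDirectory() as workdir:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(workdir)
        # Walk the extracted tree recursively and hash every regular file.
        for path in Path(workdir).rglob("*"):
            if path.is_file():
                digests.append(hashlib.sha256(path.read_bytes()).hexdigest())
    return sorted(digests)


def archives_match(zip_a, zip_b):
    """True when both archives contain the same multiset of file contents."""
    return checksum_listing(zip_a) == checksum_listing(zip_b)
```

Each file is read fully into memory here, which keeps the sketch short but may not suit very large files.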

OR

Use unzip -l to get the list of files and directories in each zip archive, then flatten the hierarchy of the manually created zip file and compare it with the listing of your script-generated zip file using something like diff. Flattening the hierarchy means you may need to do some preprocessing on one or both lists before diff can give a meaningful comparison.
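
A possible Python equivalent of this listing-and-diff idea, assuming that flattening simply means stripping the folder part of each name (flat_names and listing_diff are hypothetical helper names). Here zipfile's namelist stands in for unzip -l and difflib for diff:

```python
import difflib
import posixpath
import zipfile


def flat_names(zip_path):
    """List the file names in an archive with folder paths stripped, sorted."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            posixpath.basename(name)
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        )


def listing_diff(script_zip, web_zip):
    """Return unified-diff lines between the flattened listings (empty if equal)."""
    return list(
        difflib.unified_diff(
            flat_names(script_zip),
            flat_names(web_zip),
            fromfile=script_zip,
            tofile=web_zip,
            lineterm="",
        )
    )
```

This only compares names, so it can show missing or extra files but says nothing about whether same-named files have the same contents.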

+1

Create a CRC checksum for your files.

If the checksum is the same for the source files and the unpacked files, you can be reasonably confident that the files are identical. This works even for non-textual (binary) data.

You can easily create a checksum using an external program such as SFV Checker, or programmatically (for example, .NET and Java include libraries for this).
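
For instance, a minimal Python sketch of the programmatic route, using the standard zlib module (file_crc32 and matches_archive_member are illustrative names, not an existing API). Since the zip format stores a CRC-32 of each member's uncompressed data, a source file can be checked against the archive without unpacking it:

```python
import zipfile
import zlib


def file_crc32(path, chunk_size=65536):
    """Compute the CRC-32 of a file on disk, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running CRC back in
    return crc


def matches_archive_member(source_path, zip_path, member_name):
    """Check a source file's CRC-32 against the CRC stored for a zip member."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.getinfo(member_name).CRC == file_crc32(source_path)
```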

+1
