Expand OpenOffice files for better storage in version control

I heard a discussion about how OpenOffice (ODF) files are compressed zip archives of XML and other data. As a result, a minor change to the document can completely change the file's bytes, so delta compression in version control systems works poorly on them.
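To illustrate the effect, here is a minimal sketch that compares two nearly identical XML strings before and after compression. It uses zlib (the same deflate family that zip normally uses) as a stand-in for the real archive, and the sample XML is made up for the example, not taken from an actual ODF file.

```python
import zlib


def count_differing_bytes(a, b):
    """Count positions where a and b differ, plus any difference in length."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


# Hypothetical stand-in for the content.xml inside an ODF file.
paragraphs = "".join(
    "<text:p>Paragraph %d of the report.</text:p>" % i for i in range(200)
)
before = ("<doc><title>draft</title>" + paragraphs + "</doc>").encode("utf-8")
after = before.replace(b"draft", b"final")

packed_before = zlib.compress(before)
packed_after = zlib.compress(after)

raw_diff = count_differing_bytes(before, after)
packed_diff = count_differing_bytes(packed_before, packed_after)

print("uncompressed: %d of %d bytes differ" % (raw_diff, len(before)))
print("compressed:   %d of %d bytes differ" % (packed_diff, len(packed_before)))
```

The exact numbers depend on the input and the zlib version, so treat the output as an indication rather than a measurement.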

I did some basic testing on an OpenOffice file: I unzipped it and then re-zipped it with zero compression, using the Linux zip utility. OpenOffice still opens the result without complaint.

So I'm wondering whether it's worth developing a small utility to run on ODF files each time before I commit them to version control. Any thoughts on this idea? Are there better alternatives?

Secondly, what would be a good and robust way to implement this small utility? A Bash script that calls zip (possibly Linux-only)? Python? Can you think of any pitfalls? Obviously, I don't want to accidentally mangle a file, and there are several ways that could happen.

Possible errors that I can think of:

  • Not enough disk space
  • Permission problems that prevent writing the file or temporary files
  • The ODF document is encrypted (the utility should probably leave it alone; encryption likely also causes large changes across the file, which defeats delta compression anyway)
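For the encrypted case, one possible check is sketched below. It assumes that an encrypted ODF package declares `encryption-data` elements in its `META-INF/manifest.xml`, and it only looks for that marker as a byte string rather than parsing the manifest. The function name `looks_encrypted` is made up for this example.

```python
import zipfile

MANIFEST_PATH = "META-INF/manifest.xml"


def looks_encrypted(path):
    """Return True if the ODF manifest declares encryption data for any entry.

    `path` may be a file name or a file-like object, as accepted by ZipFile.
    """
    with zipfile.ZipFile(path) as archive:
        if MANIFEST_PATH not in archive.namelist():
            # No manifest at all: nothing declares encryption.
            return False
        manifest = archive.read(MANIFEST_PATH)
    # Crude substring test; a stricter tool would parse the XML.
    return b"encryption-data" in manifest
```

A utility could call this first and skip any file for which it returns True.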
+15
version-control
Jun 10 '09 at 12:01
6 answers

Here is a Python script I put together. It has had only minimal testing so far; I did some basic testing under Python 2.6. I prefer Python for this in general, because it should abort with an exception if any error occurs, whereas a Bash script might not.

It first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with the extension ".bak". Then it rewrites the original file with every entry stored uncompressed, overwriting it.

I am sure there are things that I forgot. Please feel free to give feedback.

    #!/usr/bin/python
    # Note, written for Python 2.6

    import sys
    import shutil
    import zipfile

    # Get a single command-line argument containing filename
    commandlineFileName = sys.argv[1]

    backupFileName = commandlineFileName + ".bak"
    inFileName = backupFileName
    outFileName = commandlineFileName
    checkFilename = commandlineFileName

    # Check input file
    # First, check it is valid (not corrupted)
    checkZipFile = zipfile.ZipFile(checkFilename)
    checkZipFile.testzip()

    # Second, check that it not already uncompressed
    isCompressed = False
    for fileObject in checkZipFile.infolist():
        if fileObject.compress_type != zipfile.ZIP_STORED:
            isCompressed = True
    if isCompressed == False:
        raise Exception("File is already uncompressed")

    checkZipFile.close()

    # Copy to "backup" file and use that as the input
    shutil.copy(commandlineFileName, backupFileName)
    inputZipFile = zipfile.ZipFile(inFileName)

    outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

    # Copy each input file data to output, making sure it uncompressed
    for fileObject in inputZipFile.infolist():
        fileData = inputZipFile.read(fileObject)
        outFileObject = fileObject
        outFileObject.compress_type = zipfile.ZIP_STORED
        outputZipFile.writestr(outFileObject, fileData)

    outputZipFile.close()

The script is in a Mercurial repository on Bitbucket.
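As a possible variation on the script, the sketch below builds the uncompressed archive in a temporary file next to the original and swaps it in only after the new archive has been re-checked, so a failure partway through (a full disk, for instance) should leave the original untouched. It assumes Python 3.3 or later for `os.replace`, and the function name `rewrite_stored` is made up for this example.

```python
import os
import shutil
import tempfile
import zipfile


def rewrite_stored(path):
    """Rewrite the zip at `path` so that every entry is stored uncompressed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # os.replace() stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as source:
            if source.testzip() is not None:
                raise ValueError("zip file is corrupted: %s" % path)
            with zipfile.ZipFile(temp_path, "w", zipfile.ZIP_STORED) as target:
                for info in source.infolist():
                    data = source.read(info)
                    info.compress_type = zipfile.ZIP_STORED
                    target.writestr(info, data)
        # Re-read the new archive before it replaces the original.
        with zipfile.ZipFile(temp_path) as check:
            if check.testzip() is not None:
                raise ValueError("rewritten zip failed verification")
        # mkstemp creates a private file; carry over the original's mode bits.
        shutil.copymode(path, temp_path)
        os.replace(temp_path, path)
    except BaseException:
        # Clean up the temporary file and re-raise whatever went wrong.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Unlike the script above, this version keeps no ".bak" copy; whether that is preferable depends on how much you trust the verification step.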

+1
Jun 13 '09 at 14:08

First, the version control system you want to use must support hooks that are invoked to transform a file between its version in the repository and the one in the working area, for example the clean / smudge filters in Git, configured through gitattributes.
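As a rough idea of what such a filter could look like, here is a minimal sketch of a Git clean filter in Python. Git passes the working-tree file to a clean filter on standard input and stores whatever the filter writes to standard output. The names used here are hypothetical: a script called `rezip_filter.py`, a `.gitattributes` line such as `*.odt filter=rezip`, and a `filter.rezip.clean` config entry pointing at the script. This is not the rezip script mentioned in this answer.

```python
import io
import sys
import zipfile


def store_uncompressed(data):
    """Return the zip archive in `data` (bytes) rewritten with stored entries."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source:
        with zipfile.ZipFile(output, "w", zipfile.ZIP_STORED) as target:
            # infolist() keeps the original entry order.
            for info in source.infolist():
                content = source.read(info)
                info.compress_type = zipfile.ZIP_STORED
                target.writestr(info, content)
    return output.getvalue()


def main():
    """Entry point for use as a filter: read stdin, write the result to stdout."""
    sys.stdout.buffer.write(store_uncompressed(sys.stdin.buffer.read()))
```

When saved as a script, it would end with a call to `main()`. Since OpenOffice opens uncompressed files anyway (as noted in the question), a matching smudge filter may not be needed.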

Second, you can look for an existing filter instead of writing one yourself, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the Git mailing list (but see the warning in the "Followup: management of OO files - warning about "rezip" approach" thread).

You can also browse the answers in the "Tracking OpenOffice files / other compressed files with Git" thread, or look for an answer in the "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.

Hope this helps

+13
Jun 10 '09 at 14:23

You might consider storing documents in FODT format, a flat XML format.
This is a relatively new alternative.

The document is simply stored as XML, without being zipped.

Additional information is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion .
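If existing .odt files need converting, one option is LibreOffice's headless converter. The sketch below only wraps that command line. It assumes a `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`, and both function names are made up for this example.

```python
import subprocess


def build_fodt_command(document_path, output_dir):
    """Build the soffice command line for a headless conversion to flat XML."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "fodt",
        "--outdir", output_dir,
        document_path,
    ]


def convert_to_fodt(document_path, output_dir):
    """Run the conversion; raises CalledProcessError if soffice reports failure."""
    subprocess.check_call(build_fodt_command(document_path, output_dir))
```

Saving as FODT directly from the application avoids the conversion step altogether.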

+4
Mar 10 '15 at 4:19

I modified Craig McQueen's Python program a bit. The changes are:

  • Actually checking the return value of testzip() (going by the docs, the original program would happily carry on with a corrupt zip file past the check step).

  • Rewriting the for loop that checks for already-uncompressed files as a single if statement.

Here is the new program:

    #!/usr/bin/python
    # Note, written for Python 2.6

    import sys
    import shutil
    import zipfile

    # Get a single command-line argument containing filename
    commandlineFileName = sys.argv[1]

    backupFileName = commandlineFileName + ".bak"
    inFileName = backupFileName
    outFileName = commandlineFileName
    checkFilename = commandlineFileName

    # Check input file
    # First, check it is valid (not corrupted)
    checkZipFile = zipfile.ZipFile(checkFilename)

    if checkZipFile.testzip() is not None:
        raise Exception("Zip file is corrupted")

    # Second, check that it not already uncompressed
    if all(f.compress_type==zipfile.ZIP_STORED for f in checkZipFile.infolist()):
        raise Exception("File is already uncompressed")

    checkZipFile.close()

    # Copy to "backup" file and use that as the input
    shutil.copy(commandlineFileName, backupFileName)
    inputZipFile = zipfile.ZipFile(inFileName)

    outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

    # Copy each input file data to output, making sure it uncompressed
    for fileObject in inputZipFile.infolist():
        fileData = inputZipFile.read(fileObject)
        outFileObject = fileObject
        outFileObject.compress_type = zipfile.ZIP_STORED
        outputZipFile.writestr(outFileObject, fileData)

    outputZipFile.close()

+3
Mar 06 '10 at 19:47

Here's another program I stumbled upon: store_zippies_uncompressed by Mirko Friedenhagen.

The wiki also shows how to integrate it with Mercurial.

+2
Mar 16 '10 at 7:43

If you don't need the storage savings but just want to be able to diff OpenOffice.org files stored in your version control system, you can follow the instructions on the oodiff page, which explains how to make oodiff the default diff for OpenDocument formats under Git and Mercurial. (It also mentions SVN, but it has been so long since I used SVN regularly that I'm not sure whether those are instructions or limitations.)

(I found this via Mirko Friedenhagen's page, cited by Craig McQueen above.)
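To give a feel for what a text-based diff of ODF files involves, here is a minimal sketch that pulls paragraph and heading text out of `content.xml` so that an ordinary line diff can be run on the result. It is not how oodiff itself works; the function name `odf_to_text` is made up for this example, and the only assumption about the format is the standard ODF `text` namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def odf_to_text(path):
    """Return paragraph and heading text of an ODF text document, one per line.

    `path` may be a file name or a file-like object, as accepted by ZipFile.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    lines = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() also collects text inside nested spans.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)
```

This ignores tables, styles and embedded objects, and paragraphs nested inside other paragraphs (in frames, for instance) would appear twice, so it is only a starting point.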

0
Jul 15 '12 at 1:22