How to perform better document version control on Excel files and SQL schema files

I am in charge of several Excel files and SQL schema files. How should I perform better version control on these files? I need to know which part of a file was modified (the differing part) and keep all versions for reference. Currently I append a timestamp to the file name, but I have found this to be inefficient.

Is there a way or a good practice for better document version control?

By the way, editors send me files by email.

+61
git version-control versioning ms-office
Jun 13 '13 at 9:22
7 answers

Since you've tagged your question with git, I assume you are asking about using git for this.

Well, SQL dumps are plain text files, so it makes perfect sense to track them with git. Just create a repository and store them in it. When you get a new version of a file, simply overwrite it and commit; git will figure out everything for you, and you'll be able to see modification dates, check out specific versions of the file, and compare different versions.
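Since the files arrive by email, the "overwrite and commit" step could be scripted. Below is a minimal Python sketch, assuming git is on the PATH and the repository already exists. The function name commit_new_version and its parameters are made up for illustration; the run parameter is only there so the git calls can be checked without a real repository.

```python
import shutil
import subprocess
from pathlib import Path


def commit_new_version(repo_dir, incoming_file, message, run=subprocess.run):
    """Overwrite the tracked copy with a newly received file and commit it.

    `run` defaults to subprocess.run; pass a stub to inspect the commands.
    """
    repo = Path(repo_dir)
    incoming = Path(incoming_file)
    target = repo / incoming.name
    # Overwrite the previous version. git keeps the history, so no
    # timestamp in the file name is needed.
    if incoming.resolve() != target.resolve():
        shutil.copyfile(incoming, target)
    commands = [
        ["git", "add", "--", target.name],
        # Note: git commit exits non-zero when nothing changed, which
        # makes check=True raise CalledProcessError.
        ["git", "commit", "-m", message],
    ]
    for cmd in commands:
        run(cmd, cwd=repo, check=True)
    return commands
```

After a few commits, git log -p -- schema.sql shows each change with its date, and git diff HEAD~1 -- schema.sql compares the last two versions (schema.sql being a placeholder file name).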

The same is true for .xlsx files if you decompress them. .xlsx files are zipped directories of XML files (see How to properly assemble a valid xlsx file from its internal subcomponents?). Git will treat them as binary unless they are decompressed. You can unzip the .xlsx and track the changes to the individual XML files inside the archive.
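As a rough illustration of the unzip step, here is a small Python sketch using only the standard library. The function name unpack_xlsx and the ".unpacked" folder suffix are my own assumptions, not an established convention.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_xlsx(xlsx_path, out_dir=None):
    """Extract the parts of an .xlsx/.xlsm archive into a directory.

    Returns the sorted list of archive member names.
    """
    xlsx = Path(xlsx_path)
    # Hypothetical naming: Book.xlsx -> Book.xlsx.unpacked/
    target = Path(out_dir) if out_dir else xlsx.with_name(xlsx.name + ".unpacked")
    # Start from an empty folder so parts removed from the workbook
    # do not linger from a previous extraction.
    if target.exists():
        shutil.rmtree(target)
    with zipfile.ZipFile(xlsx) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

Committing the resulting folder lets git diff the individual XML files. Only run this on workbooks from senders you trust.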

You could also do this with .xls files, but the problem is that the .xls format is binary, so you can't get meaningful diffs from it. You will still be able to see the modification history and check out specific versions, though.

+26
Jun 13 '13 at 9:51

The answer I wrote here can be applied in this case. A tool called xls2txt can produce human-readable output from .xls files. In short, you should put this in your .gitattributes file:

 *.xls diff=xls 

And in .git/config:

 [diff "xls"]
     binary = true
     textconv = /path/to/xls2txt

Of course, I'm sure you can find similar tools for other file types as well, which makes git diff a very useful tool for office documents. This is what I currently have in my global .gitconfig:

 [diff "xls"]
     binary = true
     textconv = /usr/bin/py_xls2txt
 [diff "pdf"]
     binary = true
     textconv = /usr/bin/pdf2txt
 [diff "doc"]
     binary = true
     textconv = /usr/bin/catdoc
 [diff "docx"]
     binary = true
     textconv = /usr/bin/docx2txt

The Pro Git book has a good chapter on the topic: http://git-scm.com/book/en/Customizing-Git-Git-Attributes#Binary-Files
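For .xlsx files, a textconv program can be sketched with the Python standard library alone. This is not xls2txt (which targets the binary .xls format), just an illustration of the same idea; the script name xlsx2txt.py and the function xlsx_to_text are hypothetical. git calls a textconv program with one argument, the path of the file to convert, and reads the text from standard output.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_to_text(path):
    """Return one line per non-empty cell: '<sheet part>!<ref><TAB><content>'."""
    lines = []
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        shared = []
        if "xl/sharedStrings.xml" in names:
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            # An <si> entry may hold several <t> runs (rich text); join them.
            shared = ["".join(t.text or "" for t in si.iter(NS + "t"))
                      for si in root.iter(NS + "si")]
        sheets = sorted(n for n in names
                        if n.startswith("xl/worksheets/") and n.endswith(".xml"))
        for name in sheets:  # part names such as sheet1.xml, not tab titles
            for cell in ET.fromstring(archive.read(name)).iter(NS + "c"):
                formula = cell.find(NS + "f")
                value = cell.find(NS + "v")
                if formula is not None and formula.text:
                    text = "=" + formula.text      # show the formula, not its result
                elif cell.get("t") == "s" and value is not None:
                    text = shared[int(value.text)]  # index into the shared strings
                elif cell.get("t") == "inlineStr":
                    text = "".join(t.text or "" for t in cell.iter(NS + "t"))
                elif value is not None:
                    text = value.text or ""
                else:
                    continue
                lines.append(f"{name}!{cell.get('r')}\t{text}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    print(xlsx_to_text(sys.argv[1]))
```

Assuming the script is saved at a path of your choosing, it would be wired up the same way as above: a line "*.xlsx diff=xlsx" in .gitattributes and a [diff "xlsx"] section whose textconv is "python /path/to/xlsx2txt.py".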

+63
Jun 14 '13 at 10:06

I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalize Excel files in such a way that they are much easier to store in source control. I've published the executable here:

https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe

.. and the source is here:

https://bitbucket.org/htilabs/ooxmlunpack

If there's any interest, I'm happy to make this more configurable, but at the moment you should put the executable in a folder (for example, the root of your source repository). When you run it, it will:

  • Scan the folder and its subfolders for any .xlsx and .xlsm files
  • Take a copy of each file as *.orig
  • Unzip each file and re-zip it with no compression
  • Pretty-print any files in the archive that are valid XML
  • Delete the calcchain.xml file from the archive (since it changes a lot and does not affect the content of the file)
  • Inline any unformatted text values (otherwise these are kept in a lookup table, which causes big changes in the internal XML if even a single cell is modified)
  • Delete the values from any cells that contain formulas (since they can simply be recalculated the next time the sheet is opened)
  • Create a *.extracted subfolder containing the extracted contents of the zip archive

Clearly, not all of these things are necessary, but the end result is a spreadsheet file that will still open in Excel, but which is much more amenable to diffing and incremental compression. In addition, storing the extracted files also makes it much more obvious in the version history which changes were applied in each version.
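To make three of those steps concrete (re-zipping without compression, pretty-printing, and writing an extracted folder), here is a short Python sketch. It is not the OoXmlUnpack implementation, which is a .NET program; the function names are made up, and the steps that edit the workbook's content (removing calcchain.xml, inlining strings, stripping formula values) are deliberately left out. Unlike the tool described above, this sketch pretty-prints only the extracted copies and leaves the bytes inside the re-zipped workbook untouched.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def pretty_xml(data):
    """Return pretty-printed XML bytes, or the input unchanged if it is not valid XML."""
    try:
        return minidom.parseString(data).toprettyxml(indent="  ", encoding="utf-8")
    except ExpatError:
        return data  # e.g. embedded images or other binary parts


def normalize_workbook(src, dst, extracted_dir):
    """Write an uncompressed copy of src to dst, plus a pretty-printed extract."""
    out_root = Path(extracted_dir)
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_STORED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            # ZIP_STORED keeps members uncompressed, so the VCS can
            # delta-compress successive versions itself.
            zout.writestr(info.filename, data, compress_type=zipfile.ZIP_STORED)
            if info.is_dir():
                continue
            part = out_root / info.filename
            part.parent.mkdir(parents=True, exist_ok=True)
            part.write_bytes(pretty_xml(data))
```

A call such as normalize_workbook("Book.xlsx", "Book.normalized.xlsx", "Book.xlsx.extracted") (placeholder names) leaves the original file untouched. As with any extraction, only use it on archives you trust.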

If there's any appetite, I'm happy to make the tool more configurable, since I guess not everyone will want the contents extracted, or possibly the values removed from formula cells, but both of these are very useful to me at the moment.

In tests, a 2 MB spreadsheet “unpacks” to 21 MB, but I was then able to store five versions of it, with small changes between each, in a 1.9 MB Mercurial data file, and to visualize the differences between versions effectively using Beyond Compare in text mode.

NB: although I'm using Mercurial, I read this question while researching my solution, and there is nothing Mercurial-specific about the solution; it should work fine for git or any other VCS.

+20
Jun 10 '14 at 16:12

Tante describes a very simple approach to managing ZIP-based file formats in git:

Open the ~/.gitconfig file (create it if it does not already exist) and add the following stanza:

 [diff "zip"]
     textconv = unzip -c -a
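Here unzip -c writes every archive member to standard output, preceded by its name, and -a converts text files. On a machine without unzip, roughly the same effect could be had from a small Python script. This is only a sketch: the name zip_to_text is hypothetical, and the output is not byte-identical to what unzip prints.

```python
import sys
import zipfile


def zip_to_text(path):
    """Concatenate all archive members as text, each preceded by its name."""
    chunks = []
    with zipfile.ZipFile(path) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            # Undecodable bytes (binary parts) become replacement characters.
            body = archive.read(name).decode("utf-8", errors="replace")
            chunks.append(f"--- {name}\n{body}")
    return "\n".join(chunks)


if __name__ == "__main__" and len(sys.argv) == 2:
    # git passes the file to convert as the only argument.
    sys.stdout.write(zip_to_text(sys.argv[1]) + "\n")
```

Sorting the member names keeps the output order stable between versions, which keeps the resulting diffs small.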
+2
Feb 06 '17 at 21:32

As mentioned in a comment on another answer, .xlsx files are just XML.

To get at the XML directory (which is git-able), you have to "unzip" the .xlsx file into a directory. A quick way to see this on Windows is to rename the file from .xlsx to .zip, and you will see the internal contents. I would store this alongside the binary, so that after a checkout you do not have to take any extra steps to open the document in Excel.

+1
Jun 13 '13 at 11:29

My approach with Excel files is similar to Jon's, but instead of working with raw Excel text data, I export to more friendly formats.

Here is the tool I'm using: https://github.com/stenci/ExcelToGit/tree/master

All you need to do is download the .xlsm file (click the “View Raw” link on this page). Remember to check the Excel setting as described in the readme. You can also add code to export SQL data to text files.

The workbook is both a converter from binary Excel to text files and a launcher for the Git tools, and it can also be used with projects that are not related to Excel.

My working copy is set up with dozens of Excel workbooks. I also use this file to open Git-gui for non-Excel projects, simply by adding the Git folder manually.

0
Oct 10 '15 at 23:27

This Excel utility works very well for me:

Version Control for Excel

This is a fairly straightforward versioning tool for workbooks and VBA macros. Once you commit a version, it is saved to a Git repository on your PC. I have never tried it with SQL, but I'm sure there is a way.

0
Apr 12 '16 at 17:24