Can Git, Mercurial, SVN, or other version control tools work well when the project tree contains binary files?

Sometimes our project tree contains binary files such as jpg, png, doc, xls, or pdf. Can Git, Mercurial, SVN, or other tools do a good job when only a small part of a binary file changes?

For example, suppose the specification is a 4 MB .doc file that is part of the repository. If it is edited 100 times during the year, only 1 or 2 lines at a time, and checked in after each edit, that adds up to 400 MB.

With 100 different .doc and .xls files, that is 40 GB ... not a size that is easy to manage.
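
The estimate above is a worst case: it assumes every check-in stores a complete copy of the file, with no savings from deltas or compression. A quick sketch of the arithmetic, with illustrative variable names and decimal units (1 GB = 1000 MB) assumed:

```python
# Worst-case repository growth, assuming each check-in stores a full copy.
FILE_SIZE_MB = 4         # size of one .doc file, from the question
CHECKINS_PER_FILE = 100  # check-ins per file during the year
FILE_COUNT = 100         # number of .doc / .xls files

per_file_mb = FILE_SIZE_MB * CHECKINS_PER_FILE  # history of a single file
total_gb = per_file_mb * FILE_COUNT / 1000      # all files, in decimal GB

print(f"one file: {per_file_mb} MB, all files: {total_gb} GB")
```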

I tried Git and Mercurial, and both seem to add a lot of data even when only 1 line is changed in a .doc or .pdf. Is there any other way within Git, Mercurial, or SVN to handle this?

+9
git dvcs svn mercurial binaryfiles
Jun 06
5 answers

In general, version control systems work better with text files. The whole merge/conflict concept is really built around source code. However, SVN works very well for binary files. (We use it for CAD drawings.)

I will point out that file locking (the svn:needs-lock property) is pretty much mandatory if multiple users work on a shared binary file. Without file locking, 2 people can work on the same binary file at the same time. One of them commits first. Guess what happens to the person who has not committed yet: all the unmergeable work they did on the binary is effectively lost. A file lock serializes work on the file. You lose the "parallel" capabilities of the version control system, but you still keep the advantages of commit logging, reverting to a previous version, etc.
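
To make the serialization concrete, here is a toy lock-modify-unlock model. It is only a sketch: `LockingRepo`, the file name, and the user names are hypothetical and are not part of the SVN API. It assumes that committing releases the lock, which is SVN's default behavior.

```python
class LockingRepo:
    """Toy lock-modify-unlock model; hypothetical, not the SVN API."""

    def __init__(self):
        self.locks = {}      # path -> user currently holding the lock
        self.revisions = []  # committed (path, user, data) tuples

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def commit(self, path, user, data):
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} must lock {path} before committing")
        self.revisions.append((path, user, data))
        del self.locks[path]  # committing releases the lock


repo = LockingRepo()
repo.lock("drawing.dwg", "alice")

try:
    repo.lock("drawing.dwg", "bob")  # refused while alice holds the lock
    bob_got_lock = True
except PermissionError:
    bob_got_lock = False

repo.commit("drawing.dwg", "alice", b"alice's edit")
repo.lock("drawing.dwg", "bob")      # succeeds now: the work is serialized
```

The second user is turned away before starting to edit, rather than discovering at commit time that the work cannot be merged.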

The TortoiseSVN client is smart enough to use MS Word's built-in compare tool to diff doc / docx files. It also has configuration options that let you specify alternative diff tools based on the file extension, which is pretty cool. (It's a shame that no one has made a diff tool for our CAD package.)

Current generation DVCS, such as Git or Hg, usually suck with binary files. They do not have a file locking mechanism.

+13
Jun 06

There are binary diff tools, but they don’t really help: because of compression, changing one pixel of an image or one character in a Word document does not correspond to changing one byte in the file. So “nice” handling of such binary data is not possible.
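
A small sketch of this effect, using Python's `zlib` on a made-up text document (the "Requirement" lines are purely illustrative). It changes one character and counts how many byte positions differ before and after compression. The exact count for the compressed streams depends on the input and the zlib version, so no specific number is claimed here; typically it is far larger than one.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


# Hypothetical document: 500 lines of varied text.
original = "\n".join(
    f"Requirement {i}: the system shall handle case {i * 7919 % 1000}."
    for i in range(500)
).encode()

# Change a single character near the start (same length, so offsets line up).
edited = original.replace(b"Requirement 3:", b"Requirement X:", 1)

raw_changed = differing_bytes(original, edited)
packed_changed = differing_bytes(zlib.compress(original), zlib.compress(edited))

print("bytes changed in the raw text:       ", raw_changed)
print("bytes changed after zlib compression:", packed_changed)
```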

If you want to commit such documents, consider using uncompressed formats: RTF instead of DOC, TeX instead of PDF, etc. If the version control system compresses its internal repository, this approach should work pretty well. For example, in Git:

Newly added objects are stored entirely using zlib compression.
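
As a rough illustration of what that means, the sketch below builds a loose blob object the way Git does: a `blob <size>` header, a NUL byte, then the content, hashed with SHA-1 and compressed with zlib. The helper name `loose_object` and the sample revisions are made up for this example, and the later repacking of loose objects into delta-compressed packfiles is left out.

```python
import hashlib
import zlib


def loose_object(content: bytes):
    """Return (object id, compressed bytes) for a Git-style loose blob."""
    store = b"blob %d\x00" % len(content) + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


# Two hypothetical revisions of the same file, differing by a few characters.
rev1 = b"spec line 1\nspec line 2\n"
rev2 = b"spec line 1\nspec line 2 (edited)\n"

id1, packed1 = loose_object(rev1)
id2, packed2 = loose_object(rev2)

# Each revision becomes its own object, stored whole rather than as a delta.
print(id1, len(packed1))
print(id2, len(packed2))
```

Because every new revision starts out as a complete compressed copy, the file format matters: the less the raw bytes change between revisions, the better any later delta compression can work.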

EDIT: I just wanted to note that even RTF is terrible, just not as terrible as DOC. If you can switch to TXT or TeX for your documents, that would be better.

+5
Jun 06 '10 at 9:07

See the Mercurial wiki page on binary files. The main problem is that even minor changes to files such as doc lead to large changes in the file's structure (partly because such files are compressed).

Therefore, I do not think you will find a good way to handle these files in a version control system.

+3
Jun 06

I use git to synchronize my documents between Mac, Linux, and Windows computers. I had to reorganize things once to avoid the 2 GB file limit on Windows. In total it is about 7 GB across 3 repositories that are synchronized regularly. At one point I even had a remote copy on a hosted server on the Internet.

I almost never need to clone these repositories, so the large size does not bother me much. I also see that .git does not grow significantly and stays at 40-60% of the size of the checked-out documents, PDF files, and Excel sheets.

Changing one line in a doc or pdf file changes the file dramatically, as the formatting effects ripple through it. Similarly, changing one cell in an XLS file can change many other cells.

However, compared with the alternative of having no documents under version control, I am happy to live with less-than-stellar compression ratios.

+3
Jun 06 '10 at 9:18

IMHO, you should stop using an SCM to manage such documents. Use a dedicated document management tool such as Alfresco instead (I'm sure there are many others).

+1
Jun 07 '10 at 13:18