SVN or Mercurial version control of Word documents

As far as I know, Microsoft moved to an XML-based representation in its latest version of Office. If this is true, then I would assume that version control will work, although you will obviously have to resolve any embedded conflicts, with the old

<<<<<< ====== >>>>>>

markers in them, before loading the document into Word.

This other question mentions the problem, but it seems to take for granted that version control just won't work with Word, and I want to know why:

Is version control (i.e. Subversion) used in document tracking?

+4
7 answers

There is a zipdoc extension for Mercurial that appears to handle zipped files, such as the XML-based Word documents, by storing them uncompressed internally so that deltas are meaningful and merges can be done in a meaningful way. I have not tested it, but it looks like what you are after.
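To illustrate the idea of storing a zipped document without compression, here is a minimal Python sketch. It is not the zipdoc extension's code; the function names `store_uncompressed` and `make_sample_archive` and the sample content are made up for the example. It only shows that the same archive rewritten with the "stored" method keeps its XML as plain bytes, which is what gives a version control system something meaningful to compute deltas on.

```python
import io
import zipfile


def store_uncompressed(data: bytes) -> bytes:
    """Rewrite a zip archive so that every member is stored without compression."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Copy each member's content unchanged, but store it as-is.
            dst.writestr(info.filename, src.read(info.filename),
                         compress_type=zipfile.ZIP_STORED)
    return out.getvalue()


def make_sample_archive() -> bytes:
    """Build a tiny compressed archive standing in for a .docx (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml",
                    "<w:document>" + "hello " * 200 + "</w:document>")
    return buf.getvalue()


if __name__ == "__main__":
    original = make_sample_archive()
    stored = store_uncompressed(original)
    print(len(original), len(stored))  # the stored archive is larger
```

The stored archive is bigger on disk, but the repository's own delta compression then works on readable XML rather than on deflated bytes.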

+7

The foregone conclusion is that although most, if not all, version control systems, Mercurial included, handle binary files just fine, they suck at diffing and merging them.

Word files are binary. Yes, recent incarnations of Office have switched to the "Office Open XML" format, which contains XML, but they still wrap the whole thing in a zip file, which means it's still binary (and yes, I know that all files are in fact binary, you know what I mean).

Now, many version control systems, Mercurial and Subversion included, can be told how to merge any file type they consider binary, by providing them with an external merge tool that can do the job.

This basically means that if you can find a program that can take two Word files, diff them, and let you reconcile the differences, then you are in business.
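As a rough idea of what the diffing half of such a program could look like, here is a minimal Python sketch that pulls the paragraph text out of two .docx files and prints a unified diff. It is only an assumption-laden illustration: it ignores formatting, tables, headers and tracked changes, it does not merge anything, and the helpers `docx_paragraphs`, `diff_docx` and `make_docx` are hypothetical names invented here. `make_docx` builds a stripped-down stand-in for a real document so the example runs on its own.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the plain text of each paragraph of a .docx given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]


def diff_docx(old: bytes, new: bytes) -> list[str]:
    """Unified diff of the paragraph text of two .docx files."""
    return list(difflib.unified_diff(docx_paragraphs(old), docx_paragraphs(new),
                                     "old", "new", lineterm=""))


def make_docx(paragraphs: list[str]) -> bytes:
    """Build a minimal, hypothetical .docx-like archive for demonstration only."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{W[1:-1]}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    old = make_docx(["Clause one.", "Clause two."])
    new = make_docx(["Clause one.", "Clause two, amended."])
    print("\n".join(diff_docx(old, new)))
```

A text-only diff like this answers "what changed in the wording", but writing a reconciled result back into a valid Word file is the hard part that a real merge tool would have to solve.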

If you unzipped the Word file and committed the contents, then yes, you could get merge conflicts that you can resolve through Mercurial, but the content would still be in a format that you don't write yourself, so merge conflicts may be not just complicated but impossible to resolve.

In short, version control systems excel at storing binary files, but they suck at diffing and merging them.

If you never need to diff or merge, you can use Mercurial or Subversion or something else, and that will work just fine.

+3

The new formats are actually XML-based, but the .docx file itself is actually a zip file. So ultimately it is still a binary ...

+2

I suppose it depends on who will use the documents. Usually only developers are comfortable using a VCS, so you may complicate life for people who just want to access the files through a shared drive.

On the other hand, the history of changes is often very important, and I often see documents with a large summary at the top listing all the changes, which seems really silly.

I think cloud solutions like Google Docs are likely to fill that gap in the future. Or maybe just a wiki. As a rule, you trade some of Word's more convenient features for a more open sharing experience, but Google Docs is becoming quite powerful.

+1

I would put the use case in the foreground. Many people need tools to compare two versions of the same Word document, and they are not developers but, for example, lawyers. For my law firm clients, documents go out to their clients and come back with changes, so a document-based comparison is absolutely necessary. They use either Word's built-in compare function or third-party tools (Workshare DeltaView is something of an industry standard). These tools also allow you to compare PDF documents.

The use case here is clearly focused on content: lawyers need to see quickly what differs between two versions of a contract. Both versions can be saved as "versions" in the document management system, or in the case of DeltaView, the delta file can be saved for later review.

What could the use case be for a developer? Version control systems mean "SOURCE" control and not "control everything that is included in my project." I prefer to store project-related documents (plans, specifications, requirements, emails) in a different store than Mercurial. On the other hand, I often use Word documents or Word templates as part of the solution in document-template projects, and of course these documents are sources, so they are saved in the repo. But the need to visualize the differences has so far been relatively small, especially if your commit comments are good ("Version 1: init", "Version 2: added text field in the header", "Version 3: added footer information", etc.).

+1

Replies to various points or assumptions I have read here:

  • Yes, Subversion does a great job with various binaries. For example, 60 versions of a 30 MB file take 90 MB for one of my documents with a lot of photos.
  • Yes, TortoiseSVN automatically calls the native MS Word diff and thus lets you see the exact differences (including formatting) between any two versions at the character level.
  • Consider using MS Word's Track Changes feature instead of comparing a posteriori; it also tracks moves, records authors, etc. It answers different needs ...
  • Yes, a docx file is a zipped directory of XML files. Try it: just open the docx file with a zip utility or unzip it!
  • Consider saving as XML instead of docx if you want keyword expansion:

      • Save the file as .xml instead of .docx; although the file gets a lot bigger (it is no longer zipped), you can save space through svn's compression, which I expect is more efficient on text than on binary files.
      • Insert your svn keywords (e.g. $Rev$) in the document properties (via File > Info, Properties in the right pane).
      • Display the information in the document using fields: Insert > Quick Parts > Document Property, for example.
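The "docx is a zipped directory" point above can also be checked from a script rather than a zip utility. The following Python sketch is only an illustration under stated assumptions: `list_parts` and `make_sample_docx` are hypothetical helper names, and the sample archive is a bare stand-in for a real document, with just two members.

```python
import io
import zipfile


def list_parts(data: bytes) -> list[tuple[str, int, int]]:
    """Return (name, uncompressed size, compressed size) for each archive member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(i.filename, i.file_size, i.compress_size) for i in zf.infolist()]


def make_sample_docx() -> bytes:
    """Build a tiny, hypothetical .docx-like archive for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document>" + "text " * 100 + "</w:document>")
    return buf.getvalue()


if __name__ == "__main__":
    for name, size, packed in list_parts(make_sample_docx()):
        print(f"{name}: {size} bytes, {packed} bytes compressed")
```

For a real file you would pass its bytes instead, for example `list_parts(open("report.docx", "rb").read())`, where `report.docx` is a placeholder name.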

This works for me.

Rodolph

+1

Depends on the setting.

If this is a short document that you want to track, use Word's built-in tracking.

Otherwise, use SVN or SharePoint or some other external means of recording versions of documents. If you do not, you run the risk of someone overwriting the file and losing all the version information.

0
