Can someone explain the difference between the content tracking used in Git and the file tracking used in other SCMs?

I have been using Git for a while and love its features and the flexibility it allows in my workflow. The ability to commit early and often is a huge deal for me and really fits my way of working.

One feature of Git that I have heard about many times, but that has not clicked for me yet, is that it tracks content rather than the history of files, which supposedly makes renaming and moving files much more efficient.

Can someone explain why this is? I have not noticed anything special in this regard compared to SVN. What am I missing?

+7
2 answers

Git stores three pieces of data separately:

  • content is stored in blob objects
  • history is stored in commit objects
  • structure is stored in tree objects

The consequence of this is that if you have the same data in multiple files, Git needs to store it only once, because the structure (the directories and file names) can point to a single content object.

Similarly, if a file does not change from one version to the next, Git needs to store it only once: several history (commit) objects end up referring to the same content.
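
To make the deduplication concrete, here is a minimal Python sketch of content-addressed storage. The ID formula (SHA-1 over a "blob <size>" header, a NUL byte, and then the file bytes) is what Git uses for blobs in SHA-1 repositories. Everything else is an assumption for illustration: the `store` dictionary and the `snapshot` helper are hypothetical stand-ins for the object database and tree objects, not Git's real data structures.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob with this content (SHA-1 repositories)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical stand-in for the object database: blob ID -> content.
store = {}

def snapshot(files: dict) -> dict:
    """Store each distinct content once; return a tree-like map of path -> blob ID."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # identical content is kept only once
        tree[path] = oid
    return tree

# LICENSE and COPYING are identical; README is unchanged between revisions.
rev1 = snapshot({"LICENSE": b"MIT\n", "COPYING": b"MIT\n", "README": b"hello\n"})
rev2 = snapshot({
    "LICENSE": b"MIT\n",
    "COPYING": b"MIT\n",
    "README": b"hello\n",
    "main.c": b"int main(void) { return 0; }\n",
})

print(len(store))  # number of distinct blobs across both revisions
```

Seven file entries across the two revisions end up as three stored blobs, because both the duplicate file and the unchanged file resolve to IDs that are already in the store.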

Some of the user-visible advantages are that git blame is very good at seeing code move between files, especially if you tell it to look really hard with git blame -C. These are also some of the reasons why Git is so compact and fast: the structure is very simple, very cheap to walk, and it does not repeat itself.

One of the drawbacks is that Git does not record file copies and renames. It just guesses, and sometimes it guesses wrong.

This blog post provides a reasonably accessible but still detailed discussion of what content tracking buys Git. If you want to know more, you can watch Linus's tech talk on Git or read the transcript.

+10

The only information that Git stores from one revision to the next is the state (names and contents) of the files in each revision. In revision A, this file had this content, and in revision B, this file had different content. Git doesn't care how the files got from point A to point B, be it editing, renaming, resolving a conflict, or an octopus merge.

This approach has the advantage of a conceptually simple repository format. This matters because your repository is your history, and history should be stored in the simplest format possible.

One consequence of this is that whenever Git needs to find out what happened between, say, versions A and B, it has to work out the details at the time you ask. Even for a simple diff, where some tools might just show an internally stored delta, Git compares the files in revisions A and B and regenerates the diff on demand. For renames, Git notices that a new file has appeared and looks for similar files in the previous version to guess whether the file was renamed.
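
As a rough illustration of that on-demand guessing, the Python sketch below compares two snapshots and pairs each deleted path with the most similar added path. It is not Git's algorithm: the similarity measure is `difflib.SequenceMatcher` from the standard library, and the function name `guess_renames`, the snapshot dictionaries, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

def guess_renames(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Guess renames between two snapshots that map path -> text content.

    Nothing about renames is stored in the snapshots themselves; the
    answer is inferred purely by comparing contents.
    """
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    renames = {}
    for new_path in added:
        best_path, best_score = None, threshold
        for old_path in deleted:
            if old_path in renames:
                continue  # this deleted path is already matched
            score = SequenceMatcher(None, old[old_path], new[new_path]).ratio()
            if score >= best_score:
                best_path, best_score = old_path, score
        if best_path is not None:
            renames[best_path] = new_path
    return renames

rev_a = {
    "util.py": "def add(a, b):\n    return a + b\n",
    "README": "docs\n",
}
rev_b = {
    "helpers.py": "def add(a, b):\n    # sum two numbers\n    return a + b\n",
    "README": "docs\n",
}
print(guess_renames(rev_a, rev_b))
```

Because the result depends only on the two snapshots and the heuristic, a better heuristic can be applied later to the same stored history, and a poor one can pair the wrong files.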

As Git's tools improve over time, they can report more about how the history came about, without that information having had to be recorded at commit time. For example, it is often claimed that Git can "track individual bits of code moving from one file to another." That is due to the cleverness of the programs that report history, not to anything stored in the repository itself.

+5
