So, I'll talk a little about the topic in general and explain how Git stores what. That should make clear what information is stored and what actually matters for the size of the repository. As a fair warning: this answer is quite long :)
Git objects
Git is essentially a database of objects. These objects come in four different types, and they are all identified by the SHA1 hash of their contents. The four types are blobs, trees, commits, and tags.
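To be slightly more precise, Git does not hash the raw contents alone. It hashes a short header of the form "<type> <size>" followed by a NUL byte, and then the contents. A minimal Python sketch of that computation, assuming a repository in the default SHA1 object format (newer Git versions can also be set up to use SHA-256):

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute the object ID Git assigns to an object (SHA1 object format).

    The hash covers the header "<type> <size>\\0" followed by the raw content.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Same result as: printf 'hello world\n' | git hash-object --stdin
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the type is part of the hashed data, a blob and a tree with identical bytes would still get different IDs.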
Blob
A blob is the simplest type of object. It stores the contents of a file. So for every distinct file content stored in your Git repository, there is exactly one blob in the object database. Since it only stores the contents of a file and no metadata such as file names, it also ensures that multiple files with identical contents are not stored multiple times.
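The deduplication falls out of content addressing: the blob ID depends only on the bytes, so identical contents map to the same key. The sketch below uses a plain dictionary as a hypothetical in-memory stand-in for the object database, with made-up file names:

```python
import hashlib

def blob_id(content: bytes) -> str:
    # Same scheme as `git hash-object`: hash "blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

# Toy content-addressed store (object ID -> content); hypothetical stand-in
# for .git/objects.
store = {}

# Two paths with identical contents; the file names never reach the blob.
files = {
    "README.md": b"same text\n",
    "docs/copy-of-readme.md": b"same text\n",
    "LICENSE": b"MIT\n",
}
for path, data in files.items():
    store.setdefault(blob_id(data), data)

print(len(files), "files ->", len(store), "blobs")  # 3 files -> 2 blobs
```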
Tree
Going one level up, a tree is an object that puts blobs into a directory structure. A single tree corresponds to one directory. It is essentially a list of files and subdirectories, where each entry contains a file mode, the name of the file or directory, and a reference to the Git object that belongs to the entry. For subdirectories, this reference points to a tree object that describes the subdirectory; for files, it points to a blob that stores the file's contents.
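As a rough sketch, this is how such an entry list can be serialized into a raw tree object. It assumes the SHA1 object format: each entry is "<mode> <name>" plus a NUL byte, followed by the 20-byte binary ID, and entries are sorted by name with directory names compared as if they ended in "/". The file names and the small example hierarchy are hypothetical.

```python
import hashlib

def object_id(obj_type: str, content: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def tree_content(entries):
    """Serialize (mode, name, hex_id) entries into raw tree-object bytes."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git compares directory names as if they had a trailing "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return out

EMPTY_BLOB = object_id("blob", b"")
# Hypothetical layout: an empty README at the root and an empty src/main.py.
src_tree = object_id("tree", tree_content([("100644", "main.py", EMPTY_BLOB)]))
root_tree = object_id("tree", tree_content([
    ("100644", "README", EMPTY_BLOB),  # file entry -> blob
    ("40000", "src", src_tree),        # subdirectory entry -> tree
]))
print(root_tree)
```

Note how a change to src/main.py would produce a new src tree and therefore a new root tree as well.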
Commit
Blobs and trees are already enough to represent a complete file system. To add versioning on top of that, we have commit objects. Commit objects are created whenever you commit something in Git. Each commit represents a snapshot in the history of revisions.
It contains a reference to the tree object that describes the root directory of the repository. This also means that every commit that actually introduces changes requires at least one new tree object (most likely more).
A commit also contains references to its parent commits. While there is usually exactly one parent (for a linear history), or none for the very first commit, a commit can have any number of parents, in which case it is usually called a merge commit. Most workflows only ever create merges with two parents, but you can really have any other number.
Finally, a commit also contains the metadata you would expect from a commit: author and committer (name and time) and, of course, the commit message.
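Unlike blobs and trees, a commit object is plain text. The sketch below builds one in that layout (tree, parent lines, author, committer, blank line, message) to illustrate the structure; the identity and timestamp are hypothetical, optional headers such as signatures are left out, and in practice git commit-tree does this for you.

```python
import hashlib

def object_id(obj_type: str, content: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def commit_content(tree, parents, author, committer, message):
    """Build the raw bytes of a commit object.

    `author` and `committer` use the "Name <email> <unix-time> <tz-offset>" form.
    A root commit has no "parent" line; a merge commit has one line per parent.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity

root = object_id("commit", commit_content(EMPTY_TREE, [], who, who, "Initial commit\n"))
child = object_id("commit", commit_content(EMPTY_TREE, [root], who, who, "Second commit\n"))
print(commit_content(EMPTY_TREE, [root], who, who, "Second commit\n").decode())
```

Since the parent ID is part of the hashed text, a commit's ID pins down its entire ancestry.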
This is all that is needed for a complete version control system; but of course there is another type of object:
Tag
Tag objects are one way to store tags. To be precise, tag objects store annotated tags, which are tags that, like commits, carry some meta information. They are created with git tag -a (or when creating a signed tag) and require a tag message. They also contain a reference to the object they point to (usually a commit), and a tagger (name and time).
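For comparison with the commit layout, here is a sketch of an annotated tag object's text, roughly what git cat-file -p shows for a tag. The commit ID, tag name, and tagger are made-up placeholders.

```python
import hashlib

def object_id(obj_type: str, content: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def tag_content(target, target_type, name, tagger, message):
    """Build the raw bytes of an annotated tag object."""
    return (
        f"object {target}\n"     # the object the tag points to
        f"type {target_type}\n"  # usually "commit"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        f"\n{message}"
    ).encode()

commit_id = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit ID
tagger = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity
raw = tag_content(commit_id, "commit", "v1.0", tagger, "Release 1.0\n")
tag_id = object_id("tag", raw)
print(raw.decode())
```

The reference refs/tags/v1.0 would then hold tag_id, the ID of the tag object, rather than the commit ID.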
References
So far, we have a complete version control system with annotated tags, but all of our objects are identified by their SHA1 hashes. That is of course a bit annoying, so we have something else to make it easier: references.
References come in different varieties, but the most important thing about them is this: they are simple text files containing 40 characters, the SHA1 hash of the object they point to. Because they are so simple, they are very cheap, so working with many references is no problem at all. There is no overhead, and no reason not to use them.
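Resolving such a reference is correspondingly simple. The sketch below reads a loose ref file and also follows a symbolic ref such as HEAD, which normally contains "ref: refs/heads/<branch>" instead of an ID. It builds a hypothetical miniature .git directory with a made-up commit ID, assumes SHA1 (40-character) IDs, and ignores packed references.

```python
import re
import tempfile
from pathlib import Path

OBJECT_ID = re.compile(r"[0-9a-f]{40}")  # SHA1 repositories

def read_ref(git_dir, refname):
    """Resolve a loose ref file to an object ID, following symbolic refs like HEAD."""
    text = Path(git_dir, refname).read_text().strip()
    if text.startswith("ref: "):
        return read_ref(git_dir, text[len("ref: "):])
    if not OBJECT_ID.fullmatch(text):
        raise ValueError(f"unexpected ref content in {refname}: {text!r}")
    return text

with tempfile.TemporaryDirectory() as git_dir:
    # Hypothetical miniature .git directory with a single branch.
    Path(git_dir, "refs", "heads").mkdir(parents=True)
    Path(git_dir, "refs", "heads", "main").write_text("0123456789abcdef0123456789abcdef01234567\n")
    Path(git_dir, "HEAD").write_text("ref: refs/heads/main\n")
    head = read_ref(git_dir, "HEAD")

print(head)  # 0123456789abcdef0123456789abcdef01234567
```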
There are usually three types of references: branches, tags, and remote branches. They really work the same way and all point to commit objects, with the exception of annotated tags, which point to tag objects (lightweight tags are just plain references to commits). The difference between them is how you create them and in which subpath of refs/ they are stored. I won't cover that here, since it is explained in almost every Git tutorial; just remember: references, i.e. branches, are extremely cheap, so feel free to create them for almost anything.
Compression
Now, since torek mentioned something about Git's compression in his answer, I want to clarify this a bit. Unfortunately, he mixed up a few things.
Usually, for new repositories, all Git objects are stored in .git/objects as files named after their SHA1 hash. The first two characters are stripped from the file name and used as a subdirectory name, which spreads the files across several folders and keeps each folder small and easier to handle.
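Here is a sketch of that layout. One detail not mentioned above: each loose object file holds the header plus contents, compressed with zlib. The code writes a blob into a temporary directory standing in for .git and reads it back; it assumes the SHA1 object format.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def loose_object_path(git_dir, hex_id):
    # The first two hex digits name the subdirectory, the other 38 the file.
    return Path(git_dir, "objects", hex_id[:2], hex_id[2:])

def write_loose_blob(git_dir, content):
    raw = f"blob {len(content)}\0".encode() + content
    hex_id = hashlib.sha1(raw).hexdigest()
    path = loose_object_path(git_dir, hex_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))  # loose objects are stored zlib-compressed
    return hex_id

def read_loose_object(git_dir, hex_id):
    raw = zlib.decompress(loose_object_path(git_dir, hex_id).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content")
    return obj_type, body

with tempfile.TemporaryDirectory() as git_dir:
    blob = write_loose_blob(git_dir, b"hello world\n")
    rel = loose_object_path(git_dir, blob).relative_to(git_dir).as_posix()
    stored = read_loose_object(git_dir, blob)

print(rel)     # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(stored)  # ('blob', b'hello world\n')
```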
At some point, when the history gets larger or when something else triggers it (for example git gc), Git will start to compress objects. It does this by packing multiple objects into a single pack file. How exactly this works is not that important; it reduces the number of individual Git objects and stores them efficiently in single, indexed archives (at this stage, Git also uses delta compression). The pack files are stored in .git/objects/pack and can easily reach several hundred megabytes.
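The internals of a pack are beyond this answer, but its fixed 12-byte header is easy to inspect: the signature "PACK", a version number, and an object count, both as 32-bit big-endian integers. A small sketch that parses it, first on a synthetic header and then, if present, on the pack files of the repository in the current directory:

```python
import struct
from pathlib import Path

def read_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a .pack file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a Git pack file")
    return version, count

# Synthetic header for illustration: version 2, 3 objects.
print(read_pack_header(struct.pack(">4sII", b"PACK", 2, 3)))  # (2, 3)

# In a real repository (run from its top-level directory):
pack_dir = Path(".git/objects/pack")
if pack_dir.is_dir():
    for pack in sorted(pack_dir.glob("*.pack")):
        with pack.open("rb") as f:
            print(pack.name, read_pack_header(f.read(12)))
```

The object count shows how many objects, many of them delta-compressed, are bundled into a single file.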
For references, the situation is somewhat similar, although much simpler. All current references are stored in .git/refs, e.g. branches in .git/refs/heads, tags in .git/refs/tags, and remote branches in .git/refs/remotes/<remote>. As mentioned above, these are simple text files containing only the 40-character identifier of the object they point to.
At some point, Git will move older references of any type into a single lookup file: .git/packed-refs. That file is just a long list of hashes and reference names, one entry per line. References stored there are removed from the refs directory.
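A sketch of reading that file, with made-up IDs. Two details go slightly beyond the description above: the file may start with a "#" header line, and an annotated tag may be followed by a "^<id>" line recording the object the tag ultimately points to. When a loose ref file exists for the same name, it takes precedence over the packed entry.

```python
def parse_packed_refs(text):
    """Parse packed-refs text into ({refname: id}, {tag refname: peeled id})."""
    refs, peeled, last = {}, {}, None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # e.g. the "# pack-refs with: ..." header
        if line.startswith("^"):
            peeled[last] = line[1:]  # object the preceding annotated tag points to
            continue
        hex_id, refname = line.split(" ", 1)
        refs[refname] = hex_id
        last = refname
    return refs, peeled

def lookup(refname, loose, packed):
    # A loose ref file wins over a (possibly stale) entry in packed-refs.
    return loose.get(refname, packed.get(refname))

# Hypothetical contents with made-up IDs.
a, b, c = "1" * 40, "2" * 40, "3" * 40
sample = (
    "# pack-refs with: peeled fully-peeled sorted\n"
    f"{a} refs/heads/main\n"
    f"{b} refs/tags/v1.0\n"
    f"^{a}\n"
)
packed, peeled = parse_packed_refs(sample)
loose = {"refs/heads/main": c}  # main was updated after packing

print(lookup("refs/heads/main", loose, packed))  # the loose value, 333...
print(peeled["refs/tags/v1.0"])                  # 111...
```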
Reflogs
Torek also mentioned these. Reflogs track what happens to references. Whenever you do something that affects a reference (commit, checkout, reset, etc.), a new log entry is added that records what happened. This also gives you a way back after you have done something wrong. For example, a common use case is consulting the reflog after accidentally resetting a branch to a place it was not supposed to point to. You can then use git reflog to view the log and see where the reference pointed before. Since Git objects that are no longer referenced are not deleted immediately (objects that are part of the history are never deleted), you can usually restore the previous state easily.
Reflogs, however, are local: they only track what happens in your local repository. They are not shared with remotes and are never transferred. A freshly cloned repository will have a reflog with a single entry, the clone action. Reflogs are also limited: older entries are expired (by default after 90 days, or after 30 days for entries that are no longer reachable), so they never become a storage problem.
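Each reflog line (for example in .git/logs/HEAD) records the old ID, the new ID, who made the change and when, and a short message. A parsing sketch with a hypothetical two-line log (a clone followed by a commit, made-up IDs and identity), plus a helper that mimics what HEAD@{n} refers to:

```python
import re

# "<old-id> <new-id> <name> <<email>> <unix-time> <tz>\t<message>"
REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<who>.*?) (?P<time>\d+) (?P<tz>[+-]\d{4})\t(?P<message>.*)"
)

def parse_reflog(text):
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        m = REFLOG_LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"unrecognized reflog line: {line!r}")
        entries.append(m.groupdict())
    return entries

def at(entries, n):
    # HEAD@{n}: where the ref pointed n updates ago (HEAD@{0} is the newest entry).
    return entries[-1 - n]["new"]

zero, c1, c2 = "0" * 40, "1" * 40, "2" * 40
who = "A U Thor <author@example.com>"  # hypothetical identity
log = (
    f"{zero} {c1} {who} 1700000000 +0000\tclone: from https://example.com/repo.git\n"
    f"{c1} {c2} {who} 1700000100 +0000\tcommit: Fix typo\n"
)
entries = parse_reflog(log)
print(at(entries, 1))  # where HEAD was before the last update: 111...
```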
Some final words
So, back to your actual question. When you clone a repository, Git usually already receives it in packed form; this is done to save transfer time. References are very cheap, so they are never the cause of large repositories. However, because of the way Git works, a single current commit object has a whole acyclic graph behind it that eventually reaches the very first commit, the very first tree, and the very first blob. So a repository always contains all the information for every version. That is what makes repositories with a long history large. Unfortunately, there is not much you can do about it. Well, to some extent you could cut off the older history, but that leaves you with an incomplete repository (you do this by cloning with the --depth option).
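The "whole graph behind a commit" point can be made concrete with a plain reachability walk. The sketch uses a hypothetical three-commit history with short made-up IDs (c = commit, t = tree, b = blob): starting from the newest commit reaches every object in the repository.

```python
def reachable(start, links):
    """Return every object ID reachable from `start`.

    `links` maps an object ID to the IDs it references: a commit to its tree
    and parents, a tree to its blobs and subtrees. Blobs reference nothing.
    """
    seen, stack = set(), [start]
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(links.get(oid, ()))
    return seen

# Hypothetical history with short made-up IDs.
links = {
    "c3": ["t3", "c2"], "c2": ["t2", "c1"], "c1": ["t1"],
    "t3": ["b2", "b3"], "t2": ["b1", "b2"], "t1": ["b1"],
}
print(sorted(reachable("c3", links)))
# ['b1', 'b2', 'b3', 'c1', 'c2', 'c3', 't1', 't2', 't3']
```

Roughly speaking, a clone with --depth 1 keeps only the tip commit and what its tree needs (c3, t3, b2, b3 here) and cuts off the rest.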
And as for your second question: as I explained above, branches are just references to commits, and references are just pointers to Git objects. So no, there is no metadata about branches that you can get from them. The only thing that might give you an idea is the first commit you made after branching off in your history. But having branches does not automatically mean that a branch is actually visible in the history (fast-forward merges and rebasing work against that), and just because the history contains some branching does not mean that the branch (the reference, the pointer) still exists.
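To illustrate why a branch can leave no trace, here is a sketch that models branches as a plain name-to-commit mapping over a hypothetical linear history. A fast-forward only moves the pointer, and deleting the branch afterwards removes only the pointer, so the commit graph looks the same as if the work had been done on main directly.

```python
def is_ancestor(ancestor, descendant, parents):
    """True if `ancestor` can be reached from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return False

def fast_forward(branches, name, target, parents):
    # A fast-forward only moves the branch pointer; no new commit records it.
    if not is_ancestor(branches[name], target, parents):
        raise ValueError(f"{name} cannot be fast-forwarded to {target}")
    branches[name] = target

# Hypothetical linear history c1 <- c2 <- c3; "feature" was branched off at c1.
parents = {"c2": ["c1"], "c3": ["c2"]}
branches = {"main": "c1", "feature": "c3"}

fast_forward(branches, "main", "c3", parents)
del branches["feature"]  # deleting the branch removes only the pointer

print(branches)  # {'main': 'c3'}
print(parents)   # unchanged: nothing in the history shows "feature" existed
```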