What is the benefit of storing objects in folders named after the first two characters of the SHA-1 string?

Git stores each object in a folder named after the first two characters of the object's SHA-1 string. What is the advantage of this storage structure?

I don't think it avoids any potential conflict, so why not put all the objects in one flat folder?

2 answers

There are several reasons to store files using the following layout:

00/f56c0de1c61fdb926e79e8a0a65bd12930c9
25/ec1c55bfb660548a6770238668c4b117d92f
5d/4b01d98f17a9ad9dd1526b49ba39b5aa37a1
63/6f740b6c284ce6685dc17d473a7360ace249
b1/066d178188dde110149a8422ab651b0ee615
b1/a2b7d02b7b0c43530677ab06235382a37e20
da/a3ee5e6b4b0d3255bfef95601890afd80709
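The mapping from an object ID to a path like the ones listed above can be sketched in a few lines of Python. This is only an illustration: `git_blob_id` and `loose_object_path` are hypothetical helper names, and the sketch covers blob objects only, relying on the fact that Git hashes a blob as `blob <size>\0` followed by the content.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split an object ID into '<first two chars>/<remaining chars>'."""
    if len(object_id) <= 2:
        raise ValueError("object ID is too short to split")
    return f"{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_blob_id(b"")  # the empty blob
    print(oid)
    print(loose_object_path(oid))
```

The first two hex characters select one of 256 folders, and the rest of the ID becomes the file name inside it.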

The main reason is to limit the number of files stored in any single folder: some (rather old) file systems do not allow more than about 64 thousand files inside one directory. That is a fairly small number once you count everything Git stores.
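To get a feel for how much the two-character prefix reduces the per-folder file count, here is a rough simulation. It is a sketch under stated assumptions: it hashes synthetic payloads (the decimal strings 0, 1, 2, ...) rather than real Git objects, and `fanout_histogram` is a hypothetical name.

```python
import hashlib
from collections import Counter


def fanout_histogram(n_objects: int) -> Counter:
    """Count how many synthetic SHA-1 IDs fall under each two-character prefix."""
    counts = Counter()
    for i in range(n_objects):
        # Synthetic payload: not a real Git object, just something to hash.
        digest = hashlib.sha1(str(i).encode("ascii")).hexdigest()
        counts[digest[:2]] += 1
    return counts


if __name__ == "__main__":
    n = 100_000
    hist = fanout_histogram(n)
    print("folders used:", len(hist))
    print("average per folder:", n / len(hist))
    print("largest folder:", max(hist.values()))
```

Because SHA-1 output is spread evenly over the prefixes, each folder ends up holding roughly 1/256 of the objects, far below a 64 thousand entry limit even for large object counts.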



(See also Git Internals - Packfiles.)


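See gitrepository-layout:

objects/[0-9a-f][0-9a-f]

A newly created object is stored in its own file.
The objects are splayed over 256 subdirectories using the first two characters of the sha1 object name, to keep the number of directory entries in objects itself to a manageable number.
Objects found here are often called unpacked (or loose) objects.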
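The objects/[0-9a-f][0-9a-f] pattern from gitrepository-layout can be used to count loose objects per subdirectory. The following is a minimal sketch, not how Git itself does it; `count_loose_objects` is a hypothetical helper, and it assumes a regular `.git` directory path is passed in.

```python
import re
from pathlib import Path

# Loose-object subdirectories are named with exactly two lowercase hex digits.
FANOUT_DIR = re.compile(r"^[0-9a-f]{2}$")


def count_loose_objects(git_dir) -> dict:
    """Return {subdirectory name: file count} for objects/[0-9a-f][0-9a-f]."""
    objects = Path(git_dir) / "objects"
    counts = {}
    if not objects.is_dir():
        return counts
    for sub in sorted(objects.iterdir()):
        # Skip other entries such as 'pack' and 'info'.
        if sub.is_dir() and FANOUT_DIR.match(sub.name):
            counts[sub.name] = sum(1 for f in sub.iterdir() if f.is_file())
    return counts


if __name__ == "__main__":
    counts = count_loose_objects(".git")
    print(sum(counts.values()), "loose objects in", len(counts), "directories")
```

Run from the top of a working tree, this prints the total number of loose objects and how many of the 256 possible subdirectories currently exist.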

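Git commit 88520ca illustrates this in the context of gc:

Sample 4 directories when deciding whether to gc

On Windows, git-gui suggests running the garbage collector if it finds 1 or more files in .git/objects/42 (as opposed to 8 files on other platforms).
As a result, if the repository contains about 100 loose objects, the probability of that happening is 32%.
When 4 directories are searched instead, the probability is only 8%, which is more reasonable.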

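The following script shows the probability of finding m*q or more files in q subdirectories of .git/objects, where n is the total number of objects.

(It uses the binomial cumulative distribution function (CDF), binocdf.)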

q = 4;
m = [1 2 8];
n = 0:10:2000;

P = zeros(length(n), length(m));
for k = 1:length(n)
        P(k, :) = 1-binocdf(q*m-1, n(k), q/(256-q));
end
plot(n, P);

n \ q   1       4
50      18%     1%
100     32%     8%
200     54%     39%
500     86%     96%
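For readers without binocdf at hand, the same calculation can be sketched in plain Python. The names `binom_cdf` and `prob_gc_hint` are hypothetical. Note that the quoted script passes q/(256-q) as the per-object probability, whereas with evenly distributed hashes one would expect q/256; the sketch takes the probability as a parameter and prints both, so the second column is an assumption of this example rather than something taken from the commit.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def prob_gc_hint(n: int, q: int, m: int, p: float) -> float:
    """P(at least q*m of n loose objects fall into the q sampled directories)."""
    return 1.0 - binom_cdf(q * m - 1, n, p)


if __name__ == "__main__":
    m = 1  # threshold of 1 file per directory, as in the Windows case
    for n in (50, 100, 200, 500):
        cells = []
        for q in (1, 4):
            as_quoted = prob_gc_hint(n, q, m, q / (256 - q))  # p as in the script
            uniform = prob_gc_hint(n, q, m, q / 256)          # assumed alternative
            cells.append(f"q={q}: {as_quoted:.1%} / {uniform:.1%}")
        print(n, *cells, sep="   ")
```

With the probability taken from the quoted script, the n = 100 row should land close to the 32% and 8% shown in the table; with q/256 the q = 4 values come out slightly lower.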

