How does Docker calculate the hash of each layer? Is it deterministic?

Question

I tried to find this information in the official Docker docs, but had no success.

Which pieces of information does Docker take into account when calculating the hash of each commit / layer?

It's pretty obvious that the line in the Dockerfile is part of the hash and, of course, so is the parent hash. But is anything else taken into account when calculating this hash?

Specific use case: suppose I have two developers on different machines at different points in time (and, because of this, different Docker daemons and different caches) running $ docker build ... against the same Dockerfile. The FROM ... directive will give them the same starting point, but will each subsequent operation result in the same hash? Is it deterministic?

docker commit dockerfile hash

Victor Schröder Mar 31 '16 at 17:04

1 answer

robrich · Answer 1 · 2018-01-09T04:14:26+0000

Thanks @thaJeztah. The answer is at https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#id-definitions-and-calculations

layer.DiffID : identifier for a single layer
Calculation: DiffID = SHA256hex(uncompressed layer tar data)
layer.ChainID : identifier for the layer and its parents. This identifier uniquely identifies a file system consisting of a set of layers.
Calculation:
- For the bottom layer: ChainID (layer0) = DiffID (layer0)
- For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
image.ID : identifier for an image. Since the image configuration references the layers the image uses, this identifier incorporates the file system data and the rest of the image configuration.
Calculation: SHA256hex (imageConfigJSON)
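
The three calculations above can be sketched in Python roughly as follows. This is only an illustration, not Docker's actual code: the function names are made up for this example, and it assumes identifiers are written as strings of the form sha256:<hex> and joined with a single space when chaining.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    # Assumption: identifiers are rendered as "sha256:<hex digest>".
    return "sha256:" + hashlib.sha256(data).hexdigest()


def diff_id(uncompressed_tar: bytes) -> str:
    # DiffID: hash of the uncompressed layer tar bytes.
    return sha256_digest(uncompressed_tar)


def chain_ids(diff_ids: list[str]) -> list[str]:
    # ChainID of the bottom layer is its DiffID; every other layer
    # hashes "<parent ChainID> <own DiffID>".
    chain: list[str] = []
    for d in diff_ids:
        if not chain:
            chain.append(d)
        else:
            chain.append(sha256_digest(f"{chain[-1]} {d}".encode("ascii")))
    return chain


def image_id(config_json: bytes) -> str:
    # Image ID: hash of the exact bytes of the image configuration JSON.
    return sha256_digest(config_json)


if __name__ == "__main__":
    # Hypothetical layer contents, standing in for real tar archives.
    layers = [b"base layer tar bytes", b"second layer tar bytes"]
    ids = [diff_id(layer) for layer in layers]
    for d, c in zip(ids, chain_ids(ids)):
        print(d, "->", c)
```

Each function depends only on the bytes it is given, so the same input always yields the same identifier. Whether two developers end up with the same hashes therefore comes down to whether their builds produce byte-identical layer tars and configuration JSON; details such as file timestamps inside the tar or a creation time in the configuration would be enough to change the result.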
