The right thing to do is probably to compute a hash for each such directory from time to time:
import hashlib
import os


def sha1OfFile(filepath):
    """Return the SHA-1 hex digest of a single file, read in blocks."""
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2 ** 10)  # read in 1 KiB blocks so large files fit in memory
            if not block:
                break
            sha.update(block)
        return sha.hexdigest()


def hash_dir(dir_path):
    """Hash a directory: file hashes first, then the hashes of subdirectories, recursively."""
    hashes = []
    for path, dirs, files in os.walk(dir_path):
        for file in sorted(files):  # sort so the result does not depend on listing order
            hashes.append(sha1OfFile(os.path.join(path, file)))
        for dir in sorted(dirs):
            hashes.append(hash_dir(os.path.join(path, dir)))
        break  # only the top level; subdirectories are covered by the recursive calls
    # Combine with SHA-1 instead of the built-in hash(), which is randomized per
    # process and therefore useless for comparing results across runs.
    return hashlib.sha1(''.join(hashes).encode()).hexdigest()
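For the "from time to time" check, a minimal usage sketch (the directory name my_project and the state file my_project.sha1 are placeholders, not part of the code above): compute the hash, compare it with the value saved last time, and store the new value.

import os

STATE_FILE = 'my_project.sha1'  # hypothetical file holding the previously stored hash

current = hash_dir('my_project')  # 'my_project' is a placeholder directory

previous = None
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        previous = f.read().strip()

if previous != current:
    print('directory changed since the last check')

with open(STATE_FILE, 'w') as f:
    f.write(current)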
The problem with using only the files in the order os.walk gives them to you (as, for example, Markus does) is that you can get the same hash for different directory structures that contain the same files. For example, the hash of this tree
main_dir_1:
    dir_1:
        file_1
        file_2
    dir_2:
        file_3
and this one
main_dir_2:
    dir_1:
        file_1
    dir_2:
        file_2
        file_3
will be the same.
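To make the difference concrete, here is a small sketch (not part of the original answer): it builds both trees with identical file contents, hashes each with a naive flat os.walk-based function, and shows that the flat hashes collide while hash_dir distinguishes the layouts. naive_hash_dir and build_tree are hypothetical helpers introduced only for this demonstration; sha1OfFile and hash_dir are the functions defined above.

import hashlib
import os
import tempfile


def naive_hash_dir(dir_path):
    """Flat approach: concatenate the file hashes in os.walk order, ignoring the layout."""
    hashes = []
    for path, dirs, files in os.walk(dir_path):
        dirs.sort()  # make the walk order deterministic for this demonstration
        for file in sorted(files):
            hashes.append(sha1OfFile(os.path.join(path, file)))
    return hashlib.sha1(''.join(hashes).encode()).hexdigest()


def build_tree(root, layout):
    """Create the files described by a {relative_path: content} mapping."""
    for rel_path, content in layout.items():
        full = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w') as f:
            f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, 'main_dir_1')
    two = os.path.join(tmp, 'main_dir_2')
    build_tree(one, {'dir_1/file_1': 'a', 'dir_1/file_2': 'b', 'dir_2/file_3': 'c'})
    build_tree(two, {'dir_1/file_1': 'a', 'dir_2/file_2': 'b', 'dir_2/file_3': 'c'})

    print(naive_hash_dir(one) == naive_hash_dir(two))  # True: the flat hash collides
    print(hash_dir(one) == hash_dir(two))              # False: the recursive hash differs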
Another thing to keep in mind is that the total hash has to be deterministic, which is why the files and directories are sorted before iterating over them.