The right thing to do is probably to compute a hash for each such directory from time to time:
import hashlib
import os


def sha1OfFile(filepath):
    """Return the SHA-1 hex digest of a single file, read in blocks."""
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2 ** 10)  # read in 1 KiB blocks so large files fit in memory
            if not block:
                break
            sha.update(block)
        return sha.hexdigest()


def hash_dir(dir_path):
    """Hash a directory: file hashes first, then the hashes of subdirectories, recursively."""
    hashes = []
    for path, dirs, files in os.walk(dir_path):
        for file in sorted(files):  # sort so the result does not depend on listing order
            hashes.append(sha1OfFile(os.path.join(path, file)))
        for dir in sorted(dirs):
            hashes.append(hash_dir(os.path.join(path, dir)))
        break  # only the top level; subdirectories are covered by the recursive calls
    # Combine with SHA-1 instead of the built-in hash(), which is randomized per
    # process and therefore useless for comparing results across runs.
    return hashlib.sha1(''.join(hashes).encode()).hexdigest()
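For the "from time to time" check, a minimal usage sketch (the directory name my_project and the state file my_project.sha1 are placeholders, not part of the code above): compute the hash, compare it with the value saved last time, and store the new value.

import os

STATE_FILE = 'my_project.sha1'  # hypothetical file holding the previously stored hash

current = hash_dir('my_project')  # 'my_project' is a placeholder directory

previous = None
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        previous = f.read().strip()

if previous != current:
    print('directory changed since the last check')

with open(STATE_FILE, 'w') as f:
    f.write(current)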
The problem with using only the files in the order os.walk gives them to you (as, for example, Markus does) is that you can get the same hash for different directory structures that contain the same files. For example, the hash of this tree
main_dir_1:
    dir_1:
        file_1
        file_2
    dir_2:
        file_3
and this one
main_dir_2:
    dir_1:
        file_1
    dir_2:
        file_2
        file_3
will be the same.
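To make the difference concrete, here is a small sketch (not part of the original answer): it builds both trees with identical file contents, hashes each with a naive flat os.walk-based function, and shows that the flat hashes collide while hash_dir distinguishes the layouts. naive_hash_dir and build_tree are hypothetical helpers introduced only for this demonstration; sha1OfFile and hash_dir are the functions defined above.

import hashlib
import os
import tempfile


def naive_hash_dir(dir_path):
    """Flat approach: concatenate the file hashes in os.walk order, ignoring the layout."""
    hashes = []
    for path, dirs, files in os.walk(dir_path):
        dirs.sort()  # make the walk order deterministic for this demonstration
        for file in sorted(files):
            hashes.append(sha1OfFile(os.path.join(path, file)))
    return hashlib.sha1(''.join(hashes).encode()).hexdigest()


def build_tree(root, layout):
    """Create the files described by a {relative_path: content} mapping."""
    for rel_path, content in layout.items():
        full = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w') as f:
            f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, 'main_dir_1')
    two = os.path.join(tmp, 'main_dir_2')
    build_tree(one, {'dir_1/file_1': 'a', 'dir_1/file_2': 'b', 'dir_2/file_3': 'c'})
    build_tree(two, {'dir_1/file_1': 'a', 'dir_2/file_2': 'b', 'dir_2/file_3': 'c'})

    print(naive_hash_dir(one) == naive_hash_dir(two))  # True: the flat hash collides
    print(hash_dir(one) == hash_dir(two))              # False: the recursive hash differs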
Another thing to keep in mind is that the total hash has to be deterministic, which is why the files and directories are sorted before iterating over them.