Check files for equality

What is the most elegant way to check files for equality in Python? A checksum? A byte-by-byte comparison? Assume the files will be no larger than 100-200 MB.

+5
6 answers

Use hashlib to compute the MD5 digest of each file and compare the results.

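#!/usr/bin/env python
import hashlib

def filemd5(filename, block_size=2**20):
    """Return the MD5 digest of a file, reading it in blocks."""
    md5 = hashlib.md5()
    # Open in binary mode so the hash covers the raw bytes
    with open(filename, 'rb') as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            md5.update(data)
    return md5.digest()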

if __name__ == "__main__":
    a = filemd5('/home/neo/todo')
    b = filemd5('/home/neo/todo2')
    print(a == b)

Update: Since Python 2.1 there has been a filecmp module that does exactly what you want, and it has functions for comparing directories too. I never knew about this module; I am still learning Python myself :-)

>>> import filecmp
>>> filecmp.cmp('undoc.rst', 'undoc.rst')
True
>>> filecmp.cmp('undoc.rst', 'index.rst')
False
+4

How about the filecmp module? It can compare files in different ways, with various tradeoffs.

http://docs.python.org/library/filecmp.html
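
One of those tradeoffs can be sketched as follows. By default, filecmp.cmp does a shallow comparison: if the os.stat() signatures (file type, size, modification time) of both files match, it reports them as equal without reading them. Passing shallow=False makes it compare the contents. The helper name same_contents and the temporary file names below are made up for this illustration.

```python
import filecmp
import os
import tempfile

def same_contents(path_a, path_b):
    """Compare two files by content (hypothetical helper name)."""
    # shallow=False makes filecmp read both files instead of trusting
    # matching os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)

# Demo on throwaway files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    c = os.path.join(tmp, "c.bin")
    for path, payload in ((a, b"hello"), (b, b"hello"), (c, b"hellp")):
        with open(path, "wb") as fh:
            fh.write(payload)
    equal_result = same_contents(a, b)   # same bytes
    differ_result = same_contents(a, c)  # same size, different bytes

print(equal_result, differ_result)
```

The shallow default is faster but can report a false match when two different files happen to share size and modification time, so shallow=False is probably the safer choice when correctness matters more than speed.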

+9


Before anything else, compare the file sizes with os.path.getsize(...). If the sizes differ, the files cannot be equal.

Example:

import os

def foo(f1, f2):
    # Files of different sizes cannot be equal
    if os.path.getsize(f1) != os.path.getsize(f2):
        return False # Or similar

    ... # Checksumming / byte-comparing / whatever
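
One possible way to fill in the elided step is a block-by-block byte comparison that stops at the first difference. This is only a sketch: the name files_equal, the block_size default, and the temporary file names are assumptions made for the example, and it assumes regular files on disk, where read() returns full blocks until end of file.

```python
import os
import tempfile

def files_equal(path_a, path_b, block_size=2**20):
    """Return True if both files hold exactly the same bytes."""
    # Cheap check first: files of different sizes cannot be equal.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(block_size)
            chunk_b = fb.read(block_size)
            if chunk_a != chunk_b:
                return False  # stop at the first differing block
            if not chunk_a:
                return True   # both files ended without a difference

# Demo on throwaway files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    c = os.path.join(tmp, "c.bin")
    d = os.path.join(tmp, "d.bin")
    payloads = ((a, b"x" * 5000), (b, b"x" * 5000),
                (c, b"x" * 4999 + b"y"), (d, b"x" * 10))
    for path, payload in payloads:
        with open(path, "wb") as fh:
            fh.write(payload)
    same = files_equal(a, b, block_size=1024)
    differ_in_last_byte = files_equal(a, c, block_size=1024)
    differ_in_size = files_equal(a, d)

print(same, differ_in_last_byte, differ_in_size)
```

Unlike a checksum, this never reads more than it has to: two files that differ early are rejected after the first block, and files of different sizes are rejected without being opened.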
0

I would compute a checksum with MD5 (for example) instead of a byte-by-byte comparison, plus a date check and, depending on your needs, a name check.

-2

How about wrapping the cmp command?

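import subprocess

# cmp exits with 0 if the files are identical, 1 if they differ,
# and a larger value on error; -s suppresses its normal output.
status = subprocess.call(["/usr/bin/cmp", "-s", "file1", "file2"])
if status == 0:
    print("files are same")
elif status == 1:
    print("files differ")
else:
    print("uh oh!")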
-2
