Md5 all files in the directory tree

I have a directory with a structure like:

    .
    ├── Test.txt
    ├── Test1
    │   ├── Test1.txt
    │   ├── Test1_copy.txt
    │   └── Test1a
    │       ├── Test1a.txt
    │       └── Test1a_copy.txt
    └── Test2
        ├── Test2.txt
        ├── Test2_copy.txt
        └── Test2a
            ├── Test2a.txt
            └── Test2a_copy.txt

I would like to create a bash script that computes the MD5 checksum of each file in this directory. I want to be able to enter the script name in the CLI, followed by the path to the directory that I want to use, and have it work. I am sure there are many ways to do this. I currently have:

    #!/bin/bash
    for file in "$1" ; do
        md5 >> "${1}__checksums.md5"
    done

It just freezes and doesn't work. Maybe I should use find?

One caveat - the directories I want to use will have files with different extensions and may not always have the same tree structure. I want something that will work in these different situations.

+12
bash find for-loop md5 directory-structure
5 answers

Using md5deep

 md5deep -r path/to/dir > sums.md5 

Using find and md5sum

 find relative/path/to/dir -type f -exec md5sum {} + > sums.md5 

Keep in mind that when you verify the MD5 sums with md5sum -c sums.md5, you need to run it from the same directory in which you created the sums.md5 file. This is because find prints paths relative to your current location, and those paths are what end up in sums.md5.

If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of the path). That way you can run the check on sums.md5 from any location. The downside is that sums.md5 now contains absolute paths, which makes it larger.
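
To illustrate the path issue, here is a minimal sketch that builds a throwaway tree with mktemp, records the sums with relative paths, and then runs the check from two different locations. The directory and file names are made up for the demonstration, and GNU md5sum is assumed.

```shell
#!/bin/bash
# Sketch: sums recorded with relative paths only verify from the
# directory where they were created. All names are illustrative.
workdir=$(mktemp -d)
elsewhere=$(mktemp -d)
mkdir -p "$workdir/data/sub"
echo "hello" > "$workdir/data/a.txt"
echo "world" > "$workdir/data/sub/b.txt"

cd "$workdir" || exit 1
find data -type f -exec md5sum {} + > sums.md5

# Same directory: the relative paths in sums.md5 resolve.
status_here=0
md5sum -c sums.md5 || status_here=$?

# Different directory: the same paths no longer resolve, so the check fails.
status_elsewhere=0
(cd "$elsewhere" && md5sum -c "$workdir/sums.md5" > /dev/null 2>&1) || status_elsewhere=$?

echo "same directory: $status_here, elsewhere: $status_elsewhere"
```

A zero status means every file matched; the non-zero status in the second case comes from md5sum being unable to open the files, not from changed content.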

Full featured using find and md5sum

You can put this function in your .bashrc file (located in the $HOME directory):

    function md5sums {
      if [ "$#" -lt 1 ]; then
        echo -e "At least one parameter is expected\n" \
                "Usage: md5sums [OPTIONS] dir"
      else
        local OUTPUT="checksums.md5"
        local CHECK=false
        local MD5SUM_OPTIONS=""

        # Parse the options; the last argument is the directory
        while [[ $# -gt 1 ]]; do
          local key="$1"
          case $key in
            -c|--check)
              CHECK=true
              ;;
            -o|--output)
              OUTPUT=$2
              shift
              ;;
            *)
              MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
              ;;
          esac
          shift
        done
        local DIR=$1

        if [ -d "$DIR" ]; then  # if $DIR directory exists
          cd "$DIR"  # change to $DIR directory
          if [ "$CHECK" = true ]; then  # if -c or --check option specified
            # check MD5 sums in $OUTPUT file
            # ($MD5SUM_OPTIONS is left unquoted on purpose so it splits into separate options)
            md5sum --check $MD5SUM_OPTIONS "$OUTPUT"
          else
            # Calculate MD5 sums for files in current directory and subdirectories,
            # excluding $OUTPUT file, and save result in $OUTPUT file
            find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
          fi
          cd - > /dev/null  # change to previous directory
        else
          cd "$DIR"  # if $DIR doesn't exist, change to it to generate localized error message
        fi
      fi
    }

After running source ~/.bashrc you can use md5sums as a regular command:

 md5sums path/to/dir 

This generates a checksums.md5 file in the path/to/dir directory containing the MD5 sums of all the files in that directory and its subdirectories. Use:

 md5sums -c path/to/dir 

to check the sums recorded in the path/to/dir/checksums.md5 file.

Note that path/to/dir can be relative or absolute; md5sums works either way. The resulting checksums.md5 file always contains paths relative to path/to/dir. You can use a file name other than the default checksums.md5 by specifying the -o or --output option. All parameters except -c, --check, -o and --output are passed to md5sum.

The first part of the md5sums function is responsible for parsing parameters. See this answer for more details. The second half contains explanatory comments.

+23

What about:

find /path/you/need -type f -exec md5sum {} \; > checksums.md5

Update #1: Improved the command based on @twalberg's recommendation to handle spaces in file names.

Update #2: Improved based on @jil's suggestion to remove the unnecessary xargs call and use -exec instead.

Update #3: A naive implementation of your script would look something like this:

    #!/bin/bash
    # Usage: checksumchecker.sh <path>
    find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5
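
If you would rather keep an explicit loop, as in the question, a minimal sketch might look like the following. The loop in the question most likely hangs because for file in "$1" iterates exactly once, over the directory name itself, and md5 is called without a file argument, so it waits for data on standard input. The function name checksum_tree is hypothetical, and md5sum is assumed to be available.

```shell
#!/bin/bash
# Sketch: hash every regular file under a directory, one file per iteration.
# checksum_tree is a hypothetical helper name, not part of any tool.
checksum_tree() {
  local dir=$1 file
  # -print0 together with read -d '' keeps names containing spaces
  # or newlines intact.
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' file; do
      md5sum "$file"
    done
}

# When called with a directory argument, write the sums next to it,
# using the same output name as in the question.
if [ -n "${1:-}" ]; then
  checksum_tree "$1" > "${1}__checksums.md5"
fi
```

A loop like this is slower than -exec md5sum {} + because it starts one md5sum process per file, but it leaves room for extra per-file logic.
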
+4
    #!/bin/bash
    shopt -s globstar
    md5sum "$1"/** > "${1}__checksums.md5"

Explanation: shopt -s globstar (manual) enables the ** recursive glob pattern. This means that "$1"/** expands to a list of everything found recursively under the directory given as parameter $1. The script then simply calls md5sum with this list as its arguments, and > "${1}__checksums.md5" redirects the output to the file. Note that the list also includes directories, for which md5sum prints an error before moving on to the next entry.
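
A sketch of a variant that passes only regular files to md5sum is shown below. It assumes bash 4 or newer, since older versions do not have globstar, and the function name sum_regular_files is made up for illustration.

```shell
#!/bin/bash
# Sketch: globstar variant that hashes regular files only.
# Requires bash 4+; sum_regular_files is an illustrative name.
shopt -s globstar nullglob

sum_regular_files() {
  local dir=$1 path
  for path in "$dir"/**; do
    # Skip directories and anything else that is not a regular file.
    if [[ -f $path ]]; then
      md5sum "$path"
    fi
  done
}

# When called with a directory argument, write the sums next to it.
if [ -n "${1:-}" ]; then
  sum_regular_files "$1" > "${1}__checksums.md5"
fi
```

Hidden files are still skipped unless the dotglob shell option is also set.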

+1

Updated Answer

If you like the answer below or any other, you can make a function that executes this command for you. So, to test it, enter the following into Terminal:

 function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; } 

Then you can simply use:

 sumthem /Users/somebody/somewhere 

If it works like this, you can add this line to the end of your "bash profile" and the function will be declared and available whenever you log in. Your "bash profile" is probably in $HOME/.profile

Original answer

Why don't you run all your processor cores in parallel?

 find . -type f -print0 | parallel -0 -X md5sum 

This finds all the files (-type f) in the current directory (.) and prints them, each terminated by a null byte. They are then passed to GNU Parallel, which is told that the file names end with a null byte (-0), that it should pass as many files as possible to each invocation (-X) to avoid creating a new process for every file, and that it should run md5sum on them.
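
If GNU Parallel is not installed, a similar effect can be sketched with xargs, whose -0 and -P options exist in both the GNU and BSD versions. This swaps xargs in for parallel; the function name parallel_sums is hypothetical, and the batch size of 16 files and the 4 worker processes are arbitrary values chosen for illustration.

```shell
#!/bin/bash
# Sketch: parallel checksums with xargs instead of GNU Parallel.
# parallel_sums is a hypothetical name; 16 files per batch and
# 4 concurrent md5sum processes are arbitrary choices.
parallel_sums() {
  local dir=$1
  find "$dir" -type f -print0 |
    xargs -0 -n 16 -P 4 md5sum |
    sort -k 2   # sort by path so the order is stable between runs
}
```

Unlike GNU Parallel, xargs does not group the output of each job, so with very long paths the lines written by concurrent md5sum processes could in principle interleave. Keeping the batches small reduces that risk.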

This approach pays off most, in terms of speed, with large files such as Photoshop images.

+1
    md5deep -r "$your_directory" | awk '{print $1}' | sort | md5sum | awk '{print $1}'
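
The pipeline above reduces the whole tree to a single fingerprint: it keeps only the hash column, sorts it so that traversal order does not matter, and hashes the result. Where md5deep is not installed, roughly the same idea can be sketched with find and md5sum. The function name tree_fingerprint is hypothetical.

```shell
#!/bin/bash
# Sketch: one MD5 fingerprint for the contents of a directory tree,
# using find + md5sum in place of md5deep. tree_fingerprint is an
# illustrative name.
tree_fingerprint() {
  local dir=$1
  find "$dir" -type f -exec md5sum {} + |
    awk '{print $1}' |   # keep only the hash column
    sort |               # make the result independent of traversal order
    md5sum |
    awk '{print $1}'     # drop the trailing "-" that md5sum prints for stdin
}
```

Because file names are discarded before the final hash, renaming or moving a file inside the tree does not change the fingerprint; only added, removed, or modified content does.
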
+1