How to find duplicate file names (recursively) in a given directory? Bash

I need to find all duplicate file names in a directory tree. I don't know which directory the user will pass as an argument to the script, so I don't know the hierarchy in advance. I tried this:

    #!/bin/sh
    find -type f | while IFS= read vo
    do
        echo `basename "$vo"`
    done

but that's not quite what I want. It finds only one duplicate and then stops, even if there are more duplicate file names; also, it prints only the file name, not the entire path, and it does not report how many times each name occurs. I wanted to do something like this command:

    find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

but it doesn't work for me; I don't know why. Even when there are duplicates, it prints nothing. I am using Xubuntu 12.04.

+4
6 answers

Here is another solution (based on @jim-mcnamara's suggestion) without awk:

Solution 1

    #!/bin/sh
    dirname=/path/to/directory
    find $dirname -type f | sed 's_.*/__' | sort | uniq -d |
    while read fileName
    do
        find $dirname -type f | grep "$fileName"
    done
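To see what the `sed 's_.*/__' | sort | uniq -d` stage produces, here is a tiny self-contained demo (the directory and file names are invented for illustration only):

```shell
# Build a throwaway tree with one duplicated name ("same.txt").
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/same.txt" "$tmp/b/same.txt" "$tmp/a/unique.txt"

# First stage of Solution 1: strip directories, sort, keep only
# names that occur more than once.
dupes=$(find "$tmp" -type f | sed 's_.*/__' | sort | uniq -d)
echo "$dupes"   # prints: same.txt

rm -rf "$tmp"
```

The second `find | grep` pass then turns each duplicated name back into its full paths.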

However, this runs the same search twice, which can become very slow when searching a lot of data. Saving the search results in a temporary file gives better performance.

Solution 2 (with temporary file)

    #!/bin/sh
    dirname=/path/to/directory
    tempfile=myTempfileName
    find $dirname -type f > $tempfile
    cat $tempfile | sed 's_.*/__' | sort | uniq -d |
    while read fileName
    do
        grep "$fileName" $tempfile
    done
    #rm -f $tempfile

Since in some cases you may not want to write a temporary file to your hard drive, you can choose whichever method suits your needs. Both examples print the full path of each file.

Bonus question here: is it possible to save all the output of the find command as a list in a variable?
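Regarding the bonus question: yes — command substitution captures find's output as one newline-separated string, and in bash (not plain sh) `mapfile` gives you a proper array with one entry per file. A small sketch, with illustrative names, that breaks if file names contain newlines:

```shell
tmp=$(mktemp -d)
touch "$tmp/one" "$tmp/two"

# Scalar: the whole listing as one newline-separated string.
lst=$(find "$tmp" -type f)

# Bash array: one element per output line.
mapfile -t files < <(find "$tmp" -type f)
echo "${#files[@]} files"   # prints: 2 files

rm -rf "$tmp"
```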

+10
    #!/bin/sh
    dirname=/path/to/check
    find $dirname -type f | while read vo
    do
        echo `basename "$vo"`
    done | awk '{arr[$0]++; next} END{for (i in arr){if(arr[i]>1){print i}}}'
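A small variation on the awk stage (my own tweak, not part of the answer above, with invented test names) also reports how often each duplicated name occurs:

```shell
# Illustrative tree: "dup" appears twice, "solo" once.
tmp=$(mktemp -d)
mkdir -p "$tmp/x" "$tmp/y"
touch "$tmp/x/dup" "$tmp/y/dup" "$tmp/x/solo"

# Count each basename; in END, print count and name for duplicates only.
result=$(find "$tmp" -type f | while read vo; do
    basename "$vo"
done | awk '{arr[$0]++} END{for (i in arr) if (arr[i] > 1) print arr[i], i}')
echo "$result"   # prints: 2 dup

rm -rf "$tmp"
```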
+8

Yes, this is a really old question. But all these loops and temporary files seem a bit cumbersome.

Here is my 1-line answer:

 find /PATH/TO/FILES -type f -printf '%p/ %f\n' | sort -k2 | uniq -f1 --all-repeated=separate 

It has its limitations due to uniq and sort:

  • no whitespace (space, tab) in file names (it would be interpreted as the start of a new field by uniq and sort)
  • the file name must be printed as the last field, delimited by a space (uniq cannot compare just one field and is inflexible about field separators)

But it is quite flexible regarding its output thanks to find -printf, and it works well for me. This also seems to be what @yak originally tried to achieve.

Demonstration of some parameters that you have:

 find /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend 

There are also options in sort and uniq to ignore case (which is what the question's author tried to achieve by piping through tr). See man uniq or man sort for details.
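If file names with spaces matter, one way around the uniq field limitation is to exploit the fact that '/' can never occur inside a file name: print '%f/%p' and split on the first '/' in awk. This is my own sketch (assuming GNU find; the test names are invented), and it still breaks on newlines in file names:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/has space.txt" "$tmp/b/has space.txt" "$tmp/a/other.txt"

# '%f/%p' puts the basename first; since '/' cannot appear in a file
# name, everything before the first '/' is the name, the rest the path.
result=$(find "$tmp" -type f -printf '%f/%p\n' | awk '{
    i = index($0, "/")
    name = substr($0, 1, i - 1)
    count[name]++
    paths[name] = paths[name] substr($0, i + 1) "\n"
} END {
    for (n in count) if (count[n] > 1) printf "%s", paths[n]
}')
echo "$result"

rm -rf "$tmp"
```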

+4
    #!/bin/bash
    file=`mktemp /tmp/duplicates.XXXXX` || { echo "Error creating tmp file"; exit 1; }
    find $1 -type f | sort > $file
    # sort the lowercased basenames so uniq -c groups repeats together
    awk -F/ '{print tolower($NF)}' $file | sort | uniq -c |
        awk '$1>1 { sub(/^[[:space:]]+[[:digit:]]+[[:space:]]+/,""); print }' |
        while read line; do grep -i "$line" $file; done
    rm $file

It also works with spaces in file names. Here's a simple test (the first argument is the directory):

    $ ./duplicates.sh ./test
    ./test/2/INC 255286
    ./test/INC 255286
+2

Only one find command:

    lst=$( find . -type f )
    echo "$lst" | rev | cut -f 1 -d/ | rev | sort -f | uniq -i |
    while read f
    do
        names=$( echo "$lst" | grep -i -- "/$f$" )
        n=$( echo "$names" | wc -l )
        [ $n -gt 1 ] && echo -e "Duplicates found ($n):\n$names"
    done
+1

This solution writes one temporary file to a temporary directory for each unique file name. In each temporary file I record the path where that file name was first seen, so it can be printed later. This means it creates many more files than the other posted solutions, but that was acceptable for my purposes.

Below is a script called fndupe .

    #!/bin/bash

    # Create a temp directory to contain placeholder files.
    tmp_dir=`mktemp -d`

    # Get paths of files to test from standard input.
    while read p; do
        fname=$(basename "$p")
        tmp_path=$tmp_dir/$fname
        if [[ -e $tmp_path ]]; then
            q=`cat "$tmp_path"`
            echo "duplicate: $p"
            echo "    first: $q"
        else
            echo $p > "$tmp_path"
        fi
    done

    exit

The following is an example of using a script.

 $ find . -name '*.tif' | fndupe 

The following is sample output when the script finds duplicate file names.

    duplicate: a/b/extra/gobble.tif
        first: a/b/gobble.tif

Tested with Bash version: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
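Since bash 4 is required anyway, the same bookkeeping can be done in memory with an associative array instead of placeholder files. This is my own adaptation of the idea, not the author's script, using an invented test tree:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/extra"
touch "$tmp/gobble.tif" "$tmp/extra/gobble.tif" "$tmp/solo.tif"

declare -A first   # maps each file name to the path where it was first seen
out=""
while read -r p; do
    fname=$(basename "$p")
    if [[ -n ${first[$fname]} ]]; then
        out+="duplicate: $p"$'\n'
        out+="    first: ${first[$fname]}"$'\n'
    else
        first[$fname]=$p
    fi
done < <(find "$tmp" -type f)
printf '%s' "$out"

rm -rf "$tmp"
```

Which of the two gobble.tif paths counts as "first" depends on find's traversal order, exactly as in the temp-file version.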

0
