How to find duplicate file names (recursively) in a given directory? Bash

I need to find all duplicate file names in a directory tree. I don't know which directory the user will pass as an argument to the script, so I don't know the hierarchy in advance. I tried this:

    #!/bin/sh
    find -type f | while IFS= read vo
    do
        echo `basename "$vo"`
    done

but that's not quite what I want. It finds only one duplicate and then stops, even if there are more duplicate file names; also, it prints only the file name, not the entire path, and it does not report how many times each name occurs. I wanted to do something like this command:

    find DIRNAME | tr '[A-Z]' '[a-z]' | sort | uniq -c | grep -v " 1 "

but it doesn't work for me; I don't know why. Even when there are duplicates, it prints nothing. I am using Xubuntu 12.04.

+4
6 answers

Here is another solution (based on @jim-mcnamara's suggestion) without awk:

Solution 1

    #!/bin/sh
    dirname=/path/to/directory
    find $dirname -type f | sed 's_.*/__' | sort | uniq -d |
    while read fileName
    do
        find $dirname -type f | grep "$fileName"
    done
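To see what the `sed 's_.*/__' | sort | uniq -d` stage produces, here is a tiny self-contained demo (the directory and file names are invented for illustration only):

```shell
# Build a throwaway tree with one duplicated name ("same.txt").
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/same.txt" "$tmp/b/same.txt" "$tmp/a/unique.txt"

# First stage of Solution 1: strip directories, sort, keep only
# names that occur more than once.
dupes=$(find "$tmp" -type f | sed 's_.*/__' | sort | uniq -d)
echo "$dupes"   # prints: same.txt

rm -rf "$tmp"
```

The second `find | grep` pass then turns each duplicated name back into its full paths.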

However, this runs the same search twice, which can become very slow when searching a lot of data. Saving the search results in a temporary file gives better performance.

Solution 2 (with temporary file)

    #!/bin/sh
    dirname=/path/to/directory
    tempfile=myTempfileName
    find $dirname -type f > $tempfile
    cat $tempfile | sed 's_.*/__' | sort | uniq -d |
    while read fileName
    do
        grep "$fileName" $tempfile
    done
    #rm -f $tempfile

Since in some cases you may not want to write a temporary file to your hard drive, you can choose whichever method suits your needs. Both examples print the full path of each file.

Bonus question here: is it possible to save all the output of the find command as a list in a variable?
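Regarding the bonus question: yes — command substitution captures find's output as one newline-separated string, and in bash (not plain sh) `mapfile` gives you a proper array with one entry per file. A small sketch, with illustrative names, that breaks if file names contain newlines:

```shell
tmp=$(mktemp -d)
touch "$tmp/one" "$tmp/two"

# Scalar: the whole listing as one newline-separated string.
lst=$(find "$tmp" -type f)

# Bash array: one element per output line.
mapfile -t files < <(find "$tmp" -type f)
echo "${#files[@]} files"   # prints: 2 files

rm -rf "$tmp"
```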

+10
    #!/bin/sh
    dirname=/path/to/check
    find $dirname -type f | while read vo
    do
        echo `basename "$vo"`
    done | awk '{arr[$0]++; next} END{for (i in arr){if(arr[i]>1){print i}}}'
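A small variation on the awk stage (my own tweak, not part of the answer above, with invented test names) also reports how often each duplicated name occurs:

```shell
# Illustrative tree: "dup" appears twice, "solo" once.
tmp=$(mktemp -d)
mkdir -p "$tmp/x" "$tmp/y"
touch "$tmp/x/dup" "$tmp/y/dup" "$tmp/x/solo"

# Count each basename; in END, print count and name for duplicates only.
result=$(find "$tmp" -type f | while read vo; do
    basename "$vo"
done | awk '{arr[$0]++} END{for (i in arr) if (arr[i] > 1) print arr[i], i}')
echo "$result"   # prints: 2 dup

rm -rf "$tmp"
```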
+8

Yes, this is a really old question. But all these loops and temporary files seem a bit cumbersome.

Here is my 1-line answer:

 find /PATH/TO/FILES -type f -printf '%p/ %f\n' | sort -k2 | uniq -f1 --all-repeated=separate 

It has its limitations due to uniq and sort:

  • no whitespace (space, tab) in file names (it would be interpreted as the start of a new field by uniq and sort)
  • the file name must be printed as the last field, delimited by a space (uniq cannot compare just one field and is inflexible about field separators)

But it is quite flexible regarding its output thanks to find -printf, and it works well for me. This also seems to be what @yak originally tried to achieve.

Demonstration of some parameters that you have:

 find /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend 

There are also options in sort and uniq to ignore case (which is what the question's author tried to achieve by piping through tr). See man uniq or man sort for details.
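If file names with spaces matter, one way around the uniq field limitation is to exploit the fact that '/' can never occur inside a file name: print '%f/%p' and split on the first '/' in awk. This is my own sketch (assuming GNU find; the test names are invented), and it still breaks on newlines in file names:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/has space.txt" "$tmp/b/has space.txt" "$tmp/a/other.txt"

# '%f/%p' puts the basename first; since '/' cannot appear in a file
# name, everything before the first '/' is the name, the rest the path.
result=$(find "$tmp" -type f -printf '%f/%p\n' | awk '{
    i = index($0, "/")
    name = substr($0, 1, i - 1)
    count[name]++
    paths[name] = paths[name] substr($0, i + 1) "\n"
} END {
    for (n in count) if (count[n] > 1) printf "%s", paths[n]
}')
echo "$result"

rm -rf "$tmp"
```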

+4
    #!/bin/bash
    file=`mktemp /tmp/duplicates.XXXXX` || { echo "Error creating tmp file"; exit 1; }
    find $1 -type f | sort > $file
    # sort the lowercased basenames so uniq -c groups repeats together
    awk -F/ '{print tolower($NF)}' $file | sort | uniq -c |
        awk '$1>1 { sub(/^[[:space:]]+[[:digit:]]+[[:space:]]+/,""); print }' |
        while read line; do grep -i "$line" $file; done
    rm $file

It also works with spaces in file names. Here's a simple test (the first argument is the directory):

    $ ./duplicates.sh ./test
    ./test/2/INC 255286
    ./test/INC 255286
+2

Only one find command:

    lst=$( find . -type f )
    echo "$lst" | rev | cut -f 1 -d/ | rev | sort -f | uniq -i |
    while read f
    do
        names=$( echo "$lst" | grep -i -- "/$f$" )
        n=$( echo "$names" | wc -l )
        [ $n -gt 1 ] && echo -e "Duplicates found ($n):\n$names"
    done
+1

This solution writes one temporary file to a temporary directory for each unique file name. In each temporary file I record the path where that file name was first seen, so it can be printed later. This means it creates many more files than the other posted solutions, but that was acceptable for my purposes.

Below is a script called fndupe .

    #!/bin/bash

    # Create a temp directory to contain placeholder files.
    tmp_dir=`mktemp -d`

    # Get paths of files to test from standard input.
    while read p; do
        fname=$(basename "$p")
        tmp_path=$tmp_dir/$fname
        if [[ -e $tmp_path ]]; then
            q=`cat "$tmp_path"`
            echo "duplicate: $p"
            echo "    first: $q"
        else
            echo $p > "$tmp_path"
        fi
    done

    exit

The following is an example of using a script.

 $ find . -name '*.tif' | fndupe 

The following is sample output when the script finds duplicate file names.

    duplicate: a/b/extra/gobble.tif
        first: a/b/gobble.tif

Tested with Bash version: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
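Since bash 4 is required anyway, the same bookkeeping can be done in memory with an associative array instead of placeholder files. This is my own adaptation of the idea, not the author's script, using an invented test tree:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/extra"
touch "$tmp/gobble.tif" "$tmp/extra/gobble.tif" "$tmp/solo.tif"

declare -A first   # maps each file name to the path where it was first seen
out=""
while read -r p; do
    fname=$(basename "$p")
    if [[ -n ${first[$fname]} ]]; then
        out+="duplicate: $p"$'\n'
        out+="    first: ${first[$fname]}"$'\n'
    else
        first[$fname]=$p
    fi
done < <(find "$tmp" -type f)
printf '%s' "$out"

rm -rf "$tmp"
```

Which of the two gobble.tif paths counts as "first" depends on find's traversal order, exactly as in the temp-file version.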

0
