Bash scripting de-dupe

Question

I have a shell script that a cron job runs once a day. At the moment it simply downloads a file from the web using wget, appends a timestamp to the file name, and then compresses it. Basic stuff.

This file does not change very often, so I want to discard the downloaded file if it is identical to a copy I already have.

What is the easiest way to do this?

Thanks!

+5

bash shell deduplication

aidan Jun 12 '11 at 14:10

4 answers

. , , md5sum. MD5, , .

, , , . - / ( expires) . , Web 2.0.

+1

Diego Sevilla Jun 12 '11 at 14:23

" " ?

, myfile myfile-[date], . , lastfile, myfile-[date]. , script, , lastfile .

, , , .

0

Ryan Leonard Jun 12 '11 at 14:20

You can compare the new file with the previous one using the sum command. It computes a checksum of the file. If both files have the same checksum, they are very, very likely to be the same. There is another command called md5 that computes an MD5 fingerprint, but the sum command is available on all systems.
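A minimal sketch of that comparison, for illustration only. It swaps in cksum (the POSIX-standardized checksum utility) for sum, and the function name same_checksum and the file names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when two files have the same checksum,
# 1 when they differ, 2 when either file cannot be read.
same_checksum() {
    local a b
    # Reading from stdin keeps the file name out of cksum's output,
    # so only the CRC and the byte count are compared.
    a="$(cksum < "$1")" || return 2
    b="$(cksum < "$2")" || return 2
    [[ "$a" == "$b" ]]
}

# Usage (assumed file names): drop the fresh download if nothing changed.
#   same_checksum file.txt.new file.txt && rm file.txt.new
```

A matching checksum only makes identical content very likely; cmp file1 file2 gives an exact byte-for-byte answer if that matters.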

0

David W. Jun 12 '11 at 14:26

c00kiemon5ter · Accepted Answer · 2011-06-12T14:39:36+0000

Do you really need to compress the file?
wget provides -N, --timestamping, which, as the name suggests, turns on time-stamping. What this means is the following. Say your file is located at www.example.com/file.txt

The first time you run:

$ wget -N www.example.com/file.txt
[...]
[...] file.txt saved [..size..]

The next time, it will look like this:

$ wget -N www.example.com/file.txt
Server file no newer than local file "file.txt" -- not retrieving.
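As a rough sketch of how -N could fit the daily cron job from the question: wget -N leaves file.txt with the server's modification time, so a dated, compressed copy only needs to be made when that time moves past a marker from the previous run. The function name archive_if_updated, the .last-archived marker file, and the choice of gzip are assumptions made for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a dated, gzip-compressed copy of "$1", but only
# when "$1" is newer than the marker left by the previous archive run.
# Exit 0 if a copy was made, 1 if there was nothing new, 2 on error.
archive_if_updated() {
    local file="$1" marker="$1.last-archived"
    if [[ ! -e "$marker" || "$file" -nt "$marker" ]]; then
        gzip -c "$file" > "$file-$(date +%F).gz" || return 2
        touch -r "$file" "$marker"   # remember the mtime that was archived
        return 0
    fi
    return 1
}

# Daily cron usage (URL as in the example above):
#   wget -q -N www.example.com/file.txt && archive_if_updated file.txt
```

This relies on the server sending a usable Last-Modified header; if it does not, comparing checksums is the safer route.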

, .

, .
, , , / . , ? , ? ? txt ? ?

, .

For example, using sha256 for the checksum and xz (lzma2) for compression.
Something like this (in Bash):

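# Save the download to file.txt while hashing the stream, then hash the old archive's contents.
newfilesum="$(wget -q www.example.com/file.txt -O- | tee file.txt | sha256sum)"
oldfilesum="$(xzcat file.txt.xz 2>/dev/null | sha256sum)" # no archive yet: hash of empty input
# -s guards against a failed (empty) download replacing a good archive.
if [[ -s file.txt && "$newfilesum" != "$oldfilesum" ]]; then
    xz -f file.txt # overwrite with the new compressed data
else
    rm -f file.txt # unchanged or empty download: discard it
fi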
