Bash scripting de-dupe

I have a shell script that cron runs once a day. At the moment it simply downloads a file from the web using wget, appends a timestamp to the file name, and then compresses it. Basic stuff.

This file does not change very often, so I want to discard the downloaded file if it is identical to the copy I already have.

What is the easiest way to do this?

Thanks!

+5
4 answers

Do you really need to compress the file?
wget provides -N, --timestamping, which turns on time-stamping. With it, wget skips the download unless the server copy is newer than the local file (or the sizes differ). Say your file is located at www.example.com/file.txt.

The first time you run it:

$ wget -N www.example.com/file.txt
[...]
[...] file.txt saved [..size..]

The next time, if the file has not changed, you get:

$ wget -N www.example.com/file.txt
Server file no newer than local file "file.txt" -- not retrieving.
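
If you still want the timestamped, compressed copies from the question, one way to build on -N is to archive only when the local file's modification time has moved forward. The following is a minimal sketch, not a tested production script: archive_if_newer, the stamp file, and the archive directory are hypothetical names, and gzip stands in for whatever compressor you actually use.

```shell
# archive_if_newer FILE STAMP OUTDIR  (hypothetical helper, not part of wget)
# Writes a dated, gzip-compressed copy of FILE into OUTDIR, but only when
# FILE is newer than STAMP or STAMP does not exist yet.
# Note: the variables below are global; rename them if they clash with yours.
archive_if_newer() {
    file=$1
    stamp=$2
    outdir=$3
    if [ ! -e "$stamp" ] || [ "$file" -nt "$stamp" ]; then
        mkdir -p "$outdir"
        out="$outdir/$(basename "$file")-$(date +%Y%m%d).gz"
        gzip -c "$file" > "$out"
        # Give the stamp the same mtime as FILE so the next run compares equal
        touch -r "$file" "$stamp"
        echo "archived"
    else
        echo "unchanged"
    fi
}

# Example daily run from cron (URL is the placeholder used above):
# wget -q -N www.example.com/file.txt && archive_if_newer file.txt .file.txt.stamp archive
```

This relies on wget -N setting the local file's modification time from the server, so it only works when the server reports one. If it does not, comparing checksums is the safer route.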


If you do need compression, you could compare checksums, for example with sha256 and xz (lzma2).
Something like this (in Bash):

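# Download the file, keep a local copy, and hash it in one pass
newfilesum="$(wget -q www.example.com/file.txt -O- | tee file.txt | sha256sum)"
# Hash the previously archived copy (empty input on the very first run)
oldfilesum="$(xzcat file.txt.xz 2>/dev/null | sha256sum)"
if [[ "$newfilesum" != "$oldfilesum" ]]; then
    xz -f file.txt # overwrite with the new compressed data
else
    rm file.txt    # identical content, discard the download
fi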

+5

. , , md5sum. MD5, , .

, , , . - / ( expires) . , Web 2.0.

+1

" " ?

, myfile myfile-[date], . , lastfile, myfile-[date]. , script, , lastfile .

0
You can compare the new file with the previous one using the sum command, which computes a checksum of the file. If both files have the same checksum, they are very, very likely to be the same. There is also a command called md5 (md5sum on most Linux systems) that computes an MD5 digest, but sum is available on virtually all Unix-like systems.
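
As a rough illustration of the comparison, here is a small sketch. It uses cksum (the POSIX-specified relative of sum, swapped in here because it also prints the byte count), and same_checksum is a hypothetical helper name, not a standard command.

```shell
# same_checksum FILE_A FILE_B  (hypothetical helper)
# Prints "same" when both files produce identical cksum output
# (CRC plus byte count), otherwise "different".
same_checksum() {
    # Reading from stdin keeps the file name out of cksum's output
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    if [ "$sum_a" = "$sum_b" ]; then
        echo "same"
    else
        echo "different"
    fi
}

# Example: drop the fresh download when it matches the previous copy
# [ "$(same_checksum file.txt.new file.txt)" = "same" ] && rm file.txt.new
```

If you only need a yes/no answer for two local files, cmp -s compares them byte for byte and avoids any chance of a checksum collision; a checksum is mainly useful when you want to store a small value instead of keeping the old file around.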

0