Rsyncing an archive that changes every time

I am working on an open source backup utility that backs up files and transfers them to various external locations such as Amazon S3, Rackspace Cloud Files, Dropbox, and remote servers via FTP / SFTP / SCP.

Now I've received a feature request to create incremental backups (because the backups can be large and become expensive to transfer and store). I looked around and someone mentioned the rsync utility. I ran several tests with it, but I'm not sure whether it's suitable, so I'd like to hear from anyone who has experience with rsync.

Let me briefly describe what happens during a backup. Basically, the utility starts by dumping databases such as MySQL, PostgreSQL, MongoDB and Redis. It may also grab some regular files (such as images) from the file system. Once everything is in place, it bundles it all into a single .tar (and optionally compresses and encrypts it using gzip and openssl).

Once all this is done, we have one file that looks like this:
mybackup.tar.gz.enc
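
For illustration, the archive step described above might look roughly like the following sketch. The database names, paths and key file are assumptions made for the example, not the utility's actual commands.

# Illustrative sketch only: dump the databases, then bundle, compress and encrypt.
mysqldump --all-databases > dump/mysql.sql
pg_dumpall > dump/postgres.sql

# Bundle the dumps plus some regular files, pipe through gzip, then encrypt with openssl.
tar -cf - dump/ /var/www/uploads \
  | gzip \
  | openssl enc -aes-256-cbc -salt -pass file:backup.key \
  > mybackup.tar.gz.enc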

Now I want to transfer this file to a remote location. The goal is to reduce bandwidth and storage costs. So suppose this backup package is 1GB in size. We use rsync to transfer it to the remote location and then delete the local backup file. Tomorrow a new backup file is created; it turns out that quite a lot of data has been added in the last 24 hours, so the new mybackup.tar.gz.enc ends up around 1.2GB in size.

Now, my question is: is it possible to transfer only the 200MB that was added in the last 24 hours? I tried the following command:

rsync -vhP --append mybackup.tar.gz.enc backups/mybackup.tar.gz.enc

Result:

mybackup.tar.gz.enc
      1.20G 100%   36.69MB/s    0:00:46 (xfer#1, to-check=0/1)

sent 200.01M bytes  received 849.40K bytes  8.14MB/s
total size is 1.20G  speedup is 2.01

Looking at sent 200.01M bytes, I would say the append worked correctly. What I wonder now is whether it transferred the entire 1.2GB just to figure out how much to append to the existing backup, or whether it really transferred only 200MB. Because if it transferred the whole 1.2GB, I don't see how this differs from using scp for single large files.

Also, if what I'm trying to accomplish is possible, which flags do you recommend? If it isn't possible with rsync, is there another utility you can recommend instead?

Any feedback is greatly appreciated!

3 answers

It sent only what it said it sent: transferring only the changed parts is one of rsync's main features. It uses some pretty smart checksum algorithms (it does send those checksums over the network, but that overhead is negligible, several orders of magnitude less data than transferring the file itself; in your case I would assume that's the .01 in 200.01M) and only transfers the parts it actually needs.
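
If you want to confirm this yourself, rsync can report how the transfer broke down between data actually sent and data reused on the receiver. A small sketch, with a placeholder remote path:

# --stats prints "Literal data" (bytes actually sent) and "Matched data"
# (bytes reused from the copy already present on the receiver).
rsync -vhP --stats mybackup.tar.gz.enc user@remote:backups/mybackup.tar.gz.enc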

Note that quite powerful rsync-based backup tools already exist, namely Duplicity. Depending on the license of your code, it might be worth looking at how they do it.
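
As a rough, hypothetical illustration (the source path and target URL are placeholders), Duplicity does incremental, encrypted backups natively and can push to SFTP, S3 and similar backends:

# The first run makes a full backup; later runs upload only encrypted increments.
duplicity /path/to/data sftp://user@remote//backups/myapp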


The nature of gzip is such that small changes in the source file can lead to very large changes in the resulting compressed file, because gzip makes its own decisions each time about the best way to compress the data you give it.

Some versions of gzip have a --rsyncable switch, which resynchronizes gzip's output at regular block boundaries. This results in slightly less efficient compression (in most cases), but it limits changes in the output file to the same region of the file as the changes in the source, so rsync's delta transfer can still be effective on the compressed file.
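
For example, if your gzip build supports the switch (the Debian/Ubuntu-patched gzip and pigz do), the compression step could hypothetically look like this:

# --rsyncable makes the compressed output change only locally when the input
# changes locally, at a small cost in compression ratio.
tar -cf - dump/ | gzip --rsyncable > mybackup.tar.gz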

If that is not available to you, then it's usually best to rsync the uncompressed file (using rsync's own compression, -z, if bandwidth is a consideration) and compress at the far end (if disk space is a consideration). Obviously, it depends on the specifics of your use case.
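
A minimal sketch of that approach, with placeholder host and paths:

# Send the uncompressed tar; -z compresses the data on the wire only.
rsync -vhPz mybackup.tar user@remote:backups/
# Optionally compress on the remote side afterwards if storage is the concern.
ssh user@remote 'gzip backups/mybackup.tar'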


The newer rsync --append WILL BREAK the contents of your file if there is any change in your existing data (starting with version 3.0.0).
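
If appending is still what you want, a safer option on rsync 3.0.0 and later is --append-verify, shown here as a sketch with a placeholder destination:

# Unlike plain --append, the existing data on the receiver is checksummed first,
# and the whole file is resent if it no longer matches, instead of silently
# producing a corrupt destination file.
rsync -vhP --append-verify mybackup.tar.gz.enc user@remote:backups/mybackup.tar.gz.enc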

