DynamoDB - How to do incremental backups?

I use DynamoDB tables whose keys and throughput are optimized for my application's access patterns. To support other ad hoc administrative and reporting use cases, I want to keep a full backup in S3 (a daily backup is fine). However, I cannot afford to scan the entire table for every backup, and my keys give me no way to tell which items are "new". How can I do incremental backups? Do I need to modify my DynamoDB schema or add extra tables just for this? Any best practices?

Update: DynamoDB Streams solves this problem.

DynamoDB Streams captures a time-ordered sequence of item-level changes in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near real time.
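
For example, here is a minimal sketch (assuming boto3 and a hypothetical stream ARN) of reading those change records directly from a table's stream:

```python
import boto3

streams = boto3.client("dynamodbstreams")

# Hypothetical ARN; look it up via the table's LatestStreamArn attribute
STREAM_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/..."

desc = streams.describe_stream(StreamArn=STREAM_ARN)["StreamDescription"]
for shard in desc["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=STREAM_ARN,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        print(record["eventName"], record["dynamodb"]["Keys"])
```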

+6
5 answers

I see two options:

  • Create a full snapshot now. You will need to read the whole table to do this (a Scan), which you can do very slowly to stay under your throughput limits. Then keep a list of updates in memory over some period of time. (You could write them to another table instead, but you would then have to read them back, which would probably cost about the same.) The interval can be a minute, ten minutes, an hour: whatever amount of data you can tolerate losing if your application dies. Then periodically fetch the snapshot from S3, replay the buffered changes against it, and upload the new snapshot. I don't know how big your dataset is, so this may be impractical, but I have seen it work very well for datasets up to 1-2 GB.

  • Add throughput and back up the data with a full scan every day (see the sketch after this list). You say you cannot afford it, but it is not clear whether you mean paying for the extra capacity or that the scan would consume your table's full capacity and the application would start failing. The only way to get data out of DynamoDB is to read it, with either strongly or eventually consistent reads. If backups are part of your business requirements, you need to decide whether they are worth the cost. You can throttle your reads by examining the ConsumedCapacityUnits property of your results, and the Scan operation has a Limit parameter that bounds how much each request reads. Scan can also use eventually consistent reads, which cost half as much as strongly consistent reads.
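
As a rough illustration of the second option, here is a sketch of a throttled full scan, assuming boto3; the table name, page size, and capacity budget below are placeholders to adjust for your workload:

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

def slow_full_scan(table_name, page_size=100, rcu_budget_per_sec=25):
    """Scan an entire table in small pages, pausing between requests so the
    consumed read capacity stays under a fixed budget."""
    kwargs = {
        "TableName": table_name,
        "Limit": page_size,                 # cap items read per request
        "ConsistentRead": False,            # eventually consistent reads cost half
        "ReturnConsumedCapacity": "TOTAL",  # report RCUs so we can throttle
    }
    while True:
        resp = dynamodb.scan(**kwargs)
        for item in resp["Items"]:
            yield item
        # Sleep long enough to keep the average read rate under the budget
        time.sleep(resp["ConsumedCapacity"]["CapacityUnits"] / rcu_budget_per_sec)
        if "LastEvaluatedKey" not in resp:
            break
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```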

+5

You can now use DynamoDB Streams to copy the data to another table, or to keep another copy of the data in a different data store.

https://aws.amazon.com/blogs/aws/dynamodb-streams-preview/

+4

For incremental backups, you can connect your DynamoDB stream to a Lambda function that runs automatically on every data update (e.g., copying the change to other storage such as S3).

Here is a Lambda function you can use together with DynamoDB Streams for incremental backups:

https://github.com/PageUpPeopleOrg/dynamodb-replicator
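
If you would rather roll your own, here is a minimal sketch of such a handler, assuming the stream is configured with the NEW_IMAGE view type and using a hypothetical bucket name:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-dynamodb-backups"  # hypothetical bucket name

def handler(event, context):
    """Write each change delivered by a DynamoDB stream to S3."""
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        # Derive a stable object prefix from the item's primary key values
        item_id = "-".join(
            str(v) for attr in sorted(keys) for v in keys[attr].values()
        )
        if record["eventName"] == "REMOVE":
            body = {"deleted": True, "keys": keys}
        else:
            body = record["dynamodb"]["NewImage"]  # present with NEW_IMAGE view type
        s3.put_object(
            Bucket=BUCKET,
            Key="backup/{}/{}.json".format(
                item_id, record["dynamodb"]["SequenceNumber"]
            ),
            Body=json.dumps(body),
        )
```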

I have written in detail on my blog about how you can use DynamoDB Streams, Lambda, and S3 to create incremental backups of your data in DynamoDB:

https://www.abhayachauhan.com/category/aws/dynamodb/dynamodb-backups

Alternatively, DynamoDB has just released on-demand backup and restore. These backups are not incremental; they are full snapshots of the table.

For more information, see https://www.abhayachauhan.com/2017/12/dynamodb-scheduling-on-demand-backups/.

HTH

+3

On November 29, 2017, on-demand backup was introduced. It allows you to back up a DynamoDB table almost instantly, without consuming any of its capacity. Here are some snippets from the announcement post:

This feature is designed to help you meet regulatory requirements for long-term archiving and data retention. You can create a backup with a click (or an API call) without consuming your provisioned throughput or affecting the responsiveness of your application. Backups are stored durably and can be used to create fresh tables.

...

The backup functionality is available right now! Backups are encrypted with an Amazon-managed key and include all of the table data, provisioned capacity settings, local and global secondary index settings, and streams. They do not include auto scaling settings, TTL settings, tags, IAM policies, CloudWatch metrics, or CloudWatch alarms.

You might be wondering how this operation can be instantaneous, given that some customers have tables approaching half a petabyte. Behind the scenes, DynamoDB takes full snapshots and saves all change logs, so creating a backup is as simple as saving a timestamp along with the table's current metadata.
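
For example, with boto3 a backup is a single API call (the table and backup names below are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# The call returns almost immediately because DynamoDB only records a
# timestamp plus the table's current metadata.
resp = dynamodb.create_backup(
    TableName="my-table",
    BackupName="my-table-2017-11-29",
)
print(resp["BackupDetails"]["BackupArn"], resp["BackupDetails"]["BackupStatus"])
```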

+1

A Scan operation in DynamoDB returns rows in hash key order. So if the table's hash key is an auto-incrementing integer, you can pass the hash key of the last record saved by the previous backup as the ExclusiveStartKey parameter of the next backup's Scan request, and the scan will return only records created since that backup.
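
A minimal sketch of that idea, assuming boto3 and a hypothetical numeric hash key named id whose last backed-up value was saved by the previous run:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# "id" is a hypothetical auto-incrementing numeric hash key; its last
# backed-up value is loaded from the previous run's metadata.
kwargs = {
    "TableName": "my-table",
    "ExclusiveStartKey": {"id": {"N": "12345"}},
}
while True:
    resp = dynamodb.scan(**kwargs)
    for item in resp["Items"]:
        pass  # write the item to backup storage
    if "LastEvaluatedKey" not in resp:
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```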

0
