How to use aggregate functions in Amazon DynamoDB

Question

I am new to DynamoDB. I have a table with over 100,000 items in it, and it is updated frequently. I want to do something similar to what aggregate functions offer in the relational database world: how can I get the maximum value from the table?

node.js amazon-dynamodb

pranay Apr 26 '16 at 13:48

3 answers

Jaredhatfield · Answer 1 · 2016-04-27T00:09:50+0000

DynamoDB is a NoSQL database and is therefore quite limited in how you can query data. You cannot perform aggregations, such as finding the maximum value in a table, by calling the DynamoDB API directly. You will have to turn to other tools and approaches to solve this problem.

There are several possible solutions:

Run a table scan

With over 100,000 items in the table, this is probably a very bad idea. A table scan reads every single item, so each run consumes read capacity for the entire table, and your application logic then has to determine the maximum value. This is really not an acceptable solution.
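For reference, the scan approach can be sketched as below. This is a minimal illustration, not a recommendation: `scanPage` is a hypothetical callback that performs one Scan request, and the table and attribute names in the wiring comment are made up. The loop follows `LastEvaluatedKey` because a single Scan call returns at most one page of results.

```javascript
// Sketch only: `scanPage`, the table name and the attribute name are
// illustrative assumptions, not part of the original answer.

// Fold one page of items into the running maximum of a numeric attribute.
function foldMax(currentMax, items, attribute) {
  let max = currentMax;
  for (const item of items) {
    const value = item[attribute];
    if (typeof value === "number" && (max === undefined || value > max)) {
      max = value;
    }
  }
  return max;
}

// Walk every page of a Scan. `scanPage(startKey)` must resolve to an object
// shaped like a Scan response: { Items: [...], LastEvaluatedKey?: {...} }.
async function scanMax(scanPage, attribute) {
  let max; // remains undefined if no item has a numeric `attribute`
  let startKey;
  do {
    const page = await scanPage(startKey);
    max = foldMax(max, page.Items || [], attribute);
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined);
  return max;
}

// Possible wiring with the AWS SDK for JavaScript v3 document client
// (assumed setup; "Scores" and "score" are hypothetical names):
//   const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
//   const { DynamoDBDocumentClient, ScanCommand } = require("@aws-sdk/lib-dynamodb");
//   const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));
//   const scanPage = (startKey) => doc.send(new ScanCommand({
//     TableName: "Scores",
//     ProjectionExpression: "#a",
//     ExpressionAttributeNames: { "#a": "score" },
//     ExclusiveStartKey: startKey,
//   }));
//   scanMax(scanPage, "score").then(console.log);
```

Even with a projection that returns only one attribute, the scan still reads every item, which is why the cost grows with the table size.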

Materialized Index in DynamoDB

Depending on your use case, you can use DynamoDB Streams and a Lambda function to maintain the index in a separate DynamoDB table. If your table is write-only, with no updates or deletes, you can store the maximum in a separate table and, as new records are added, compare each one against it and update it when necessary.

This approach works under some limited conditions, but it is not a general solution.
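A minimal sketch of the stream-driven approach, under the write-only assumption stated above. The function names, the `Stats` table, its `statName` key and its `maxValue` attribute are all hypothetical. It also assumes the stream is configured to include new images, so each INSERT record carries `dynamodb.NewImage` in DynamoDB's attribute-value format.

```javascript
// Sketch only: table, key and attribute names are illustrative assumptions.

// Return the largest numeric value of `attribute` among the INSERT records of
// a DynamoDB Streams event, or undefined if there is none. MODIFY and REMOVE
// records are ignored because the approach assumes a write-only table.
function maxFromStreamEvent(event, attribute) {
  let max;
  for (const record of event.Records || []) {
    if (record.eventName !== "INSERT") continue;
    const image = record.dynamodb && record.dynamodb.NewImage;
    const attr = image && image[attribute];
    if (!attr || attr.N === undefined) continue;
    const value = Number(attr.N); // numbers arrive as strings, e.g. { N: "42" }
    if (max === undefined || value > max) max = value;
  }
  return max;
}

// Build UpdateItem input that overwrites the stored maximum only when the
// candidate is larger, so concurrent or out-of-order batches cannot lower it.
function buildMaxUpdate(tableName, statName, candidate) {
  return {
    TableName: tableName,
    Key: { statName: { S: statName } },
    UpdateExpression: "SET #v = :c",
    ConditionExpression: "attribute_not_exists(#v) OR #v < :c",
    ExpressionAttributeNames: { "#v": "maxValue" },
    ExpressionAttributeValues: { ":c": { N: String(candidate) } },
  };
}

// Possible Lambda handler wiring (assumed, using AWS SDK for JavaScript v3):
//   const { DynamoDBClient, UpdateItemCommand } = require("@aws-sdk/client-dynamodb");
//   const client = new DynamoDBClient({});
//   exports.handler = async (event) => {
//     const candidate = maxFromStreamEvent(event, "score");
//     if (candidate === undefined) return;
//     try {
//       await client.send(new UpdateItemCommand(buildMaxUpdate("Stats", "scoreMax", candidate)));
//     } catch (err) {
//       // A failed condition just means the stored maximum is already larger.
//       if (err.name !== "ConditionalCheckFailedException") throw err;
//     }
//   };
```

The conditional write is the design choice that matters here: the comparison happens inside DynamoDB rather than in the function, so two Lambda invocations cannot race each other into storing a smaller value.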

Perform analysis using Amazon Redshift

DynamoDB is not designed for analytical operations such as max, whereas Redshift is a very powerful big data platform that can perform these kinds of calculations with ease. As with the materialized index above, you can use DynamoDB Streams to send data to Redshift as records are inserted, maintaining a real-time copy of the table for analytical purposes.

If you are looking for a more offline or analytical solution, this is a good choice.

Perform analytics using Elasticsearch

While DynamoDB is a powerful NoSQL solution with strong guarantees on data durability, Elasticsearch provides a very flexible query model that supports aggregations such as max, and these aggregations can be sliced and diced on any attribute value in real time. As with the solutions above, you can use DynamoDB Streams to propagate item updates and deletions to the Elasticsearch index in real time.

If you want to stick with DynamoDB but need additional query capabilities, this is a really good option, especially when using AWS ES, which fully manages the Elasticsearch cluster for you. It is important to remember that Elasticsearch does not replace your DynamoDB table; it is just a searchable index of the same data.
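As an illustration of what such an aggregation looks like, the sketch below builds a search request body that uses Elasticsearch's `max` metric aggregation, optionally nested under a `terms` aggregation to get one maximum per group. The helper name, the aggregation labels `max_value` and `by_group`, and the field names are assumptions chosen for the example.

```javascript
// Sketch only: helper name, aggregation labels and field names are
// illustrative assumptions.

// Build an Elasticsearch search body that returns the maximum of `field`.
// When `groupBy` is given, the maximum is computed per distinct value of that
// field (which should be a keyword-type field for a terms aggregation).
function buildMaxAggregation(field, groupBy) {
  const maxAgg = { max_value: { max: { field } } };
  const body = { size: 0 }; // size 0: return only aggregations, no documents
  body.aggs = groupBy
    ? { by_group: { terms: { field: groupBy }, aggs: maxAgg } }
    : maxAgg;
  return body;
}

// Overall maximum of a hypothetical "score" field.
console.log(JSON.stringify(buildMaxAggregation("score")));
// Maximum "score" per hypothetical "player" value.
console.log(JSON.stringify(buildMaxAggregation("score", "player")));
```

The resulting object would be sent as the body of a search request against the index that mirrors the DynamoDB table, and the result is read from the `aggregations` section of the response.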

Just use a SQL database

The obvious solution: if you have SQL requirements, switch from a NoSQL-based system to a SQL-based system. AWS RDS offers a managed solution. While DynamoDB provides many benefits, if your use case is pulling you toward a SQL solution, the easiest thing to do is not to fight it and simply switch solutions.

This does not mean that a SQL-based solution or a NoSQL-based solution is better. Each has pros and cons, and they vary with the specific use case, but it is definitely an option to consider.

gvasquez · Answer 2 · 2017-05-25T14:33:37+0000

A MAX aggregate function is in fact available for DynamoDB data, through Hive queries on Amazon EMR: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Querying.html

kwadhwa · Answer 3 · 2019-05-10T22:48:57+0000

In addition to Jared's answer here fooobar.com/questions/1004796 / ..., there are several more ways to perform aggregations on DynamoDB data, but they all require exporting the data to another service.

Perform analytics using S3 + Athena:

Export the data from DynamoDB to Amazon S3, and then use a service such as Amazon Athena to query it. You can use AWS Glue to run the ETL process and create a full copy of the DynamoDB table in S3. The main disadvantage of this method is that the data cannot be queried in real time or near real time. Exporting the full contents of the DynamoDB table can take several minutes before the data becomes available for analytical queries.

Perform analytics using Rockset:

Rockset is a fully managed search and analytics service. It has a live integration with DynamoDB that you can use to keep data in sync between DynamoDB and Rockset. Rockset builds several indexes on the data and lets you run full SQL aggregations with millisecond latency over large amounts of data. Setup instructions are here: https://rockset.com/blog/running-fast-sql-on-dynamodb-tables/

Disclosure: I work on the @Rockset engineering team.
