DynamoDB query by date

Question

I come from a relational database background and am trying to work with Amazon DynamoDB.

I have a table with a hash key "DataID" and a range key "CreatedAt", and a bunch of items in it.

I am trying to get all items created after a specific date, sorted by date. This is quite simple in a relational database.

In DynamoDB, the closest thing I could find is a query that uses a greater-than condition on the range key. The only problem is that to run a query I need a hash key, which defeats the purpose.

So what am I doing wrong? Is my table schema wrong, should the hash key be unique? Or is there another way to query?

+55

amazon-web-services nosql amazon-dynamodb

applechief Feb 12 '13 at 16:04

7 answers

Given your current table structure, this is not currently possible in DynamoDB. The key thing to understand is that the hash key of a table (its partition key) should be thought of as creating separate tables. In a sense this is really powerful (think of partition keys as creating a new table for each user, customer, etc.).

Queries can only be performed within a single partition. That is really the end of the story. It means that if you want to query by date (you will want to use milliseconds since the epoch), then all the items you want to retrieve in a single query must share the same hash (partition) key.

I should qualify this. You absolutely can scan by the criterion you are looking for, that is no problem, but it means you will look at every row in your table and then check whether that row's date matches your parameters. This is really expensive, especially if you are in the business of storing events by date in the first place (i.e. you have a lot of rows).

You may be tempted to put all the data in a single partition to solve the problem, and you absolutely can, but your throughput will be painfully low, given that each partition receives only a fraction of the total provisioned amount.

The best thing to do is to identify more useful partitions for storing the data:

Do you really need to look at all the rows, or only the rows of a specific user?
Would it be acceptable to first narrow the list by month and run several queries (one per month)? Or by year?
If you are doing time series analysis, there are a couple of options: change the partition key to something computed on PUT to make the query easier, or use another AWS product such as Kinesis, which lends itself to append-only logging.
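To make the "one query per month" idea concrete, here is a minimal sketch. It assumes the partition key is a hypothetical `MonthBucket` attribute of the form `YYYY-MM`, computed when the item is written, and that `CreatedAt` (milliseconds since the epoch) is the range key. Those attribute names and the helper functions are illustrations, not anything from the original table.

```javascript
// Hypothetical helper: derive a "YYYY-MM" partition key from a creation time.
// You would call this when writing the item (the "computed on PUT" part).
function monthBucket(epochMs) {
  const d = new Date(epochMs);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}-${month}`;
}

// List every month bucket that overlaps [fromMs, toMs], oldest first.
function monthBucketsBetween(fromMs, toMs) {
  const buckets = [];
  const cursor = new Date(fromMs);
  // Snap to the first instant of the month so adding months never overflows.
  cursor.setUTCDate(1);
  cursor.setUTCHours(0, 0, 0, 0);
  while (cursor.getTime() <= toMs) {
    buckets.push(monthBucket(cursor.getTime()));
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
  return buckets;
}

// Build one Query request per month; each request stays inside one partition.
// "MonthBucket" is an assumed attribute name, not part of the original schema.
function buildMonthlyQueries(tableName, fromMs, toMs) {
  return monthBucketsBetween(fromMs, toMs).map((bucket) => ({
    TableName: tableName,
    KeyConditionExpression:
      "MonthBucket = :bucket AND CreatedAt BETWEEN :from AND :to",
    ExpressionAttributeValues: {
      ":bucket": { S: bucket },
      ":from": { N: String(fromMs) },
      ":to": { N: String(toMs) },
    },
  }));
}
```

Each returned object could be passed to `ddb.query(...)` in turn. Because the buckets are generated oldest first and each query returns items in range key order, concatenating the results should give a date-sorted list.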

+11

Warren Parad Aug 05 '16 at 13:09

Your hash key (a primary key of sorts) must be unique (unless you have a range key, as others have stated).

In your case, you need a secondary index to query your table.

    | ID   | DataID | Created | Data |
    |------+--------+---------+------|
    | hash | xxxxx  | 1234567 | blah |

Your hash key is ID. Your secondary index is defined as: DataID-Created-index (this is the name that DynamoDB will use).
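For reference, an index like this could be added to an existing table with an `UpdateTable` request. The sketch below only builds the request parameters. It assumes the index is a global secondary index (it must be, since its hash key differs from the table's), that `DataID` is a string and `Created` a number, and that the table uses provisioned throughput. The capacity values and the `buildCreateIndexParams` helper are placeholders of mine.

```javascript
// Build UpdateTable parameters that add the DataID-Created-index GSI.
// Attribute types and capacity units are assumptions; adjust to your table.
function buildCreateIndexParams(tableName) {
  return {
    TableName: tableName,
    // Every attribute used in the new index's key schema must be declared.
    AttributeDefinitions: [
      { AttributeName: "DataID", AttributeType: "S" },
      { AttributeName: "Created", AttributeType: "N" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "DataID-Created-index",
          KeySchema: [
            { AttributeName: "DataID", KeyType: "HASH" },
            { AttributeName: "Created", KeyType: "RANGE" },
          ],
          // Copy all attributes into the index so queries need no extra fetch.
          Projection: { ProjectionType: "ALL" },
          ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
        },
      },
    ],
  };
}
```

With an `AWS.DynamoDB` client named `ddb`, this would be sent as `ddb.updateTable(buildCreateIndexParams("Table"), callback)`.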

Then you can make a request like this:

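    var params = {
        TableName: "Table",
        IndexName: "DataID-Created-index",
        KeyConditionExpression: "DataID = :v_ID AND Created > :v_created",
        ExpressionAttributeValues: {
            ":v_ID": {S: "some_id"},
            ":v_created": {N: "timestamp"} // placeholder: pass the timestamp as a numeric string
        },
        // "Data" is a DynamoDB reserved word, so it is aliased as #d
        ExpressionAttributeNames: {"#d": "Data"},
        ProjectionExpression: "ID, DataID, Created, #d"
    };

    // ddb is assumed to be an AWS.DynamoDB client instance
    ddb.query(params, function(err, data) {
        if (err) console.log(err);
        else {
            // Sort the returned items by Created, ascending
            data.Items.sort(function(a, b) {
                return parseFloat(a.Created.N) - parseFloat(b.Created.N);
            });
            // More code here
        }
    });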

Essentially, your request looks like this:

 SELECT * FROM TABLE WHERE DataID = "some_id" AND Created > timestamp;

The secondary index will increase the read/write capacity units required, so you need to take that into account. It is still much better than doing a scan, which is costly in both reads and time (and is limited to 100 items, I believe).

This may not be the best way of doing it, but for someone used to relational databases (I'm also used to SQL), it is the fastest way to get productive. Since there are no schema constraints, you can whip up something that works, and once you have the bandwidth to work on the most efficient approach, you can change things around.

+4

ET Jul 02 '15 at 18:53

The approach I took to solve this problem was to create a global secondary index, as shown below. I’m not sure it is the best approach, but hopefully it is useful to someone.

    Hash Key                | Range Key
    ------------------------------------
    Date value of CreatedAt | CreatedAt

A limitation imposed on the HTTP API user is that they must specify the number of days of data to retrieve, defaulting to 24 hours.

This way, I can always specify the hash key as the current date's day, and the range key can use the > and < operators when retrieving. This way the data is also spread across multiple shards.
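A rough sketch of how the per-day lookups might be generated is below. It assumes the index hash key is a hypothetical `CreatedDate` attribute holding the UTC date as `YYYY-MM-DD`, and that `CreatedAt` is stored as milliseconds since the epoch. The index name, table name and helper names are made up for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// "YYYY-MM-DD" (UTC) for a given epoch time; used as the index hash key.
function dayKey(epochMs) {
  return new Date(epochMs).toISOString().slice(0, 10);
}

// Build one Query per calendar day touched by the last `days` days.
// Attribute and index names are assumptions, not taken from a real table.
function buildDailyQueries(nowMs, days = 1) {
  const sinceMs = nowMs - days * DAY_MS;
  const queries = [];
  // Start at midnight UTC of the first day and step one day at a time.
  for (let t = Date.parse(dayKey(sinceMs)); t <= nowMs; t += DAY_MS) {
    queries.push({
      TableName: "Table",
      IndexName: "CreatedDate-CreatedAt-index",
      KeyConditionExpression: "CreatedDate = :day AND CreatedAt > :since",
      ExpressionAttributeValues: {
        ":day": { S: dayKey(t) },
        ":since": { N: String(sinceMs) },
      },
    });
  }
  return queries;
}
```

With the default of 24 hours this produces two queries (yesterday's and today's day keys), since a 24-hour window usually straddles midnight.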

+4

Gireesh Sep 26 '15 at 1:06

You could make the hash key something along the lines of a “product category” ID, and then make the range key a combination of a timestamp with a unique ID appended to the end. That way you know the hash key and can still query on the date with a greater-than condition.
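One way such a composite range key might be built is sketched here. It assumes the key is stored as a string, so the timestamp is zero-padded to a fixed width to keep string order aligned with numeric order. The `CategoryID` and `CreatedAtId` attribute names, the `#` separator and both helpers are hypothetical.

```javascript
// Zero-pad the timestamp so that string comparison matches numeric order,
// then append a unique ID to keep keys distinct within the same millisecond.
function makeRangeKey(epochMs, uniqueId) {
  return `${String(epochMs).padStart(15, "0")}#${uniqueId}`;
}

// Query one category for items created strictly after `afterMs`.
// The lower bound is the padded (afterMs + 1) with no ID suffix: every key
// with a later timestamp sorts above it, every key at afterMs sorts below.
function buildCategoryQuery(categoryId, afterMs) {
  return {
    TableName: "Table",
    KeyConditionExpression: "CategoryID = :cat AND CreatedAtId > :after",
    ExpressionAttributeValues: {
      ":cat": { S: categoryId },
      ":after": { S: String(afterMs + 1).padStart(15, "0") },
    },
  };
}
```

Results of a query come back ordered by the range key, so with this layout they are already sorted by creation time.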

+3

greg Feb 12 '13 at 18:32

You can have several identical hash keys, but only if you have a range key that varies. Think of it like file formats: you can have 2 files with the same name in the same folder as long as their format is different. If their format is the same, their name must be different. The same concept applies to DynamoDB hash/range keys; just think of the hash as the name and the range as the format.

Also, I don’t recall whether they had these at the time of the original post (I don’t believe they did), but they now offer local secondary indexes.

My understanding is that these should now allow you to perform the desired queries without having to do a full scan. The downside is that these indexes have to be specified at table creation, and also (I believe) cannot be null when creating an item. In addition, they require additional throughput (though typically not as much as a scan) and storage, so it is not a perfect solution, but a viable alternative for some.

I still recommend Mike Brant's answer as the preferred method of using DynamoDB, and I use that method myself. In my case, I just have a central table with only a hash key as my ID, then secondary tables that have a hash and a range key that can be queried; the items there point directly to the "item of interest" in the central table.

Additional details on secondary indexes can be found in the Amazon DynamoDB documentation for those interested.

In any case, I hope this helps anyone else who happens upon this thread.

+1

DGolberg Feb 13 '14 at 22:38

You can do this now in DynamoDB using a GSI. Make the “CreatedAt” field a GSI key and issue queries such as (GT some_date). Store the date as a number (milliseconds since the epoch) for this kind of query.

Details are available here: Global secondary indexes - Amazon DynamoDB: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Using

This is a very powerful feature. Be aware that queries are limited to the following comparison operators: EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN. Condition - Amazon DynamoDB: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html

-6

Sony Kadavan Feb 14 '14 at 18:02

Mike Brant · Accepted Answer · 2013-02-12 17:18

Updated answer:

DynamoDB allows you to specify secondary indexes to help with this sort of query. Secondary indexes can be either global, meaning that the index spans the whole table across hash keys, or local, meaning that the index exists within each hash key partition and therefore requires the hash key to be specified when making the query.

For the use case in this question, you would like to use the global secondary index in the "CreatedAt" field.

For more information on DynamoDB secondary indexes, see the secondary index documentation.

Original answer:

DynamoDB does not allow indexed lookups on the range key alone. The hash key is required so that the service knows which partition to look in to find the data.

You can, of course, perform a scan operation to filter by the date value; however, this requires a full table scan, so it is not ideal.

If you need to perform an indexed lookup of records by time across multiple primary keys, DynamoDB might not be the ideal service for you, or you might need to use a separate table (either in DynamoDB or in a relational store) to hold item metadata that you can perform an indexed lookup against.
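For completeness, a scan-based version might look roughly like the sketch below. It assumes `CreatedAt` is stored as a number and that `ddb` is an AWS SDK for JavaScript v2 `AWS.DynamoDB` client (or anything exposing `scan(params).promise()`). The helper names are mine. Note that the filter is applied after the items are read, so the whole table is still consumed.

```javascript
// Build Scan parameters that keep only items created after `afterMs`.
function buildScanParams(tableName, afterMs, exclusiveStartKey) {
  const params = {
    TableName: tableName,
    FilterExpression: "CreatedAt > :after",
    ExpressionAttributeValues: { ":after": { N: String(afterMs) } },
  };
  // Continue from the previous page, if there was one.
  if (exclusiveStartKey) params.ExclusiveStartKey = exclusiveStartKey;
  return params;
}

// Page through the whole table, then sort client-side by CreatedAt.
async function scanCreatedAfter(ddb, tableName, afterMs) {
  const items = [];
  let startKey;
  do {
    const page = await ddb
      .scan(buildScanParams(tableName, afterMs, startKey))
      .promise();
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // undefined once the last page is read
  } while (startKey);
  // A scan does not return items in date order, so sort here.
  items.sort((a, b) => Number(a.CreatedAt.N) - Number(b.CreatedAt.N));
  return items;
}
```

Collecting every matching item in memory is only reasonable for small tables, which is part of why the scan route is not ideal.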
