Why does DynamoDB scanning with Limit and FilterExpression not return items matching the filter?

I need to run a DynamoDB scan with both a limit and a filter condition.

The docs say:

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).


Code (Node.js):

var params = {
  ExpressionAttributeNames: {"#user": "User"},
  ExpressionAttributeValues: {":user": parseInt(user.id)},
  FilterExpression: "#user = :user and attribute_not_exists(Removed)",
  Limit: 2,
  TableName: "XXXX"
};

DynamoDB.scan(params, function(err, data) {
  if (err) {
    dataToSend.message = "Unable to query. Error: " + err.message;
  } else if (data.Items.length == 0) {
    dataToSend.message = "No results were found.";
  } else {
    dataToSend.data = data.Items;
    console.log(dataToSend);
  }
});



Definition of table XXXX:

  • Primary partition key: User (Number)
  • Primary sort key: Identifier (String)
  • Index:
    • Index name: RemovedIndex
    • Type: GSI
    • Partition key: Deleted (Number)
    • Sort key: -
    • Attributes: ALL


In the code above, if I remove the Limit parameter, DynamoDB returns the items that match the filter. So the filter condition itself is correct. But when I scan with the Limit parameter, the result is empty.

Table XXXX contains 5 items. Only the first 2 have the Removed attribute. When I scan without the Limit parameter, DynamoDB returns the 3 items that do not have the Removed attribute.

What am I doing wrong?

+9
5 answers

From the docs you quoted:

If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements

By combining Limit and FilterExpression, you told DynamoDB to look at only the first two items in the table and evaluate the FilterExpression against those two items. Limit in DynamoDB can be confusing because it works differently from LIMIT in a SQL statement in an RDBMS: it caps the number of items evaluated, not the number of items returned.
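To make the ordering concrete, here is a small in-memory sketch of "Limit first, filter second". It does not call DynamoDB or the AWS SDK; simulateScan and LastEvaluatedIndex are hypothetical names, and the sketch assumes the two items carrying Removed come first in scan order, as the question describes.

```javascript
// In-memory model of a Scan that applies Limit BEFORE the filter.
// NOT the AWS SDK: simulateScan is a hypothetical helper for illustration.
function simulateScan(items, { limit, filter, startIndex = 0 }) {
  const end = limit === undefined
    ? items.length
    : Math.min(startIndex + limit, items.length);
  // Step 1: Limit caps how many items are evaluated.
  const evaluated = items.slice(startIndex, end);
  // Step 2: the filter only sees the evaluated items.
  const matched = evaluated.filter(filter);
  return {
    Items: matched,
    Count: matched.length,
    ScannedCount: evaluated.length,
    // Where a follow-up call would resume; undefined once the table is exhausted.
    LastEvaluatedIndex: end < items.length ? end : undefined,
  };
}

// Five items for one user; only the first two carry the Removed attribute.
const table = [
  { User: 1, Identifier: "a", Removed: 1 },
  { User: 1, Identifier: "b", Removed: 1 },
  { User: 1, Identifier: "c" },
  { User: 1, Identifier: "d" },
  { User: 1, Identifier: "e" },
];
const notRemoved = (item) => item.User === 1 && !("Removed" in item);

const limited = simulateScan(table, { limit: 2, filter: notRemoved });
const unlimited = simulateScan(table, { filter: notRemoved });
console.log(limited.Count, limited.ScannedCount);     // 0 2
console.log(unlimited.Count, unlimited.ScannedCount); // 3 5
```

With a limit of 2, both evaluated items are filtered out, so the result is empty even though three matching items exist further along.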

+16

I also ran into this problem. I think you just need to scan the whole table, up to 1 MB per call.

Scan: the result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
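A minimal sketch of that pagination loop follows. To stay self-contained it takes a scanPage function as a parameter; scanPage and scanUntil are hypothetical names, and scanPage is assumed to resolve to an object with Items and an optional LastEvaluatedKey, which is the shape of a DynamoDB Scan response.

```javascript
// Keep scanning page by page until `wanted` matching items have been
// collected or the table is exhausted.
// `scanPage` is an assumed function: (params) => Promise<{ Items, LastEvaluatedKey }>.
async function scanUntil(scanPage, baseParams, wanted) {
  const items = [];
  let startKey; // undefined on the first call
  do {
    const params = { ...baseParams };
    if (startKey !== undefined) {
      // Resume right after the last item the previous call evaluated.
      params.ExclusiveStartKey = startKey;
    }
    const page = await scanPage(params);
    items.push(...(page.Items || []));
    // A missing LastEvaluatedKey means there is nothing left to scan.
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined && items.length < wanted);
  // The last page may overshoot, so trim to the requested size.
  return items.slice(0, wanted);
}
```

With the AWS SDK for JavaScript v2, scanPage could be something like (p) => docClient.scan(p).promise(), where docClient is a DynamoDB.DocumentClient instance. Because the result is trimmed, this sketch is suited to "give me the first N matches" and would need extra bookkeeping to resume exactly where the trimmed result ended.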

+1

You may be able to get what you need with a secondary index. Take the classic RDB example of customers and orders: you have one table for customers and one for orders. The Orders table has a key consisting of Customer (HASH) and Order (RANGE). If you want the last 10 orders across all customers, there is no way to get them without a scan.

But if you create a global secondary index on Orders with "Some Constant" as the HASH key and Date as the RANGE key, and query against that index, it will do what you want, and you pay only for the RCUs associated with the returned records. No expensive scan is required. Note that writes will be more expensive, but in most cases there are far more reads than writes.

Now you are back to the original problem if you want, say, the 10 largest orders for a day that exceed $1,000. The query will return the last 10 orders and then filter out those under $1,000.
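As a rough sketch, the parameters for such a query might look like the following. The table name Orders, the index name AllOrdersByDate, and the attribute name Bucket with the constant value "ALL" are all hypothetical; values are written in plain DocumentClient style.

```javascript
// Builds Query parameters for a GSI whose HASH key is a constant and
// whose RANGE key is the order date. All names below are hypothetical.
function latestOrdersParams(count) {
  return {
    TableName: "Orders",
    IndexName: "AllOrdersByDate",
    KeyConditionExpression: "#bucket = :bucket",
    ExpressionAttributeNames: { "#bucket": "Bucket" },
    ExpressionAttributeValues: { ":bucket": "ALL" },
    ScanIndexForward: false, // descending RANGE key: newest orders first
    Limit: count,            // no FilterExpression, so Limit equals the result size
  };
}

console.log(latestOrdersParams(10).Limit); // 10
```

Because there is no FilterExpression here, every item read is also returned, so Limit behaves the way a SQL user would expect.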

In this case, you can create a computed Date-OrderAmount key, and queries on this index will return what you want.
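One way such a computed key could be built is sketched below. The key format (date, a "#" separator, then the amount in cents zero-padded to a fixed width) and every table, index, and attribute name are assumptions for illustration. Zero-padding matters because string RANGE keys sort lexicographically.

```javascript
// Composite RANGE key "<date>#<zero-padded amount in cents>".
// The format and the fixed width are assumptions, not a DynamoDB convention.
const AMOUNT_WIDTH = 12;

function dateAmountKey(date, amountCents) {
  return `${date}#${String(amountCents).padStart(AMOUNT_WIDTH, "0")}`;
}

// Query parameters for the largest orders on one day with an amount of at
// least minAmountCents (BETWEEN is inclusive). Names are hypothetical.
function largestOrdersParams(date, minAmountCents, count) {
  return {
    TableName: "Orders",
    IndexName: "AllOrdersByDateAmount",
    KeyConditionExpression: "#bucket = :bucket AND #key BETWEEN :lo AND :hi",
    ExpressionAttributeNames: { "#bucket": "Bucket", "#key": "DateAmount" },
    ExpressionAttributeValues: {
      ":bucket": "ALL",
      ":lo": dateAmountKey(date, minAmountCents),
      ":hi": dateAmountKey(date, 10 ** AMOUNT_WIDTH - 1), // largest encodable amount
    },
    ScanIndexForward: false, // descending: largest amounts first
    Limit: count,
  };
}

console.log(dateAmountKey("2017-05-01", 100000)); // 2017-05-01#000000100000
```

For orders strictly over $1,000, the lower bound would be 100001 cents. Since the amount condition is part of the key condition rather than a filter, Limit again counts only matching items.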

It is not as simple as SQL, but you need to think about access patterns in SQL too. If you have a lot of data, you need to create indexes in SQL as well, or the database will happily scan tables on your behalf, which degrades performance and increases your costs.

Note that everything I propose is normalized in the sense that there is only one source of truth. You are not duplicating the data; you are just adding another view of it to get what you need from DynamoDB.

Keep in mind that a constant used as the HASH key is subject to the 10 GB per-partition limit, so you will need to design around it if you have a lot of active data. For example, depending on your expected access pattern, you might use Customer rather than a constant as the HASH key. Or use Streams to organize the data (or subsets of it) in other ways.

+1

Ok, this is for demo use, and it will use here

0

Small hack: iterate until you get results.

lastEvaluatedKey = null;
do {
  if (lastEvaluatedKey != null) {
    // query or scan data with last evaluated key
  } else {
    // query or scan data WITHOUT last evaluated key
  }
  // assign (not compare): take LastEvaluatedKey from the response, or null if absent
  lastEvaluatedKey = key of last item retrieved
} while (lastEvaluatedKey != null && retrievedResultSize == 0); // == 0 or < yourLimit

If the number of items found is 0 and lastEvaluatedKey is not null, it means the call scanned or queried as many rows as your limit allows, and the result is empty because none of them matched the filter expression.

-1