While a Global Secondary Index would meet your requirements, including timestamp-related information as part of your Hash Key will most likely create a so-called "hot partition", which is highly undesirable.
The access would be uneven, because the most recent items would be retrieved far more frequently than the older ones. This not only hurts your performance, it also makes your solution less cost-effective.
See the documentation for more details:
For example, if a table has a very small number of heavily accessed partition key values, possibly even a single very heavily used partition key value, request traffic is concentrated on a small number of partitions, potentially only one partition. If the workload is heavily unbalanced, meaning that it is disproportionately focused on one or a few partitions, the requests will not achieve the overall provisioned throughput level. To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.
In line with the above, id seems to be a good choice for your Hash Key (a.k.a. Partition Key), and I would not change it, since GSI keys work the same way with respect to partitioning. On a separate note, performance is highly optimized when you retrieve your data by providing the entire Primary Key, so we should try to find a solution that allows that whenever possible.
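As a minimal sketch of such a full-key lookup with boto3 (the table name markers and the id value here are placeholders, not from your schema):

```python
import boto3

dynamodb = boto3.resource('dynamodb')

# Hypothetical main table holding the full records (lat/lng, name, etc.).
table = dynamodb.Table('markers')

# Providing the entire Primary Key gives the cheapest, fastest access path.
response = table.get_item(Key={'id': 'some-item-id'})
item = response.get('Item')  # None if no item with that id exists
```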
I would suggest creating separate tables to store the primary keys of recently updated items, segmented by when they were updated. You can segment the data into tables based on the granularity that best fits your use cases. For example, say you want to segment the updates daily (a sketch in code follows the list):
a. Your daily updates could be stored in tables with the following naming convention: updates_DDMM
b. The updates_DDMM tables would only hold the ids (the hash keys of the items in the main table that were updated that day)
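Here is a hedged sketch of (a) and (b) with boto3; the helper names (updates_table_name, record_update) are mine, and the once-a-day table creation would typically be handled by a scheduled job:

```python
import datetime

import boto3

dynamodb = boto3.resource('dynamodb')

def updates_table_name(day=None):
    # Build the updates_DDMM name for a given day (defaults to today).
    day = day or datetime.date.today()
    return 'updates_' + day.strftime('%d%m')

def create_todays_updates_table():
    # Run once per day; the table only carries the hash keys, so minimal
    # provisioned throughput should be enough.
    dynamodb.meta.client.create_table(
        TableName=updates_table_name(),
        KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
        ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1},
    )

def record_update(item_id):
    # Whenever an item changes, write its hash key into today's table.
    dynamodb.Table(updates_table_name()).put_item(Item={'id': item_id})
```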
Now say that your application's last update was 2 days ago (04/04/16) and you need to get the latest entries; you would then need to:
i. Scan the updates_0504 and updates_0604 tables to obtain all the hash keys.
ii. Finally, retrieve the entries from the main table (containing lat/lng, name, etc.) by issuing a BatchGetItem with all the hash keys obtained.
BatchGetItem is super fast and does the job like no other operation.
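Continuing the sketch above, steps i and ii could look like this (again, markers stands in for your main table's name):

```python
import boto3

dynamodb = boto3.resource('dynamodb')

def hash_keys_from(update_tables):
    # Step i: scan each daily table and collect the stored hash keys.
    keys = []
    for name in update_tables:
        table = dynamodb.Table(name)
        response = table.scan()
        keys.extend(item['id'] for item in response['Items'])
        # Follow pagination in case a day's table exceeds 1 MB of results.
        while 'LastEvaluatedKey' in response:
            response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
            keys.extend(item['id'] for item in response['Items'])
    return keys

def fetch_entries(keys, main_table='markers'):
    # Step ii: BatchGetItem accepts at most 100 keys per call, so chunk.
    items = []
    for start in range(0, len(keys), 100):
        chunk = keys[start:start + 100]
        response = dynamodb.batch_get_item(
            RequestItems={main_table: {'Keys': [{'id': k} for k in chunk]}}
        )
        items.extend(response['Responses'][main_table])
        # A production version should also retry response['UnprocessedKeys'].
    return items

latest = fetch_entries(hash_keys_from(['updates_0504', 'updates_0604']))
```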
One could argue that creating additional tables increases the cost of your overall solution... well, with a GSI you essentially duplicate your table (in case you project all the attributes) and pay that extra cost for all ~2k records, recently updated or not...
Creating tables like this may seem counter-intuitive, but it is actually a best practice when dealing with time series data (from the AWS DynamoDB documentation):
[...] applications might show an uneven access pattern over all the items in the table, where the latest customer data is more relevant: your application might access the latest items more frequently, and as time passes these items are less accessed; eventually, the older items are rarely accessed. If this is a known access pattern, you could take it into consideration when designing your table schema. Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where the data access rate is high, request higher throughput, and for tables storing older data, you could dial down the throughput and save on resources.
You can save on resources by storing "hot" items in one table with higher throughput settings, and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting the tables. You can optionally back up these tables to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one by one, which essentially doubles the write throughput, as you do as many delete operations as put operations.
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
Hope this helps. Regards.