I am trying to determine if Amazon SimpleDB is suitable for a subset of the data that I have.
I have thousands of deployed autonomous sensors that record data.
Each sensor device reports several values four times per hour, every day, for months or years. I need to keep all of this data for historical statistical analysis. As a rule, the data is written once and read many times. Server applications run regularly to query the data and display derived information.
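To put the write volume in perspective, here is a back-of-the-envelope calculation. It assumes a hypothetical fleet of exactly 1,000 devices (the real number is only described as "thousands") and the stated rate of four readings per hour.

```python
# Rough write-volume estimate for the sensor fleet.
# NUM_DEVICES is a hypothetical figure; the actual fleet is "thousands" of devices.
NUM_DEVICES = 1_000
READINGS_PER_HOUR = 4  # four reports per hour, per device


def rows_per_period(devices: int, days: int) -> int:
    """Number of rows written by `devices` sensors over `days` days."""
    return devices * READINGS_PER_HOUR * 24 * days


per_day = rows_per_period(NUM_DEVICES, 1)
per_year = rows_per_period(NUM_DEVICES, 365)
print(per_day, per_year)  # 96000 35040000
```

So every thousand devices adds roughly 35 million rows per year, which quickly reaches tens of millions of rows.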
Currently, each row in SQL looks something like this:
- (id, device_id, utc_timestamp, value1, value2)
Our existing MySQL solution, now at tens of millions of rows, will not scale much further. We run queries such as "tell me the sum of value1 across all devices for yesterday" or "show me the average of value2 over the past 8 hours." We currently do this in SQL, but we would gladly move that logic into application code. SimpleDB's "eventual consistency" is fine for our needs.
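SimpleDB's Select expressions support `count(*)` but no `SUM` or `AVG`, so aggregates like the two queries above would have to be computed in application code after fetching the matching items. A minimal sketch of that client-side aggregation, assuming rows have already been fetched into memory as tuples mirroring `(id, device_id, utc_timestamp, value1, value2)`; the `Reading` type and helper names are hypothetical:

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple, Optional


class Reading(NamedTuple):
    """One sensor row: (id, device_id, utc_timestamp, value1, value2)."""
    id: int
    device_id: str
    utc_timestamp: datetime  # timezone-aware, in UTC
    value1: float
    value2: float


def in_window(rows: Iterable[Reading], start: datetime, end: datetime) -> Iterator[Reading]:
    """Yield rows whose timestamp lies in the half-open interval [start, end)."""
    return (r for r in rows if start <= r.utc_timestamp < end)


def total_value1_for_day(rows: Iterable[Reading], day: date) -> float:
    """Sum of value1 across all devices for one UTC calendar day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return sum(r.value1 for r in in_window(rows, start, start + timedelta(days=1)))


def average_value2(rows: Iterable[Reading], now: datetime, hours: int = 8) -> Optional[float]:
    """Mean of value2 over the last `hours` hours, or None if no rows fall in the window."""
    values = [r.value2 for r in in_window(rows, now - timedelta(hours=hours), now)]
    return sum(values) / len(values) if values else None


# Small in-memory example; in practice the rows would come from the data store.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    Reading(1, "dev-a", datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc), 2.0, 10.0),
    Reading(2, "dev-b", datetime(2024, 1, 1, 23, 45, tzinfo=timezone.utc), 3.0, 20.0),
    Reading(3, "dev-a", datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), 5.0, 30.0),
    Reading(4, "dev-b", datetime(2024, 1, 2, 11, 30, tzinfo=timezone.utc), 7.0, 50.0),
]
yesterday = (now - timedelta(days=1)).date()
print(total_value1_for_day(rows, yesterday))  # 5.0
print(average_value2(rows, now))              # 40.0
```

In a real deployment the time-window filter would be pushed into the Select query so only the relevant items are transferred; the summing and averaging would still happen client-side.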
I have been reading everything I can and am about to start experimenting with our AWS account, but it is not clear to me how SimpleDB's concepts (items, domains, attributes, etc.) map onto our data.
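One plausible mapping: a domain plays roughly the role of a table, each row becomes an item identified by a unique item name, and each column becomes an attribute. Because SimpleDB stores every attribute value as a string and compares values lexicographically, numbers need to be zero-padded (and offset, if they can be negative), and timestamps stored in a sortable text form such as ISO 8601 UTC, for range queries to behave. The sketch is pure Python and makes no AWS calls; the one-domain-per-month naming (a way to stay under SimpleDB's per-domain size limit), the item-name format, and the `SCALE`/`OFFSET`/`WIDTH` parameters are all hypothetical choices.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# Hypothetical encoding parameters (not prescribed by SimpleDB): every encoded
# number becomes a fixed-width, non-negative integer string, so string order
# matches numeric order.
SCALE = 1000     # keep three decimal places
OFFSET = 10**9   # lets values down to -1,000,000.000 encode as non-negative
WIDTH = 13       # total digits after zero-padding


def encode_number(value: float) -> str:
    """Encode a number so that lexicographic order equals numeric order."""
    shifted = round(value * SCALE) + OFFSET
    if not 0 <= shifted < 10**WIDTH:
        raise ValueError(f"value {value!r} is outside the encodable range")
    return str(shifted).zfill(WIDTH)


def decode_number(text: str) -> float:
    """Invert encode_number (up to the chosen SCALE precision)."""
    return (int(text) - OFFSET) / SCALE


def to_simpledb_item(
    row: Tuple[int, str, datetime, float, float]
) -> Tuple[str, str, Dict[str, str]]:
    """Map one (id, device_id, utc_timestamp, value1, value2) row to
    (domain name, item name, attributes). Timestamps must be timezone-aware."""
    row_id, device_id, ts, value1, value2 = row
    utc = ts.astimezone(timezone.utc)
    ts_text = utc.strftime("%Y-%m-%dT%H:%M:%SZ")  # ISO 8601 in UTC sorts correctly as text
    domain = f"sensor_readings_{utc:%Y_%m}"        # hypothetical: one domain per month
    item_name = f"{device_id}_{ts_text}"           # hypothetical unique item name
    attributes = {
        "id": str(row_id),
        "device_id": device_id,
        "utc_timestamp": ts_text,
        "value1": encode_number(value1),
        "value2": encode_number(value2),
    }
    return domain, item_name, attributes


domain, item_name, attrs = to_simpledb_item(
    (42, "dev-a", datetime(2024, 1, 2, 6, 15, tzinfo=timezone.utc), 12.5, -3.25)
)
print(domain)           # sensor_readings_2024_01
print(item_name)        # dev-a_2024-01-02T06:15:00Z
print(attrs["value1"])  # 0001000012500
print(attrs["value2"])  # 0000999996750
```

With values encoded this way, a comparison such as `value1 > '0001000000000'` in a Select expression corresponds to `value1 > 0` numerically, and a timestamp range becomes a plain string range on `utc_timestamp`.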
Is SimpleDB the right tool for this, and what would be a generic approach?
PS: We mainly use Python, but that should not matter when considering this at a high level. I am already aware of the boto library.
Edit:
While continuing to search for solutions, I came across the Stack Overflow question "What is the best open source solution for storing time series data?", which was helpful.
Aitch