A few comments:
In addition to how you will store the data, you should also consider how you want to retrieve it, as that can significantly change the design. Some of the questions to ask yourself are:
- When I retrieve the data, will I always fetch data for a specific metric and for a date/time range?
- Or do I need to fetch data for all metrics for a specific date/time range? If so, you are looking at a full table scan. Obviously you could avoid that by running multiple queries (one query per PartitionKey).
- Do I need to see the latest results first, or do I not care about the order? If it is the former, your RowKey strategy should be something like `(DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d19")` (see the sketch just after this list).
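For example, here is a minimal demo of how that reverse-tick trick orders keys; the "d19" format zero-pads the tick count to 19 digits so that lexicographic string comparison matches numeric comparison:

```csharp
using System;

class RowKeyDemo
{
    static void Main()
    {
        // Newer timestamps produce lexicographically SMALLER keys, so the
        // latest data sorts (and is returned) first.
        string earlier = (DateTime.MaxValue.Ticks - new DateTime(2013, 1, 1).Ticks).ToString("d19");
        string later   = (DateTime.MaxValue.Ticks - new DateTime(2013, 1, 2).Ticks).ToString("d19");
        Console.WriteLine(string.CompareOrdinal(later, earlier) < 0); // True
    }
}
```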
In addition, since PartitionKey is a string value, you may need to convert your int identifiers to strings with some leading "0"s so that all of them sort in order; otherwise you will get 1, 10, 11, ..., 19, 2, ... and so on.
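A quick illustration of that padding (the 4-digit width here is just my assumption; pick whatever width covers your largest id):

```csharp
using System;

class PaddingDemo
{
    static void Main()
    {
        // Unpadded, string sorting yields: "1", "10", "11", ..., "19", "2", ...
        // Padded to a fixed width, it yields: "0001", "0002", ..., "0010", "0011", ...
        int metricId = 2;
        Console.WriteLine(metricId.ToString("D4")); // "0002"
    }
}
```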
As far as I know, Windows Azure only partitions the data on PartitionKey, not on RowKey. Within a partition, the RowKey serves as the unique key. Windows Azure will try to keep data with the same PartitionKey on the same node, but since each node is a physical device (and therefore has a size limit), the data may flow over to another node as well.
You might want to read this blog post from the Windows Azure Storage team: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx.
UPDATE: Based on your comments below and some of the information above, let's try to do the math. This is based on the latest scalability targets published here: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/04/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx. The documentation states:
Single Table Partition: a table partition is all of the entities in a table with the same partition key value, and usually tables have many partitions. The throughput target for a single table partition is:
- Up to 2,000 entities per second
- Note that this is for a single partition, not a single table. Therefore, a well-partitioned table can process up to 20,000 entities per second, which is the overall account target mentioned above.
Now, you mentioned that you have 10-20 different metric points, and for each metric point you will write at most 1 record per minute. That means you will write at most 20 entities per minute per table, which is well within the scalability target of 2,000 entities per second.
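To make that concrete, here is a rough sketch of what a write could look like with the key scheme discussed above, using the 2012-era Microsoft.WindowsAzure.Storage SDK. The table name, entity shape, and 4-digit padding are illustrative assumptions on my part:

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity: one reading per metric per minute.
public class MetricEntity : TableEntity
{
    public MetricEntity() { } // parameterless constructor required by the SDK

    public MetricEntity(int metricId, DateTime timestampUtc, double value)
    {
        PartitionKey = metricId.ToString("D4");                                  // zero-padded metric id
        RowKey = (DateTime.MaxValue.Ticks - timestampUtc.Ticks).ToString("d19"); // newest-first ordering
        Value = value;
    }

    public double Value { get; set; }
}

class WriteDemo
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your connection string>");
        var table = account.CreateCloudTableClient().GetTableReference("metrics");
        table.CreateIfNotExists();

        // 20 metrics x 1 record/minute = at most 20 writes per minute,
        // far below the 2,000 entities/second per-partition target.
        table.Execute(TableOperation.Insert(new MetricEntity(7, DateTime.UtcNow, 42.5)));
    }
}
```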
Now the question remains about the reads. Assuming a user will read data for the last 24 hours, that is 24 * 60 = 1,440 data points per partition. If the user fetches data for all 20 metrics for 1 day, each user (and thus each table) will retrieve at most 28,800 data points. The question that remains for you, I think, is how many such queries per second you can expect; if you can somehow extrapolate that information, I think you can draw a conclusion about the scalability of your architecture.
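And a matching sketch of the read side: a single-partition range query for one metric over the last 24 hours, with the same assumed SDK and key scheme as above. Fetching all 20 metrics would mean issuing 20 such queries, one per PartitionKey:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class ReadDemo
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<your connection string>");
        var table = account.CreateCloudTableClient().GetTableReference("metrics");

        // One partition = one metric. Because RowKey is
        // (DateTime.MaxValue.Ticks - timestamp.Ticks), "now" maps to the LOW end
        // of the RowKey range and "24 hours ago" to the HIGH end.
        string pk = 7.ToString("D4"); // hypothetical metric id
        string newest = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d19");
        string oldest = (DateTime.MaxValue.Ticks - DateTime.UtcNow.AddHours(-24).Ticks).ToString("d19");

        string filter = TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, pk),
            TableOperators.And,
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, newest),
                TableOperators.And,
                TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, oldest)));

        var query = new TableQuery<DynamicTableEntity>().Where(filter);
        // Up to 1,440 points for this metric, returned newest first.
        Console.WriteLine(table.ExecuteQuery(query).Count());
    }
}
```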
I would also recommend watching this video: http://channel9.msdn.com/Events/Build/2012/4-004.
Hope this helps.