Short version: Can we read from dozens or hundreds of table partitions in parallel, on multiple threads, to improve performance by orders of magnitude?
Long version: We are working on a system that stores millions of rows in Azure table storage. We split the data into small partitions of about 500 records each, where one partition holds one day's worth of data for a unit.
Since Azure table storage does not have a "sum" function, to pull data for a year we either have to rely on some kind of pre-caching or sum the data ourselves in an Azure web role or worker role.
Assuming the following:
- Reading one partition does not affect the performance of reading another
- Reading a partition is bottlenecked by network speed and server retrieval time
Then we can assume that if we wanted to quickly sum a lot of data on the fly (1 year, 365 partitions), we could use a massively parallel algorithm, and it would scale almost perfectly with the number of threads. For example, we could use the .NET Parallel Extensions with 50+ threads and get a HUGE performance boost.
We are working on setting up some experiments, but I wanted to see whether this has been done before. Since the .NET side is mostly idle while waiting on high-latency requests, this seems ideal for multithreading.
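To make the idea concrete, here is a rough sketch of the kind of experiment we have in mind. It is written in Python rather than .NET only to keep it short, and it does not touch Azure at all: `read_partition` is a hypothetical stand-in that sleeps for an assumed latency and returns 500 dummy values. The 10 ms latency is a made-up number, not a measurement, so the timings only show how the approach would behave if the two assumptions above hold.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed round-trip time per partition read. This is a placeholder,
# not a measured Azure table storage figure.
SIMULATED_LATENCY_S = 0.01
ROWS_PER_PARTITION = 500


def read_partition(day):
    """Hypothetical stand-in for one partition query: wait, then return rows."""
    time.sleep(SIMULATED_LATENCY_S)   # simulates network and server time
    return [1] * ROWS_PER_PARTITION   # dummy values, one per record


def sum_partition(day):
    """Read one partition and reduce it to a single number."""
    return sum(read_partition(day))


def sum_year(days, max_workers):
    """Sum all partitions, reading up to max_workers of them at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(sum_partition, days))


def timed(days, max_workers):
    """Return (total, elapsed seconds) for one run."""
    start = time.perf_counter()
    total = sum_year(days, max_workers)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    days = range(365)  # one partition per day
    for workers in (1, 50):
        total, elapsed = timed(days, workers)
        print(f"{workers:>2} threads: total={total}, {elapsed:.2f}s")
```

In this simulation the threads spend nearly all their time waiting, so the speedup is close to the thread count. Whether the real service behaves the same way (throttling, connection limits, shared back-end resources) is exactly what we want to find out.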
Jason young