Azure table storage performance with multithreaded partition reading

Short version: can we read from dozens or hundreds of table partitions in a multithreaded way and improve performance by orders of magnitude?

Long version: We are working on a system that stores millions of rows in Azure table storage. We split the data into small partitions, each containing about 500 records, representing a day's worth of data for one unit.

Since Azure table storage has no "sum" capability, pulling a year's worth of data means we either have to use some pre-caching, or summarize the data ourselves in an Azure web role or worker role.

Assuming the following:
- Reading one partition does not affect the performance of reading another
- Reading a partition is bottlenecked by network speed and server lookup time

Then we can guess that if we wanted to summarize a lot of data quickly on the fly (one year, 365 partitions), we could use a massively parallel algorithm and it would scale almost perfectly with the number of threads. For example, we could use the .NET parallel extensions with 50+ threads and get a HUGE performance boost; a rough sketch of the idea follows.
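To make the fan-out concrete, here is a minimal sketch, assuming one partition per day keyed by date and a hypothetical queryPartitionSum delegate that fetches and totals a single partition (the key format and the delegate are illustrative, not part of the actual system):

    // Sketch only: queryPartitionSum is a hypothetical delegate that queries one
    // partition (e.g. via the Azure table storage client) and returns that day's total.
    using System;
    using System.Linq;

    class ParallelSummarizer
    {
        static double SummarizeYear(DateTime start, Func<string, double> queryPartitionSum)
        {
            // Build 365 partition keys (one per day) and query them in parallel.
            return Enumerable.Range(0, 365)
                .Select(i => start.AddDays(i).ToString("yyyyMMdd"))  // assumed partition-key format
                .AsParallel()
                .WithDegreeOfParallelism(50)                         // ~50 requests in flight at once
                .Select(queryPartitionSum)                           // network-bound, so threads mostly wait
                .Sum();
        }
    }

Because each request spends most of its time waiting on the network, the degree of parallelism here is about overlapping latency, not about using more CPU cores.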

We are setting up some experiments, but I wanted to ask whether this has been done before. Since the .NET side is mostly idle while waiting on high-latency requests, this seems ideal for multithreading.

+7
azure parallel-extensions
1 answer

There are limits on the number of transactions that can be performed per unit of time against a storage account, and against a specific partition or storage server (somewhere around 500 requests per second). So in that sense there is a reasonable cap on the number of requests you can run in parallel (before it starts to look like a DoS attack).

In addition, when implementing this I would be wary of the concurrent-connection limits imposed on the client by System.Net.ServicePointManager. I'm not sure whether the Azure Storage client is subject to these limits; they may need adjusting.
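If those client-side limits do apply, the usual knobs are the standard System.Net settings shown below; this is a sketch of raising them early in application startup, and the specific values are illustrative:

    // Assumption: the default of 2 concurrent connections per host must be raised
    // before ~50 parallel requests can actually run. Call this early in startup,
    // e.g. from Main or a role's OnStart.
    using System.Net;

    static class StartupConfig
    {
        public static void ConfigureHttp()
        {
            ServicePointManager.DefaultConnectionLimit = 64;  // allow many concurrent HTTP connections
            ServicePointManager.Expect100Continue = false;    // skip the extra 100-Continue round trip
            ServicePointManager.UseNagleAlgorithm = false;    // often disabled for many small requests
        }
    }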

+4