Storage transaction limits in Azure

I'm running performance tests against Azure Table Storage (ATS), and the behavior gets a little strange when you use multiple virtual machines against the same table / storage account.

The entire pipeline is non-blocking (async/await) and uses the TPL for concurrent and parallel execution.
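
To make the setup concrete, the pipeline is shaped roughly like the sketch below (shown in Python for brevity; the real code is C#/TPL, and `insert_entity` is a hypothetical stand-in for the actual table-insert call):

```python
from concurrent.futures import ThreadPoolExecutor

def insert_entity(row):
    # Hypothetical stand-in for the real HTTP insert against the table endpoint.
    return True

def run_pipeline(rows, concurrency=64):
    # Issue inserts with bounded parallelism, mirroring the TPL-based pipeline.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(insert_entity, rows))
    return sum(results)  # number of successful inserts

# run_pipeline(range(100_000)) would drive 100k inserts through the pool.
```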

First of all, it is very strange that with this setup I get only about 1,200 inserts per second. This runs on an L-size VM, i.e. 4 cores + 800 Mbps.

I insert 100,000 rows with a unique PK and a unique RK, which should give the best possible distribution.

Even more deterministic is the following behavior.

When I run 1 VM, I get about 1,200 inserts per second. When I run 3 VMs, I get about 730 inserts per second on each.

It makes for somewhat ironic reading when they state their scalability targets in a blog post: https://azure.microsoft.com/en-gb/blog/windows-azures-flat-network-storage-and-2012-scalability-targets/

Single Table Partition – a table partition consists of all of the entities in a table with the same partition key value, and usually tables have many partitions. The throughput target for a single table partition is:

Up to 2,000 entities per second

Note that this is for a single partition, not a single table. Therefore, a table with good partitioning can process up to 20,000 entities per second, which is the overall account target described above.
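
The arithmetic behind those targets (my numbers, simply restating the quote): at 2,000 entities per second per partition, hitting the 20,000 per second account target requires the load to be spread across at least 10 partitions that actually end up on different partition servers:

```python
PER_PARTITION_TARGET = 2_000   # entities/sec per table partition (quoted target)
ACCOUNT_TARGET = 20_000        # entities/sec per storage account (quoted target)

# Minimum number of well-distributed partitions needed to hit the account target.
min_partitions = ACCOUNT_TARGET // PER_PARTITION_TARGET
print(min_partitions)  # 10
```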

What should I do to take advantage of the 20,000 per second, and how can I get more than 1,200 inserts per second per VM?

-

Update:

I also tried using 3 storage accounts, one per node, and I still get the same performance / throttling behavior, for which I cannot find a logical reason.

-

Update 2:

I have optimized the code further, and now I get about 1,550 inserts per second.

-

Update 3:

I also tried West US. Performance is worse there, about 33% lower.

-

Update 4:

I tried running the code on an XL instance, which has 8 cores instead of 4 and double the memory and bandwidth, and got only a 2% increase in performance, so the bottleneck is not on my side.

4 answers

A few comments:

  • You mentioned that you use unique PK / RK values to get maximum distribution, but keep in mind that PK balancing is not immediate. When you first create a table, the entire table is served by 1 partition server. So even if your inserts use many different PKs, they will all still go to one partition server, and that bottlenecks your ability to scale beyond a single partition's throughput. The partition master will only start splitting your partitions across multiple partition servers after it has identified hot partition servers. In your roughly 2-minute test you will not see the benefit of multiple partition servers or PKs. The throughput in the article is targeted at a well-distributed PK scheme with frequently accessed data, which causes the data to be distributed across multiple partition servers.

  • The size of your virtual machine is not the problem; you are not bottlenecked on CPU, memory, or bandwidth. You can achieve full storage performance from a small VM.

  • Check out http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx . I just did a quick test using this tool from a WebRole VM in the same data center as my storage account, and from a single instance of the tool on a single VM I got ~2,800 items per second for upload and ~7,300 items per second for download. It uses 1024-byte entities, 10 threads, and a batch size of 100. I don't know how efficient this tool is, or whether it disables Nagle's algorithm, because I was unable to get great results (~1,000/s) with a batch size of 1, but at least with a batch size of 100 it shows that you can reach a high items-per-second rate. This was done in West US.

  • Are you using storage client library 1.7 (Microsoft.WindowsAzure.StorageClient.dll) or 2.0 (Microsoft.WindowsAzure.Storage.dll)? The 2.0 library has some performance improvements and should give better results.
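
One way to act on the first bullet (my sketch, not code from the answer): instead of a unique PK per entity, hash rows into a fixed set of recurring partition keys, so partitions see repeated traffic and the partition master has something to split across servers. The bucket count and key naming here are assumptions:

```python
import hashlib

NUM_BUCKETS = 32  # assumed bucket count; tune for your load

def partition_key(row_id: str) -> str:
    # Stable hash of the row id -> one of a fixed set of recurring PKs.
    digest = hashlib.md5(row_id.encode()).hexdigest()
    return f"bucket-{int(digest, 16) % NUM_BUCKETS:02d}"
```

The trade-off: fewer buckets means hotter partitions, while more buckets spreads load but leaves fewer entities per PK available for batching.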

-

Are the compute instances and the storage account in the same affinity group? Affinity groups provide optimal network proximity between services, resulting in lower latency at the network level.

You can find the affinity group configuration on the network tab.

-

I suspect this may be related to TCP Nagle. See this MSDN article and this blog post.

In essence, Nagle's algorithm is a protocol-level optimization that batches up small requests. Since you are sending lots of small requests, it can negatively affect your performance.

You can disable TCP Nagle by running this code at application startup:

// Run before any connections are opened so every ServicePoint picks it up.
ServicePointManager.UseNagleAlgorithm = false;
-

I would be inclined to believe that the maximum throughput is for an optimized load. For example, I bet you can achieve higher performance using batch requests than with the individual requests you are doing now. And of course, if you use GUIDs for your PK, you can't batch at all in your current test.

So what if you changed your test to batch-insert entities in groups of 100 (the maximum per batch), still using GUID values, but with each group of 100 entities sharing the same PK?
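
A sketch of that suggestion (my code, not the answerer's): chunk the entities into groups of at most 100 that share a partition key, which is what an entity group transaction requires:

```python
from itertools import islice

MAX_BATCH = 100  # Table storage batch limit; all entities in a batch share one PK

def make_batches(entity_ids, group_size=MAX_BATCH):
    # Assign one PK per group of up to 100 entities, then emit (pk, rows)
    # pairs ready to submit as a single entity group transaction.
    it = iter(entity_ids)
    batch_no = 0
    while True:
        chunk = list(islice(it, group_size))
        if not chunk:
            break
        pk = f"pk-{batch_no}"  # in the real test this would be a GUID
        yield pk, [{"PartitionKey": pk, "RowKey": str(r)} for r in chunk]
        batch_no += 1
```

Each (pk, rows) pair could then be submitted as one batch request instead of up to 100 individual inserts.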
