How to evaluate the performance of Windows Azure Table storage queries?

I would like to evaluate how queries scale in Windows Azure Table storage. To this end, I put together a simple test environment in which I can increase the amount of data in my table and measure the query execution time. Based on those timings, I would like to define a cost function that can be used to estimate the performance of future queries.

I evaluated the following query types:

  • Query by PartitionKey and RowKey
  • Query by PartitionKey and an attribute
  • Query by PartitionKey and two RowKeys
  • Query by PartitionKey and two attributes

For the last two query types, I tested the following two filter patterns (a sketch of how they can be expressed as LINQ queries follows the list):

  • PartitionKey == "..." && (RowKey == "..." || RowKey == "...")
  • (PartitionKey == "..." && RowKey == "...") || (PartitionKey == "..." && RowKey == "...")
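For reference, here is a minimal, hedged sketch of how these two patterns can be written against the classic Microsoft.WindowsAzure.StorageClient (1.x) library; the entity type MyEntity, the table name "MyTable", and the key values are illustrative placeholders, not taken from the original post.

    // A hedged sketch, assuming the Microsoft.WindowsAzure.StorageClient 1.x library.
    // MyEntity, "MyTable", and the key values are illustrative placeholders.
    using System.Linq;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public class MyEntity : TableServiceEntity
    {
        public string Attribute1 { get; set; }
        public string Attribute2 { get; set; }
    }

    public static class FilterPatterns
    {
        public static void Run(CloudStorageAccount account)
        {
            TableServiceContext context =
                account.CreateCloudTableClient().GetDataServiceContext();

            // Pattern 1: a single PartitionKey term with the RowKeys OR'ed together.
            CloudTableQuery<MyEntity> pattern1 =
                (from e in context.CreateQuery<MyEntity>("MyTable")
                 where e.PartitionKey == "pk"
                       && (e.RowKey == "rk1" || e.RowKey == "rk2")
                 select e).AsTableServiceQuery();

            // Pattern 2: complete (PartitionKey && RowKey) pairs OR'ed together.
            CloudTableQuery<MyEntity> pattern2 =
                (from e in context.CreateQuery<MyEntity>("MyTable")
                 where (e.PartitionKey == "pk" && e.RowKey == "rk1")
                       || (e.PartitionKey == "pk" && e.RowKey == "rk2")
                 select e).AsTableServiceQuery();

            var results1 = pattern1.Execute().ToList();
            var results2 = pattern2.Execute().ToList();
        }
    }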

To minimize network latency, I ran the tests from within Azure. A sketch of a possible timing harness follows the observations below. From the measurements, I see that

  • Query 1 is extremely fast (not surprisingly, since the table is indexed on these fields): it takes about 10-15 ms with roughly 150,000 records in the table.
  • Query 2 requires a partition scan, so the execution time grows linearly with the amount of stored data.
  • Query 3.1 performs almost exactly like query 2. It therefore also appears to run as a full partition scan, which seems a little strange to me.
  • Query 4.1 is slightly more than twice as slow as query 3.1, so it appears to be evaluated with two partition scans.
  • Finally, queries 3.2 and 4.2 execute almost exactly four times slower than query 2.
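The numbers above suggest a simple timing harness of the following shape. This is a hedged sketch of one way such measurements could be taken (the helper name MeasureQuery and the iteration count are hypothetical, not the author's code):

    // A hedged sketch of a timing harness: warm up once, then average the
    // elapsed time over several runs of the same CloudTableQuery.
    using System.Diagnostics;
    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;

    public static class QueryBenchmark
    {
        public static double MeasureQuery(CloudTableQuery<MyEntity> query,
                                          int iterations = 10)
        {
            // Warm-up run so connection setup is not included in the measurement.
            query.Execute().ToList();

            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                // Materialize the results so the full round trip is measured.
                var results = query.Execute().ToList();
            }
            stopwatch.Stop();

            return stopwatch.Elapsed.TotalMilliseconds / iterations;
        }
    }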

Can anyone explain the internals of the query/filter interpreter? Even if we accept that query 3.1 needs a partition scan, query 4.1 could be evaluated using the same logic (and in the same amount of time). Queries 3.2 and 4.2 remain a mystery to me. Any pointers?

Obviously, the whole point is that I would like to retrieve several individual entities in a single query in order to minimize transaction costs without losing performance. But it seems that issuing a separate query for each entity (with the Task Parallel Library) is the only really fast solution. Is that an acceptable way to do this?

2 answers

With queries of type 3.2 and 4.2, a full partition scan is performed serially, one at a time, together with the attribute filters. The query is not executed in parallel even if those partitions reside on two separate machines, which is why you see such long execution times. There is no query optimizer for queries in Windows Azure Table storage; it is the responsibility of your code to structure the queries so that they can run in parallel.

You are right: if you want to improve performance, you can run the queries in parallel using the Task Parallel Library.
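A minimal, hedged sketch of that suggestion, again assuming the Microsoft.WindowsAzure.StorageClient 1.x library and the illustrative MyEntity type from above: one fast point query (PartitionKey + RowKey) per entity, run concurrently with the Task Parallel Library. Note that ServicePointManager.DefaultConnectionLimit usually needs to be raised so the requests actually go out in parallel.

    // A hedged sketch: per-entity point queries executed concurrently via TPL.
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    public static class ParallelPointQueries
    {
        public static IList<MyEntity> Fetch(CloudStorageAccount account,
                                            string partitionKey,
                                            IEnumerable<string> rowKeys)
        {
            // Without this, outgoing HTTP connections are throttled to 2 per host
            // and the "parallel" queries would largely serialize.
            ServicePointManager.DefaultConnectionLimit = 48;

            CloudTableClient tableClient = account.CreateCloudTableClient();

            // One task per entity; each task gets its own context because
            // DataServiceContext instances are not thread-safe.
            Task<MyEntity>[] tasks = rowKeys
                .Select(rowKey => Task.Factory.StartNew(() =>
                {
                    TableServiceContext context = tableClient.GetDataServiceContext();
                    return (from e in context.CreateQuery<MyEntity>("MyTable")
                            where e.PartitionKey == partitionKey && e.RowKey == rowKey
                            select e).AsTableServiceQuery().Execute().FirstOrDefault();
                }))
                .ToArray();

            Task.WaitAll(tasks);
            return tasks.Select(t => t.Result).ToList();
        }
    }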


Since the internal implementation details of Table storage are not public, if you want to estimate the performance of future queries I would suggest you check out http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for some best practices.

Best wishes,

Ming Xu.
