Is it better to have many small blob containers in Azure Storage (each holding a few blobs) or one really large container with lots of blobs?

So the scenario is as follows:

I have multiple instances of a web service that write blob data to Azure Storage. I need to be able to group blobs into a container (or virtual directory) depending on when they were received. Every so often (once a day in the worst case), the older blobs will be processed and then deleted.

I have two options:

Option 1

I make one container called "blobs" (for example) and save all the blobs into that container. Each blob uses a directory-style name, with the directory name being the time it was received (for example, "hr0min0/data.bin", "hr0min0/data2.bin", "hr0min30/data3.bin", "hr1min45/data.bin", ..., "hr23min0/dataN.bin", and so on, with a new directory every X minutes). The process that handles these blobs works through the hr0min0 blobs first, then hr0minX, and so on (and new blobs are still being written while processing runs).
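For illustration only, here is a minimal sketch of Option 1 written against the current azure-storage-blob Python SDK (v12), which postdates this question; the AZURE_STORAGE_CONNECTION_STRING environment variable, the 30-minute bucket size, and the helper names (time_prefix, store) are assumptions for the sketch, not anything from the original post:

    import os
    from datetime import datetime, timezone
    from azure.storage.blob import BlobServiceClient

    # Assumes the connection string is provided via an environment variable.
    service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
    container = service.get_container_client("blobs")   # the single shared container

    def time_prefix(received: datetime, bucket_minutes: int = 30) -> str:
        # Round the arrival time down to an X-minute bucket and turn it into a
        # directory-style prefix, e.g. 00:37 -> "hr0min30".
        minute = (received.minute // bucket_minutes) * bucket_minutes
        return f"hr{received.hour}min{minute}"

    def store(data: bytes, filename: str) -> None:
        # All blobs go into one container; the prefix acts as the virtual directory.
        now = datetime.now(timezone.utc)
        container.upload_blob(name=f"{time_prefix(now)}/{filename}", data=data)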

Option 2

I have many containers, each named according to arrival time (so first there would be a container named blobs_hr0min0, then blobs_hr0minX, and so on), and all the blobs in a container are the ones that arrived in that time window. The process handling these blobs would work through one container at a time.
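Again purely as a sketch under the same assumptions as above (plus the rule that container names may only contain lowercase letters, digits, and hyphens, so the underscore in blobs_hr0min0 becomes a hyphen here; window_container, store, process_window, and handle are made-up names), Option 2 might look like:

    import os
    from datetime import datetime, timezone
    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

    def window_container(received: datetime, bucket_minutes: int = 30) -> str:
        # One container per time window; "blobs-hr0min30" rather than "blobs_hr0min30"
        # because underscores are not allowed in container names.
        minute = (received.minute // bucket_minutes) * bucket_minutes
        return f"blobs-hr{received.hour}min{minute}"

    def store(data: bytes, filename: str) -> None:
        name = window_container(datetime.now(timezone.utc))
        try:
            container = service.create_container(name)      # first writer creates it
        except ResourceExistsError:
            container = service.get_container_client(name)  # later writers reuse it
        container.upload_blob(name=filename, data=data)

    def handle(payload: bytes) -> None:
        pass  # stand-in for whatever processing the service actually does

    def process_window(name: str) -> None:
        container = service.get_container_client(name)
        for blob in container.list_blobs():
            handle(container.download_blob(blob.name).readall())
        service.delete_container(name)      # one call discards the whole window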

So my question is: which option is better? Does option 2 give me better parallelization (since containers can be spread across different servers), or is option 1 better because having many containers could cause other, unknown problems?

+66
azure azure-storage azure-storage-blobs
Nov 16 '11
4 answers

I don't think it really matters (in terms of scalability/parallelization), because partitioning in Windows Azure blob storage is done at the blob level, not at the container level. Reasons for splitting things into different containers have more to do with access control (for example, SAS) or total storage size.

See here for more details: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx

(Scroll down to the "Partitions" section.)

Quote:

Blobs: Since the partition key is down to the blob name, we can load balance access to different blobs across as many servers as needed in order to scale out access to them. This allows containers to grow as large as you need them to (within the storage account space limit). The tradeoff is that we don't provide the ability to do atomic transactions across multiple blobs.

+51
Nov 16 '11 at 10:10

Everyone has given you excellent answers about accessing blobs directly. However, if you need to list the blobs in a container, you will likely see better performance with the many-container model. I just spoke with a company that stores a huge number of blobs in a single container. They frequently list the blobs in the container and then perform operations against a subset of those blobs. They are running into trouble as the time to retrieve the complete listing keeps growing.
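To make the listing difference concrete, a small sketch (same hypothetical v12 Python SDK, environment variable, and container/prefix names as in the snippets above): with one big container you narrow the listing with a name prefix, while with per-window containers the container itself scopes the listing.

    import os
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

    # Option 1: one big container, listing narrowed by the virtual-directory prefix.
    big = service.get_container_client("blobs")
    for blob in big.list_blobs(name_starts_with="hr0min30/"):
        print(blob.name)

    # Option 2: one container per window, so the listing is already scoped.
    window = service.get_container_client("blobs-hr0min30")
    for blob in window.list_blobs():
        print(blob.name)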

This may not apply to your scenario, but it is something to consider ...

+54
Nov 16 '11 at 23:40

Theoretically, there should be no difference between many containers and fewer containers with more blobs. Extra containers can be useful as additional security boundaries (for example, for public anonymous access or separate SAS signatures). Extra containers can also make housekeeping easier during cleanup (deleting a single container versus targeting each blob). I tend to use more containers for these reasons (not for performance).
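As a rough illustration of that cleanup difference (same hypothetical v12 Python SDK and names as above): removing one time window is a loop over blobs in the single-container layout, but a single call in the container-per-window layout.

    import os
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])

    # Single-container layout: delete every blob under the window's prefix.
    big = service.get_container_client("blobs")
    for blob in big.list_blobs(name_starts_with="hr0min30/"):
        big.delete_blob(blob.name)

    # Container-per-window layout: drop the whole container at once.
    service.delete_container("blobs-hr0min30")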

Performance-wise, there should theoretically be no impact. The blob itself (the full URL) is the partition key in Windows Azure (and has been for a long while). That is the smallest unit that gets load-balanced across partition servers. So you can (and often do) have two different blobs in the same container served by different servers.

Jeremy mentions seeing a performance difference between many versus fewer containers. I have not dug into those tests enough to explain why that might be, but I would suspect other factors (such as size, duration of the test, etc.) account for any discrepancy.

+19
Nov 16 '11 at 10:11

Here is another factor: price!

Currently, List and Create Container operations are the same price: $0.054 per 10,000 calls.

That is actually the same price as a blob write.

So, in an extreme case, you could end up paying a lot more if you create and delete many containers (a rough calculation follows below)

  • deletes are free

You can check the calculator here: https://azure.microsoft.com/en-us/pricing/calculator/
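As a rough back-of-the-envelope check (using the $0.054 figure quoted above and assuming one new container every 30 minutes, which is just the question's example cadence, not something stated in this answer):

    # Extra cost of the Create Container calls at the quoted rate.
    price_per_op = 0.054 / 10_000          # USD per operation, per the price above
    creates_per_day = 24 * 60 // 30        # one new container every 30 minutes = 48/day
    print(creates_per_day * price_per_op)  # roughly $0.00026 per day of extra cost
    # Deletes are free, so container churn only becomes costly at far higher rates.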

+2
Oct 13 '17 at 9:46


