So the scenario is as follows:
I have multiple instances of a web service that write blob data to Azure Storage. I need to be able to group drops into a container (or virtual directory) based on when they were received. From time to time (daily, in the worst case), the old drops will be processed and then removed.
I have two options:
Option 1
I make one container called "blobs" (for example) and save all the blobs in that container. Each blob uses a directory-style name, where the directory part is the time it was received (for example, "hr0min0/data.bin", "hr0min0/data2.bin", "hr0min30/data3.bin", "hr1min45/data.bin", ..., "hr23min0/dataN.bin", etc. - a new directory every X minutes). The thing that processes these blobs handles the hr0min0 blobs first, then hr0minX, and so on (and new drops are still being written while processing).
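To make the scheme concrete, here is a minimal sketch of how such a directory-style blob name could be derived from the arrival time. The bucket width (X minutes) and the function name are my own assumptions, not anything from an Azure SDK:

```python
from datetime import datetime

BUCKET_MINUTES = 15  # assumed value of X; pick whatever interval you process at


def blob_name(received: datetime, filename: str,
              bucket_minutes: int = BUCKET_MINUTES) -> str:
    """Build a virtual-directory blob name like 'hr1min45/data.bin'.

    The '/' in the name creates a virtual directory inside the single
    'blobs' container; blob storage itself is flat.
    """
    # Round the minute down to the start of its bucket.
    minute = (received.minute // bucket_minutes) * bucket_minutes
    return f"hr{received.hour}min{minute}/{filename}"


print(blob_name(datetime(2018, 11, 11, 1, 47), "data.bin"))  # hr1min45/data.bin
```

The processor can then list blobs by prefix (e.g. `hr0min0/`) to pick up one time bucket at a time.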
Option 2
I have many containers, each named after the arrival time (so first there is a container named blobs_hr0min0, then blobs_hr0minX, etc.), and all the drops in a container are those that arrived during that interval. The thing handling these blobs processes one container at a time.
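A sketch of the container-naming variant, again with an assumed bucket width. One caveat worth noting: Azure container names may only contain lowercase letters, digits, and hyphens, so the underscore form `blobs_hr0min0` from the question would be rejected; the sketch uses a hyphenated equivalent:

```python
from datetime import datetime

BUCKET_MINUTES = 30  # assumed value of X


def container_name(received: datetime,
                   bucket_minutes: int = BUCKET_MINUTES) -> str:
    """Build a per-bucket container name like 'blobs-hr0min0'.

    Uses '-' instead of '_' because Azure container names must be
    lowercase letters, digits, and hyphens (3-63 characters).
    """
    minute = (received.minute // bucket_minutes) * bucket_minutes
    return f"blobs-hr{received.hour}min{minute}"


print(container_name(datetime(2018, 11, 11, 0, 12)))  # blobs-hr0min0
```

The processor would enumerate containers, handle one whole container, then delete it, which makes the daily cleanup a single container-delete call instead of many per-blob deletes.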
So my question is: which option is better? Does option 2 give me better parallelization (since containers can live on different servers), or is option 1 better because having many containers might cause other, unknown problems?
azure azure-storage azure-storage-blobs
encee, 2018-11-11T00:00Z