Concurrent writes to S3

I think I have a problem with concurrent writes to S3. Two (or more) processes write almost the same content to the same S3 location at the same time. I would like to determine the concurrency rules that govern how this situation plays out.

By design, all processes except one will be killed while writing to S3. (I say they write "almost" the same content because all but one of the processes die. If all processes were allowed to live, they would eventually write exactly the same content.)

My theory is that a killed process leaves an incomplete file on S3, and that the other file (which was presumably written in full) is not chosen as the one that lives on S3. I would like to prove or disprove this theory. (I am trying to find out whether my problems are caused by concurrency issues during the write to S3 or at some other point.)

From the FAQ at http://aws.amazon.com/s3/faqs/ :

Q: What data consistency model does Amazon S3 use?

Amazon S3 buckets in the US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney) and South America (Sao Paulo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency.

I am using the US Standard region.

  • What does this answer say about concurrent writes? I think I understand the difference between "read-after-write consistency" and "eventual consistency", but only in the context of what is visible when reading an object immediately after writing it.
  • Is it possible for a killed process to "win", so that an incomplete file ends up on S3? Or does S3 somehow guarantee that a file is only stored if the entire PUT operation completes?
  • How does S3 decide which file wins? This is the real question here.
1 answer

I do not think the consistency statements in that FAQ entry say anything about what happens during concurrent writes to the same key.

However, it is not possible to end up with an incomplete file in S3: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html says

Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket.

This implies that only a file that was uploaded completely will exist at the specified key, but I suppose such concurrent writes could tickle some error conditions that result in no file being successfully uploaded. I would run some tests to be sure; you might also try using object versioning while you are at it and see if that behaves differently.
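As a starting point for that kind of testing, here is a minimal sketch in Python. It assumes a boto3-style S3 client is passed in; the helper names (`md5_b64`, `put_with_checksum`, `stored_object_is_complete`, `version_sizes`) are hypothetical and only for illustration. It sends a Content-MD5 header with each PUT, reads the object back to compare it with the expected bytes, and lists the stored versions of a key so you can see what each writer left behind.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Base64-encoded MD5 digest, the format the Content-MD5 header expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_with_checksum(client, bucket: str, key: str, body: bytes) -> dict:
    """PUT the object with a Content-MD5 header.

    `client` is assumed to be a boto3 S3 client (or anything with the
    same put_object signature). With the header set, S3 can reject a
    body that does not match the digest instead of storing it.
    """
    return client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentMD5=md5_b64(body)
    )


def stored_object_is_complete(client, bucket: str, key: str, expected: bytes) -> bool:
    """Read the object back and compare it byte for byte with `expected`."""
    stored = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return stored == expected


def version_sizes(client, bucket: str, key: str) -> list:
    """Return (VersionId, Size, IsLatest) for every stored version of `key`.

    Only meaningful when versioning is enabled on the bucket.
    """
    resp = client.list_object_versions(Bucket=bucket, Prefix=key)
    return [
        (v["VersionId"], v["Size"], v["IsLatest"])
        for v in resp.get("Versions", [])
        if v["Key"] == key  # Prefix also matches longer keys, so filter
    ]
```

To try it against a real bucket, create the client with `boto3.client("s3")`, enable versioning with `put_bucket_versioning`, start several writers that call `put_with_checksum` on the same key, kill all but one mid-upload, and then inspect `version_sizes` and `stored_object_is_complete`. If the quoted documentation holds, no listed version should be shorter than the full content.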
