Best practice for locking an S3 object?

I have an S3 bucket containing quite a few S3 objects that multiple EC2 instances can pull from (when scaling horizontally). Each EC2 instance pulls one object at a time, processes it, and moves it to another bucket.

Currently, to make sure the same object is not processed by multiple EC2 instances, my Java application renames it by adding a “locked” extension to its S3 object key. The problem is that a “rename” is actually a “move”. As a result, large files in the S3 bucket can take up to several minutes to finish “renaming”, which makes the locking process ineffective.

Does anyone have a best practice to accomplish what I'm trying to do?

I have considered using SQS, but this "solution" has its own set of problems (order is not guaranteed, messages can be delivered more than once, and more than one EC2 instance can receive the same message).

I am wondering whether setting a “locked” header would be a faster way to “lock” an object.

1 answer

order is not guaranteed, messages can be delivered more than once, and more than one EC2 instance can receive the same message

The chance of receiving the same message more than once is low. It is "possible," but not very likely. If having to process a file more than once on rare occasions is essentially just an annoyance, then SQS seems like a reasonable option.

Otherwise, you will need an external mechanism.
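To make "external mechanism" a little more concrete, here is a minimal sketch of the one property such a mechanism needs: an atomic "claim this key only if nobody has claimed it yet". The class `ExternalLockTable`, its method names, and the sample object key and worker IDs are all hypothetical. The `ConcurrentHashMap` is only an in-process stand-in; across several EC2 instances the same operation would have to come from a shared, strongly consistent store, and which store that is remains an assumption here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of a lock table kept outside S3; the map stands in for a strongly consistent store. */
final class ExternalLockTable {
    // Hypothetical lock table: S3 object key -> ID of the worker that claimed it.
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    /** Atomically claims the key; returns true only for the first caller. */
    boolean tryLock(String objectKey, String workerId) {
        return owners.putIfAbsent(objectKey, workerId) == null;
    }

    /** Releases the key, but only if this worker is still the recorded owner. */
    boolean unlock(String objectKey, String workerId) {
        return owners.remove(objectKey, workerId);
    }

    public static void main(String[] args) {
        ExternalLockTable locks = new ExternalLockTable();
        System.out.println(locks.tryLock("incoming/file-1", "i-11111111")); // true
        System.out.println(locks.tryLock("incoming/file-1", "i-22222222")); // false
        System.out.println(locks.unlock("incoming/file-1", "i-22222222"));  // false
        System.out.println(locks.unlock("incoming/file-1", "i-11111111"));  // true
    }
}
```

The point of the design is that the check and the write happen as a single step, so two workers can never both see "unclaimed" and then both claim the key.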

Setting a “locked” header on the object has its own problem: when you overwrite an object with a copy of itself (which is what happens when you change metadata, since a new copy of the object is created with the same key), you are subject to the slings and arrows of eventual consistency.

Q: What data consistency model does Amazon S3 use?

Amazon S3 buckets in all regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.

https://aws.amazon.com/s3/faqs/

Updating metadata is an "overwrite PUT". Your new header may not be immediately visible, and if two or more workers each set their own unique header (for example, x-amz-meta-locked: i-12345678), a scenario like the following can play out (W1, W2 = Worker #1 and #2):

    W1: HEAD object (no lock header seen)
    W2: HEAD object (no lock header seen)
    W1: set header
    W2: set header
    W1: HEAD object (sees its own lock header)
    W2: HEAD object (sees its own lock header)

The same or a similar failure can occur with several different timing permutations.
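The W1/W2 sequence can be replayed with a toy model, in case it helps to see it run. This is only a sketch under stated assumptions: the two `Replica` objects, the class `EventualLockRace`, the object key, and the instance IDs are hypothetical, and "each worker happens to be served by a different replica that has not yet synced" is a deliberate simplification of eventual consistency, not a description of how S3 works internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an eventually consistent store: two replicas that have not synced yet. */
final class EventualLockRace {

    /** One replica's view of object metadata: object key -> lock owner. */
    static final class Replica {
        private final Map<String, String> lockHeader = new HashMap<>();

        String head(String key) { return lockHeader.get(key); }                   // like HEAD object
        void setHeader(String key, String owner) { lockHeader.put(key, owner); }  // like an overwrite PUT
    }

    /** Replays the W1/W2 interleaving and returns the workers that believe they hold the lock. */
    static List<String> replay() {
        Replica a = new Replica(); // replica that happens to serve W1
        Replica b = new Replica(); // replica that happens to serve W2
        String key = "incoming/file-1";
        List<String> winners = new ArrayList<>();

        boolean w1Free = a.head(key) == null;        // W1: HEAD object (no lock header seen)
        boolean w2Free = b.head(key) == null;        // W2: HEAD object (no lock header seen)
        if (w1Free) a.setHeader(key, "i-11111111");  // W1: set header
        if (w2Free) b.setHeader(key, "i-22222222");  // W2: set header

        // Replication has not run yet, so each worker reads back only its own write.
        if ("i-11111111".equals(a.head(key))) winners.add("W1"); // W1: sees its own lock header
        if ("i-22222222".equals(b.head(key))) winners.add("W2"); // W2: sees its own lock header
        return winners;
    }

    public static void main(String[] args) {
        // Prints: Workers that believe they hold the lock: [W1, W2]
        System.out.println("Workers that believe they hold the lock: " + replay());
    }
}
```

Both workers pass their own verification step, so re-reading the header after writing it does not prove that the lock is exclusive.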

Objects cannot be effectively locked in an eventually consistent environment like this.
