Amazon S3: merging small files

Is there a way to merge files smaller than 5 MB on Amazon S3? Multipart upload is not an option because the files are too small.

Downloading all of these files and concatenating them locally is not an efficient solution.

Can anyone suggest an API for doing this?

3 answers

Amazon S3 does not provide a concatenation feature. It is primarily an object storage service.

You will need a process that downloads the objects, combines them, and then uploads the result. The most efficient way to do this is to download the objects in parallel to make full use of the available bandwidth, although that is more complex to code.
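A minimal sketch of that download, merge, and upload process in Ruby, assuming the aws-sdk-s3 (v3) gem. The method names `list_keys` and `merge_objects`, the `threads:` option, and the key-order merge are illustrative assumptions. The code accepts any client object that responds to `list_objects_v2`, `get_object`, and `put_object` the way `Aws::S3::Client` does, and it assumes that client can be shared across threads.

```ruby
# Sketch only: `client` is assumed to behave like Aws::S3::Client (aws-sdk-s3 gem).
# Downloads every object under `prefix` in parallel, concatenates the bodies in
# key order, and uploads the result as a single object.

# List all keys under a prefix, following ListObjectsV2 pagination.
def list_keys(client, bucket, prefix)
  keys = []
  token = nil
  loop do
    params = { bucket: bucket, prefix: prefix }
    params[:continuation_token] = token if token
    resp = client.list_objects_v2(**params)
    keys.concat(resp.contents.map(&:key))
    break unless resp.is_truncated
    token = resp.next_continuation_token
  end
  keys.reject { |k| k.end_with?('/') } # skip "directory" placeholder objects
end

# Hypothetical helper: parallel download, in-memory concatenation, single upload.
def merge_objects(client, bucket, prefix, target_key, threads: 8)
  keys = list_keys(client, bucket, prefix).sort
  bodies = Array.new(keys.size) # one slot per key preserves the merge order
  queue = Queue.new
  keys.each_with_index { |key, i| queue << [key, i] }

  workers = Array.new([threads, keys.size].min) do
    Thread.new do
      loop do
        begin
          key, i = queue.pop(true) # non-blocking pop
        rescue ThreadError
          break # queue is empty, so this worker is done
        end
        bodies[i] = client.get_object(bucket: bucket, key: key).body.read
      end
    end
  end
  workers.each(&:join) # re-raises any exception from a worker thread

  client.put_object(bucket: bucket, key: target_key, body: bodies.join)
  keys
end
```

A call might look like `merge_objects(Aws::S3::Client.new, 'my-bucket', 'parts/', 'aggregate')`, where the bucket and key names are placeholders. The merged object is built in memory, which is reasonable for files under 5 MB. Keep `target_key` outside `prefix` so a rerun does not fold the previous output back in.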

I would recommend doing the processing in the cloud to avoid transferring the objects over the Internet. Running it on Amazon EC2 or AWS Lambda would be faster and less expensive.


Edit: I missed the 5 MB requirement. This method will not work for files that small.

From https://ruby.awsblog.com/post/Tx2JE2CXGQGQ6A4/Efficient-Amazon-S3-Object-Concatenation-Using-the-AWS-SDK-for-Ruby:

While it is possible to download and re-upload the data to S3 through an EC2 instance, a more efficient approach is to instruct S3 to make an internal copy using the new copy_part API operation, which was introduced into the SDK for Ruby in version 1.10.0.

Code:

    # Uses the legacy AWS SDK for Ruby v1 API (AWS:: namespace).
    require 'aws-sdk'

    s3 = AWS::S3.new
    mybucket = s3.buckets['my-multipart']

    # First, let's start the multipart upload.
    obj_aggregate = mybucket.objects['aggregate'].multipart_upload

    # Then copy into the multipart upload all of the objects in a certain S3 "directory".
    mybucket.objects.with_prefix('parts/').each do |source_object|
      # Skip the directory object
      unless source_object.key == 'parts/'
        # Note that this section is thread-safe and could greatly benefit from parallel execution.
        obj_aggregate.copy_part(source_object.bucket.name + '/' + source_object.key)
      end
    end

    obj_completed = obj_aggregate.complete

    # Generate a signed URL to enable a trusted browser to access the new object without authenticating.
    puts obj_completed.url_for(:read)
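For comparison, here is a sketch of the same server-side copy using the current aws-sdk-s3 (v3) client operations `create_multipart_upload`, `upload_part_copy`, `complete_multipart_upload`, and `abort_multipart_upload`. The method name `concat_with_upload_part_copy` is hypothetical. The sketch assumes the keys need no URL-encoding in `copy_source`. Like the v1 code, it is still bound by the 5 MB minimum part size, so it does not solve the small-file case in the question.

```ruby
# Sketch only: `client` is assumed to behave like Aws::S3::Client (aws-sdk-s3 gem).
# S3 copies each source object into one part of a new multipart upload, so no
# data passes through this machine. Every part except the last must be >= 5 MB.
def concat_with_upload_part_copy(client, bucket, source_keys, target_key)
  upload = client.create_multipart_upload(bucket: bucket, key: target_key)

  parts = source_keys.each_with_index.map do |key, i|
    resp = client.upload_part_copy(
      bucket: bucket,
      key: target_key,
      upload_id: upload.upload_id,
      part_number: i + 1,               # part numbers start at 1
      copy_source: "#{bucket}/#{key}"   # assumes keys need no URL-encoding
    )
    { etag: resp.copy_part_result.etag, part_number: i + 1 }
  end

  client.complete_multipart_upload(
    bucket: bucket,
    key: target_key,
    upload_id: upload.upload_id,
    multipart_upload: { parts: parts }
  )
rescue StandardError
  # Abort so the incomplete upload's parts are not left behind.
  if upload
    client.abort_multipart_upload(bucket: bucket, key: target_key,
                                  upload_id: upload.upload_id)
  end
  raise
end
```

Aborting on failure is a design choice here. Parts of an upload that is never completed or aborted remain stored in the bucket.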

Limitations (among others)

  • With the exception of the last part, the minimum part size is 5 MB.
  • A completed multipart upload is limited to a maximum size of 5 TB.
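The two limits above can be checked before starting a copy. Below is a small sketch with the hypothetical method name `multipart_copy_problems`. It assumes the MB and TB above mean MiB and TiB, as the current AWS documentation states them, and it checks only these two limits.

```ruby
# Sketch: check part sizes (in bytes) against the two limits listed above.
# Assumption: "5 MB" and "5 TB" are binary units (MiB/TiB).
FIVE_MIB = 5 * 1024**2
FIVE_TIB = 5 * 1024**4

# Returns human-readable problems; an empty array means both checks pass.
# Other S3 multipart limits are deliberately not checked here.
def multipart_copy_problems(part_sizes)
  problems = []
  # Every part except the last must meet the minimum size.
  part_sizes[0...-1].each_with_index do |size, i|
    problems << "part #{i + 1} is #{size} bytes (< 5 MiB)" if size < FIVE_MIB
  end
  total = part_sizes.sum
  problems << "total #{total} bytes exceeds 5 TiB" if total > FIVE_TIB
  problems
end
```

For files under 5 MB, every part except the last fails the first check, which is why the copy_part approach does not help in this case.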

An implementation, based on demo code, that merges small files into larger ones and deletes the small files at the end: https://gist.github.com/azimbabu/d9dc9e05ee008875325472d598924df8
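One way to combine the two ideas in this thread is to concatenate consecutive small objects into chunks of at least 5 MB, which can then serve as multipart parts, and to delete the originals afterwards. The helpers below are hypothetical and are not taken from the linked gist. `group_for_parts` is a greedy grouping over `[key, size]` pairs. `delete_in_batches` assumes a client that behaves like `Aws::S3::Client#delete_objects`, which accepts at most 1,000 keys per request.

```ruby
# Hypothetical helpers (not from the linked gist).
MIN_PART_SIZE = 5 * 1024 * 1024 # assumption: S3's 5 MB minimum read as 5 MiB

# objects: array of [key, size_in_bytes] pairs in the desired merge order.
# Greedily groups keys so every group except possibly the last totals at
# least min_size bytes; each group would be downloaded, concatenated, and
# uploaded as one multipart part.
def group_for_parts(objects, min_size = MIN_PART_SIZE)
  groups = []
  current = []
  current_size = 0
  objects.each do |key, size|
    current << key
    current_size += size
    next if current_size < min_size
    groups << current
    current = []
    current_size = 0
  end
  groups << current unless current.empty? # last part may be smaller
  groups
end

# Deletes the original small objects after the merged object is complete.
# DeleteObjects accepts at most 1,000 keys per request.
def delete_in_batches(client, bucket, keys)
  keys.each_slice(1000) do |batch|
    client.delete_objects(
      bucket: bucket,
      delete: { objects: batch.map { |k| { key: k } }, quiet: true }
    )
  end
end
```

Deleting only after the merged object has been completed avoids losing data if a merge step fails partway.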

