Directory Level Lists in Amazon S3

Question

I store two million files in an Amazon S3 bucket. There is a given root directory (l1), a list of directories under l1, and each of those directories contains files. So my bucket looks something like this:

    l1/a1/file1-1.jpg
    l1/a1/file1-2.jpg
    l1/a1/... another 500 files
    l1/a2/file2-1.jpg
    l1/a2/file2-2.jpg
    l1/a2/... another 500 files
    ....
    l1/a5000/file5000-1.jpg

I would like to list the second-level entries as quickly as possible, so I want to get back a1, a2, ... a5000. I do not want to list all the keys, because that would take much longer.

I am open to using the AWS API directly, but so far I have been playing with the right_aws gem in Ruby: http://rdoc.info/projects/rightscale/right_aws

There are at least two APIs in this gem. I tried bucket.keys() in the S3 module and incrementally_list_bucket() in the S3Interface module. I can set the prefix and delimiter to list everything under l1/a1/, for example, but I cannot figure out how to list only the first level under l1. The hash returned by incrementally_list_bucket() has a common_prefixes entry, but it is not populated in my test sample.

Is this operation possible using the S3 API?

Thanks!

ruby amazon-s3 folders

Marius seritan Aug 6 '09 at 15:30

2 answers

dubek · Answer 1 · 2009-08-19T08:26:17+0000

right_aws lets you do this through its underlying S3Interface class, but you can add your own method to make it easier (and nicer) to use. Put this at the top of your code:

    module RightAws
      class S3
        class Bucket
          def common_prefixes(prefix, delimiter = '/')
            common_prefixes = []
            @s3.interface.incrementally_list_bucket(@name, { 'prefix' => prefix, 'delimiter' => delimiter }) do |thislist|
              common_prefixes += thislist[:common_prefixes]
            end
            common_prefixes
          end
        end
      end
    end

This adds the common_prefixes method to the RightAws::S3::Bucket class. Now, instead of calling mybucket.keys to retrieve the list of keys in your bucket, you can use mybucket.common_prefixes to get an array of common prefixes. In your case:

    mybucket.common_prefixes("l1/") # => ["l1/a1/", "l1/a2/", ... "l1/a5000/"]

I have to say that I tested it with only a few common prefixes; you should verify that it also works with more than 1000 common prefixes.
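To make the 1000-entry concern concrete, here is a minimal local sketch of how a prefix + delimiter listing rolls keys up into common prefixes and why a caller has to follow the truncation marker. It is a simulation only: it makes no network calls and does not use right_aws, and `list_page`, `all_common_prefixes`, `Page`, and `max_entries` are hypothetical names invented for this illustration. The marker handling is a simplified assumption about how the service pages through rolled-up entries.

```ruby
# Local simulation only: no S3 calls. All names here are hypothetical.
Page = Struct.new(:common_prefixes, :keys, :truncated, :next_marker)

# Returns one "page" of a listing over an in-memory array of keys.
def list_page(all_keys, prefix: '', delimiter: '/', marker: nil, max_entries: 1000)
  matching = all_keys.select { |key| key.start_with?(prefix) }

  # Roll each key up to its common prefix if the delimiter occurs after the
  # requested prefix; otherwise the key is listed as itself.
  rolled = matching.map do |key|
    cut = key.index(delimiter, prefix.length)
    cut ? [key[0, cut + delimiter.length], :prefix] : [key, :key]
  end
  entries = rolled.uniq.sort

  # Assumption: paging resumes strictly after the marker entry.
  entries = entries.select { |name, _kind| name > marker } if marker

  page = entries.first(max_entries)
  truncated = entries.length > page.length
  Page.new(
    page.select { |_name, kind| kind == :prefix }.map(&:first),
    page.select { |_name, kind| kind == :key }.map(&:first),
    truncated,
    truncated ? page.last.first : nil
  )
end

# Keeps requesting pages until the listing is no longer truncated.
def all_common_prefixes(all_keys, prefix:, delimiter: '/', max_entries: 1000)
  found = []
  marker = nil
  loop do
    page = list_page(all_keys, prefix: prefix, delimiter: delimiter,
                               marker: marker, max_entries: max_entries)
    found.concat(page.common_prefixes)
    break unless page.truncated
    marker = page.next_marker
  end
  found
end

# 2500 "directories" with 3 files each, so one page of 1000 is not enough.
keys = (1..2500).flat_map { |d| (1..3).map { |f| "l1/a#{d}/file#{d}-#{f}.jpg" } }
first_page = list_page(keys, prefix: 'l1/')
prefixes = all_common_prefixes(keys, prefix: 'l1/')
puts first_page.common_prefixes.length # 1000
puts prefixes.length                   # 2500
```

A caller that stops after the first page sees only 1000 of the 2500 directories here, which is the situation worth testing against a real bucket.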

Eric Walsh · Answer 2 · 2017-02-09T01:33:22+0000

This thread is pretty old, but I recently ran into this problem and wanted to add my 2 cents ...

It is a complete hassle (it seems) to cleanly list the folders under a given path in an S3 bucket. Most current gem wrappers around the S3 API (the official AWS-SDK, S3) do not correctly parse the returned object (in particular, the CommonPrefixes), so it is difficult to get back a list of folders (delimiter nightmares).

Here is a quick fix for those who use the S3 gem ... Sorry that it is not one size fits all, but it is the best I wanted to do.

https://github.com/qoobaa/s3/issues/61

Code snippet:

    module S3
      class Bucket
        # this method recurses if the response coming back
        # from S3 includes a truncation flag (IsTruncated == 'true')
        # then parses the combined response(s) XML body
        # for CommonPrefixes/Prefix AKA directories
        def directory_list(options = {}, responses = [])
          options = {:delimiter => "/"}.merge(options)
          response = bucket_request(:get, :params => options)

          if is_truncated?(response.body)
            directory_list(options.merge({:marker => next_marker(response.body)}), responses << response.body)
          else
            parse_xml_array(responses + [response.body], options)
          end
        end

        private

        def parse_xml_array(xml_array, options = {}, clean_path = true)
          names = []
          xml_array.each do |xml|
            rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
              if clean_path
                names << e.text.gsub((options[:prefix] || ''), '').gsub((options[:delimiter] || ''), '')
              else
                names << e.text
              end
            end
          end
          names
        end

        def next_marker(xml)
          marker = nil
          rexml_document(xml).elements.each("ListBucketResult/NextMarker") {|e| marker ||= e.text }
          if marker.nil?
            raise StandardError
          else
            marker
          end
        end

        def is_truncated?(xml)
          is_truncated = nil
          rexml_document(xml).elements.each("ListBucketResult/IsTruncated") {|e| is_truncated ||= e.text }
          is_truncated == 'true'
        end
      end
    end
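For readers who want to see the parsing step in isolation, the following is a small standalone sketch using REXML from Ruby's standard distribution. The XML body is a hand-written, hypothetical sample (a real response carries more fields and an xmlns attribute, left out here to keep the XPath simple), and `directory_names` and `truncated?` are illustrative names, not part of the S3 gem.

```ruby
require 'rexml/document'

# Hypothetical sample body, written by hand for this illustration.
SAMPLE_XML = <<~XML
  <ListBucketResult>
    <Name>example-bucket</Name>
    <Prefix>l1/</Prefix>
    <Delimiter>/</Delimiter>
    <IsTruncated>false</IsTruncated>
    <CommonPrefixes><Prefix>l1/a1/</Prefix></CommonPrefixes>
    <CommonPrefixes><Prefix>l1/a2/</Prefix></CommonPrefixes>
  </ListBucketResult>
XML

# Collects CommonPrefixes/Prefix values and strips the leading request
# prefix and the trailing delimiter, leaving bare directory names.
def directory_names(xml, prefix: '', delimiter: '/')
  names = []
  REXML::Document.new(xml).elements.each('ListBucketResult/CommonPrefixes/Prefix') do |e|
    names << e.text.delete_prefix(prefix).delete_suffix(delimiter)
  end
  names
end

# True only when the body explicitly says IsTruncated == 'true'.
def truncated?(xml)
  node = REXML::Document.new(xml).elements['ListBucketResult/IsTruncated']
  !node.nil? && node.text == 'true'
end

names = directory_names(SAMPLE_XML, prefix: 'l1/')
p names                  # ["a1", "a2"]
p truncated?(SAMPLE_XML) # false
```

One design difference from the snippet above: `delete_prefix` and `delete_suffix` only trim the ends of each value, whereas `gsub` removes every occurrence of the prefix and delimiter, which matters if the prefix string can reappear inside a directory name.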
