Read file line by line from S3 with boto?

I have a CSV file in S3, and I'm trying to read the header row to get the size (these files are created by our users, so they can be of almost any size). Is there any way to do this with boto? I thought maybe I could wrap it in a Python BufferedReader, but I can't figure out how to open a stream from the S3 key. Any suggestions would be great. Thanks!

+8
python amazon-s3 boto
4 answers

Boto has a read() function that can do this. Here is code that works for me:

 >>> import boto
 >>> from boto.s3.key import Key
 >>> conn = boto.s3.connect_to_region('ap-southeast-2')
 >>> bucket = conn.get_bucket('bucket-name')
 >>> k = Key(bucket)
 >>> k.key = 'filename.txt'
 >>> k.open()
 >>> k.read(10)
 'This text '

(Note that boto.connect_s3() takes an access key as its first argument, not a region name, so the connection is made with boto.s3.connect_to_region() here.) The read(n) call returns the next n bytes from the object.

Of course, this will not automatically return just the "header row", but you can call it with a large enough byte count to ensure the header row is included.
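As a sketch of that idea: read a fixed-size chunk and keep everything before the first newline. The helper name and chunk contents below are made up for illustration, and it assumes the header row fits entirely in the first chunk:

```python
def header_from_chunk(chunk):
    # Everything before the first newline is the CSV header row.
    # Assumes the header fits entirely inside the chunk that was read.
    return chunk.split(b"\n", 1)[0].decode("utf-8")

# With boto you would obtain the chunk via k.open() followed by
# k.read(8192); here a literal byte string stands in for it.
chunk = b"id,name,size\n1,foo,10\n2,bar,20\n"
print(header_from_chunk(chunk))  # -> id,name,size
```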

+7

You may find https://pypi.python.org/pypi/smart_open useful for this task.

From the documentation:

 import smart_open

 for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
     print(line)
+14

Here's a solution that actually streams the data line by line:

 from io import TextIOWrapper
 from gzip import GzipFile
 ...

 # get a StreamingBody from botocore.response
 response = s3.get_object(Bucket=bucket, Key=key)
 # if the object is gzipped
 gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
 data = TextIOWrapper(gzipped)
 for line in data:
     # process line
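The same GzipFile/TextIOWrapper pattern can be exercised locally by substituting an in-memory BytesIO for the botocore StreamingBody (GzipFile only needs a file-like object, which both are); the sample data below is made up for the demo:

```python
import csv
import gzip
import io

# An in-memory gzipped CSV standing in for response['Body'].
raw = io.BytesIO(gzip.compress(b"id,name\n1,foo\n2,bar\n"))

gunzipped = gzip.GzipFile(fileobj=raw)
reader = csv.reader(io.TextIOWrapper(gunzipped, encoding="utf-8"))
header = next(reader)
print(header)  # -> ['id', 'name']
```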
+1

With boto3, you can access the raw stream and read line by line. Just note that the raw stream is a private attribute, for whatever reason.

 import boto3

 s3 = boto3.resource('s3', aws_access_key_id='xxx', aws_secret_access_key='xxx')
 obj = s3.Object('bucket name', 'file key')
 # call get() once; every get() call issues a fresh request, which
 # would restart the stream at line 1 each time
 body = obj.get()['Body']._raw_stream
 body.readline()  # line 1
 body.readline()  # line 2
 body.readline()  # line 3...
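A public alternative, assuming botocore's StreamingBody.iter_lines() is available (it is in recent botocore releases), is to iterate the body instead of touching _raw_stream. The helper below is illustrative and works on any iterable of byte lines, so it can be demonstrated without S3:

```python
def first_n_lines(line_iter, n):
    # Take up to n byte lines from any iterator and decode them.
    return [line.decode("utf-8") for line, _ in zip(line_iter, range(n))]

# With boto3 (bucket/key names are placeholders):
# body = boto3.resource('s3').Object('bucket name', 'file key').get()['Body']
# print(first_n_lines(body.iter_lines(), 3))

# Demo with a plain iterator standing in for the S3 stream:
print(first_n_lines(iter([b"line 1", b"line 2", b"line 3", b"line 4"]), 3))
# -> ['line 1', 'line 2', 'line 3']
```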
0
