Is there a Python module for transparently handling a file-like object as a buffer?

Question

I am working on a pure Python parser where the input can range in size from kilobytes to gigabytes. Is there a module that wraps a file-like object and abstracts the explicit calls to .open()/.seek()/.read()/.close() into a simple buffer object? You might think of this as the opposite of StringIO. I imagine it would look something like this:

    with FileLikeObjectBackedBuffer(urllib.urlopen("http://www.google.com")) as buf:
        header = buf[0:0x10]
        footer = buf[-0x10:]

Note that yesterday I asked a similar question and accepted the mmap answer. Here I am specifically looking for a module that wraps a file-like object (for the sake of argument, say, whatever urllib returns).

Update: I have come back to this question many times since I first asked it, and it turns out that urllib may not have been the best example. It is a bit of a special case, since it only offers a streaming interface. StringIO and bz2 expose a more traditional seek/read/close interface, and I use those more often. So I wrote a module that wraps file objects as buffers. You can check it out here.
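For readers who want a concrete picture of the seek/read/close case, here is a minimal sketch of such a wrapper. It is not the module linked above; `SeekableBuffer` is a hypothetical name, the code is written for Python 3, and it assumes the wrapped object is a binary file object supporting `seek`, `tell`, `read` and `close`.

```python
import io
import os


class SeekableBuffer(object):
    """Hypothetical wrapper: index and slice a seekable binary file object."""

    def __init__(self, fileobj):
        self._f = fileobj
        # Learn the total size by seeking to the end once.
        fileobj.seek(0, os.SEEK_END)
        self._size = fileobj.tell()

    def __len__(self):
        return self._size

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves negative and missing bounds.
            start, stop, step = key.indices(self._size)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            self._f.seek(start)
            return self._f.read(max(0, stop - start))
        # Single index: normalise negatives, then read one byte.
        if key < 0:
            key += self._size
        if not 0 <= key < self._size:
            raise IndexError("index out of range")
        self._f.seek(key)
        return self._f.read(1)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._f.close()


data = bytes(range(64))  # stand-in for real file contents
with SeekableBuffer(io.BytesIO(data)) as buf:
    header = buf[0:0x10]
    footer = buf[-0x10:]
```

Only the requested bytes are read on each access, so memory use stays small even for very large inputs, at the cost of one seek per slice.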

+4

python

Willi Ballenthin Dec 24 '12 at 18:00

1 answer

Jon Clements · Accepted Answer · 2012-12-24T18:12:01+0000

Although urllib.urlopen returns a file-like object, I don't think it is possible to do what you want without writing your own wrapper: for example, it does not support seek, but it does support next, read, etc. And since you are dealing with a forward-only stream, you will have to handle jumps forward by reading until you reach the required point, and cache what you have read to allow any kind of backtracking.
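To illustrate the read-forward-and-cache idea, here is a minimal sketch (Python 3). `ForwardOnlyBuffer` and its `chunk_size` parameter are hypothetical names, and an `io.BytesIO` stands in for the network response; the wrapper only ever calls `read` on the underlying stream. It keeps everything it has read in memory, so it is not suitable for gigabyte inputs as written.

```python
import io


class ForwardOnlyBuffer(object):
    """Hypothetical wrapper: slice a read-only, forward-only stream.

    Everything read so far is cached in memory so that earlier offsets
    can be revisited without seeking the underlying stream.
    """

    def __init__(self, stream, chunk_size=8192):
        self._stream = stream
        self._chunk_size = chunk_size
        self._cache = bytearray()
        self._eof = False

    def _fill_to(self, offset):
        """Read forward until `offset` bytes are cached (None = read to EOF)."""
        while not self._eof and (offset is None or len(self._cache) < offset):
            chunk = self._stream.read(self._chunk_size)
            if not chunk:
                self._eof = True
            else:
                self._cache.extend(chunk)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop = key.start, key.stop
        # Negative or open-ended bounds depend on the total length,
        # which is only known after consuming the whole stream.
        needs_all = stop is None or stop < 0 or (start is not None and start < 0)
        self._fill_to(None if needs_all else stop)
        return bytes(self._cache[key])


data = bytes(range(200))  # stand-in for a network response body
buf = ForwardOnlyBuffer(io.BytesIO(data), chunk_size=64)
header = buf[0:0x10]   # reads only the first chunk
footer = buf[-0x10:]   # forces the rest of the stream to be read
```

A slice relative to the end of the stream still costs a full download, which is the limitation described above; the cache only saves you from fetching anything twice.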

IMHO, you cannot efficiently skip part of a network I/O stream (if you want the last byte, you still have to fetch all the preceding bytes to get there; how you manage that storage is up to you).

I would be tempted to urlretrieve (or similar) the file and mmap it, as per the answer to your previous question.
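A rough sketch of that approach in Python 3 follows. Instead of urlretrieve it copies any file-like object into a temporary file, which is a substitution on my part so the same helper also works for an already opened response; `mmapped_copy` is a hypothetical name and `io.BytesIO` stands in for the download.

```python
import contextlib
import io
import mmap
import shutil
import tempfile


@contextlib.contextmanager
def mmapped_copy(stream):
    """Spool `stream` to a temporary file and yield a read-only mmap of it."""
    with tempfile.TemporaryFile() as tmp:
        shutil.copyfileobj(stream, tmp)
        tmp.flush()
        # Length 0 maps the whole file; mmap raises on an empty file.
        mapped = mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            yield mapped
        finally:
            mapped.close()


data = bytes(range(256)) * 4  # stand-in for the downloaded content
with mmapped_copy(io.BytesIO(data)) as buf:
    header = buf[0:0x10]
    footer = buf[-0x10:]
```

The whole stream is still transferred once, but it lands on disk rather than in memory, and the mmap object then gives you the buffer-style slicing from the question.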

If your server accepts range requests (and the size of the response is known, so that blocks like the ones in your example can be derived from it), then a possible workaround is to use byte serving: http://en.wikipedia.org/wiki/Byte_serving (but I can't say that I have ever tried this).
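For what it is worth, a range request is just an ordinary GET with a `Range` header. The sketch below (Python 3, using `urllib.request` rather than the Python 2 `urllib` from the question) only builds the requests and does not send them; `range_request` is a hypothetical helper and the URL is a placeholder.

```python
import urllib.request


def range_request(url, first=0, last=None, suffix=None):
    """Build (but do not send) a GET request for part of a resource.

    Use first/last for an inclusive byte range, or suffix for the
    final `suffix` bytes of the resource.
    """
    if suffix is not None:
        spec = "bytes=-%d" % suffix
    else:
        spec = "bytes=%d-%s" % (first, "" if last is None else last)
    return urllib.request.Request(url, headers={"Range": spec})


# Placeholder URL; substitute a server you know supports ranges.
url = "http://www.example.com/file.bin"
header_req = range_request(url, first=0, last=0x0F)  # first 16 bytes
footer_req = range_request(url, suffix=0x10)         # last 16 bytes
```

Passing either request to `urllib.request.urlopen` would perform the fetch. A server that honours ranges replies with status 206 (Partial Content); one that ignores the header replies with 200 and the full body, so check the status before trusting the length of what you get back.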

Given your example, if you only want the first 16 and last 16 bytes and don't want to do anything "too fancy":

    from string import ascii_lowercase
    from random import choice
    from StringIO import StringIO

    buf = ''.join(choice(ascii_lowercase) for _ in range(50))
    print buf

    sio_buf = StringIO(buf)  # make it a bit more like a stream object
    first16 = sio_buf.read(16)
    print first16

    from collections import deque
    # read(1) may look bad, but it is buffered anyway, so...
    last16 = deque(iter(lambda: sio_buf.read(1), ''), 16)
    print ''.join(last16)

Output:

    gpsgvqsbixtwyakpgefrhntldsjqlmfvyzwjoykhsapcmvjmar
    gpsgvqsbixtwyakp
    wjoykhsapcmvjmar
