Non-blocking read / log from an HTTP stream

I have a client that connects to an HTTP stream and writes the text data it consumes to a file.

I am sending an HTTP GET request to a streaming server ... The server responds and publishes data continuously ... It will either publish text or send a ping (text) message regularly ... and will never close the connection.

I need to read the data and write it to a file in a non-blocking way.

I am doing something like this:

    import urllib2

    req = urllib2.urlopen(url)
    for dat in req:
        with open('out.txt', 'a') as f:
            f.write(dat)

My questions:
Will it ever block when the flow is continuous?
How much data is read in each fragment, and can that be specified or configured?
Is this the best way to read / write an HTTP stream?

+6
python logging urllib2
4 answers

You are using too high-level an interface to have good control over issues such as blocking and buffering block sizes. If you are not willing to go all the way to an asynchronous interface (in which case Twisted, already suggested, is hard to beat!), why not httplib, which is after all in the standard library? The HTTPResponse instance's .read(amount) method is more likely to block for no longer than needed to read amount bytes than the similar method on the object returned by urlopen (although, admittedly, there are no documented specs about the blocking behavior of either module, hmmm...).
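
For instance, a rough sketch of the httplib approach (the host, path, and 1024-byte chunk size are placeholders, not taken from the question):

    import httplib

    conn = httplib.HTTPConnection('stream.example.com')   # placeholder host
    conn.request('GET', '/stream')                         # placeholder path
    resp = conn.getresponse()

    with open('out.txt', 'a') as f:
        while True:
            chunk = resp.read(1024)   # blocks until up to 1024 bytes are available
            if not chunk:             # empty string: the server closed the connection
                break
            f.write(chunk)
            f.flush()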

+3
source
Hey, that's three questions in one! ;-)

It can sometimes block: even if your server generates data quickly enough, network bottlenecks can in theory stall your reads.

Reading the URL data with "for dat in req" means reading a line at a time, which is not very useful if you are reading binary data such as an image. You get better control if you use

 chunk = req.read(size) 

which can, of course, block.

Whether this is the best way depends on details not available in your question. For example, if you need to run with no blocking calls at all, you will want to consider an asynchronous framework such as Twisted. If you don't want blocking to hold you up and don't want to use Twisted (which is a whole new paradigm compared to the blocking way of doing things), then you can spawn a thread that reads from the URL and writes to the file while your main thread goes on its merry way:

    import threading

    def func(req):
        # code to read from the URL stream and write to the file goes here
        ...

    t = threading.Thread(target=func, args=(req,))   # pass the response through to the worker
    t.start()    # will execute func in a separate thread
    ...
    t.join()     # will wait for the spawned thread to die

Obviously, I have omitted error checking / exception handling, etc., but hopefully this is enough to give you the picture.
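
Filling in that skeleton, a minimal sketch might look like this (the stream_to_file name, the 1024-byte chunk size, and the daemon flag are my choices; url and out.txt are taken from the question):

    import threading
    import urllib2

    def stream_to_file(req):
        # runs in the worker thread; blocking here does not stall the main thread
        with open('out.txt', 'a') as f:
            while True:
                chunk = req.read(1024)
                if not chunk:          # empty string would mean the server closed the stream
                    break
                f.write(chunk)
                f.flush()

    req = urllib2.urlopen(url)                                 # url as in the question
    t = threading.Thread(target=stream_to_file, args=(req,))
    t.daemon = True    # don't keep the process alive just for a never-ending stream
    t.start()
    # ... main thread is free to do other work here ...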

+6

Another option is to use the socket module directly. Establish a connection, send the HTTP request, set the socket to non-blocking mode, and then read the data with socket.recv(), handling the "Resource temporarily unavailable" exception (which means there is nothing to read). A very crude example:

    import socket, time

    BUFSIZE = 1024

    s = socket.socket()
    s.connect(('localhost', 1234))
    s.send('GET /path HTTP/1.0\n\n')
    s.setblocking(False)

    running = True
    while running:
        try:
            print "Attempting to read from socket..."
            while True:
                data = s.recv(BUFSIZE)
                if len(data) == 0:       # remote end closed
                    print "Remote end closed"
                    running = False
                    break
                print "Received %d bytes: %r" % (len(data), data)
        except socket.error, e:
            if e[0] != 11:               # Resource temporarily unavailable
                print e
                raise

        # perform other program tasks
        print "Sleeping..."
        time.sleep(1)

However, urllib.urlopen() has some advantages if the web server redirects, if you need URL-based basic authentication, etc. You could also make use of the select module, which will tell you when there is data to read.
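
A minimal sketch of the select variant, assuming s is the connected socket and BUFSIZE the buffer size from the example above (the 1-second timeout is an arbitrary choice):

    import select

    readable, _, _ = select.select([s], [], [], 1.0)   # wait at most 1 second for data
    if readable:
        data = s.recv(BUFSIZE)   # will not block: select said the socket is readable
        # ... process data ...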

+3

Yes, once you catch up with the server it will block until the server issues more data.

Each dat will consist of one line, including the newline at the end.

Twisted is a good option.

I would also restructure your example: do you really want to open and close the file for every line that arrives?
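
For example, hoisting the open() out of the loop (url is assumed to be defined as in the question; the flush() call is optional):

    import urllib2

    req = urllib2.urlopen(url)           # url as in the question
    with open('out.txt', 'a') as f:      # open the file once, not once per line
        for dat in req:
            f.write(dat)
            f.flush()                    # optional: push each line to disk as it arrives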

+1
