Urllib2 HTTP Error 429

I have a list of subreddits, and I use urllib to open them. As I work through the list, urllib eventually fails with:

urllib2.HTTPError: HTTP Error 429: Unknown 

After doing some research, I found that reddit limits the number of requests to its servers by IP:

Make no more than one request every two seconds. There is some allowance for bursts of requests, but keep them reasonable. In general, keep it to no more than 30 requests per minute.

So I decided to use time.sleep() to limit my requests to one page every 10 seconds. It still fails in the same way.

The quote above is taken from the reddit API page. I am not using the reddit API. At this point I can think of two possibilities: either the limit applies only to the reddit API, or urllib has a limit of its own.

Does anyone know which of the two it is? Or how I can get around this problem?

python reddit urllib2
2 answers

From https://github.com/reddit/reddit/wiki/API :

Many default user agents (such as "Python/urllib" or "Java") are severely limited to encourage unique and descriptive user agent strings.

This applies to regular requests as well. You need to supply your own user agent header when making a request.

    import urllib2

    # TODO: change the user agent string
    hdr = {'User-Agent': 'super happy flair bot by /u/spladug'}
    req = urllib2.Request(url, headers=hdr)
    html = urllib2.urlopen(req).read()
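On Python 3, where urllib2 was merged into urllib.request, the same idea might look like the sketch below. The helper name build_request and the user agent string are placeholders for illustration, not part of the original answer.

```python
import urllib.request

# Hypothetical user agent string; replace it with one that describes your bot.
USER_AGENT = 'subreddit reader example by /u/your_username'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a descriptive User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})
```

Nothing is sent until the request is opened; passing the result to urllib.request.urlopen(req).read() would then fetch the page.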

However, this creates a new connection for every request. I suggest using another library that is capable of reusing connections, such as httplib or Requests. This puts less load on the server and speeds up the requests:

    import httplib
    import time

    lst = """
    science
    scifi
    """

    hdr = {'User-Agent': 'super happy flair bot by /u/spladug'}
    conn = httplib.HTTPConnection('www.reddit.com')
    for name in lst.split():
        conn.request('GET', '/r/' + name, headers=hdr)
        print conn.getresponse().read()
        time.sleep(2)
    conn.close()
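The fixed time.sleep(2) in the loop waits two seconds on top of however long each request took. A minimal sketch of an alternative, written for Python 3, sleeps only for whatever remains of the interval. Throttle is a hypothetical helper, not part of any library, and its two-second default simply mirrors the guideline quoted in the question.

```python
import time


class Throttle:
    """Keep consecutive wait() calls at least min_interval seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() immediately before each request in the loop would take the place of the trailing time.sleep(2).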

reddit performs rate limiting by request (not by connection, as suggested by Anonymous Coward) for both IP addresses and user agents. The problem you are running into is that everyone who accesses reddit through urllib2 with its default user agent is rate limited as a single user.

The solution is to set your own user agent; this question has an answer that shows how.

Alternatively, forgo writing your own code to crawl reddit and use PRAW instead. It supports almost all of the reddit API's features, and you do not have to worry about following the API rules, since it takes care of that for you.
