Does Wikipedia allow URL fetching via Google App Engine?

I am writing a Python web application and plan to use Wikipedia in it. When I tried some URL fetching code, I was able to fetch both Google and Facebook (via the Google App Engine services), but when I tried to fetch wikipedia.org, I got an exception. Can anyone confirm that Wikipedia does not accept these types of page requests? How does Wikipedia distinguish between my application and a regular user?

Code snippet (this is Python!):

    import os
    import urllib2
    from google.appengine.ext import webapp  # needed for webapp.RequestHandler
    from google.appengine.ext.webapp import template

    class MainHandler(webapp.RequestHandler):
        def get(self):
            url = "http://wikipedia.org"
            try:
                result = urllib2.urlopen(url)
            except urllib2.URLError, e:
                # Any fetch failure ends up here, including an HTTP error status
                result = 'ahh the sky is falling'
            template_values = {
                'test': result,
            }
            path = os.path.join(os.path.dirname(__file__), 'index.html')
            self.response.out.write(template.render(path, template_values))
2 answers

Wikipedia rejects the default urllib2 user agent, which results in an HTTP 403 response.
You should set your application's own user agent, like this:

    # Option 1
    import urllib2
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'MyUserAgent')]
    res = opener.open('http://whatsmyuseragent.com/')
    page = res.read()

    # Option 2
    import urllib2
    req = urllib2.Request('http://whatsmyuseragent.com/')
    req.add_header('User-agent', 'MyUserAgent')
    urllib2.urlopen(req)

    # Option 3
    req = urllib2.Request("http://whatsmyuseragent.com/", headers={"User-agent": "MyUserAgent"})
    urllib2.urlopen(req)
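
The handler in the question catches every URLError and hides the status code, so it may help to surface the code and confirm that the failure really is a 403. Below is a rough sketch of that idea, not a definitive implementation. The names `build_request`, `describe_failure` and `fetch`, and the `USER_AGENT` value, are made up for illustration. The import fallback is only there so the same snippet runs under both urllib2 and Python 3's urllib.

```python
try:
    # Python 2, as used in the question
    import urllib2 as urlrequest
    from urllib2 import HTTPError, URLError
except ImportError:
    # Python 3 equivalents
    import urllib.request as urlrequest
    from urllib.error import HTTPError, URLError

# Hypothetical value: pick something that identifies your own application.
USER_AGENT = "MyWikiApp/0.1 (admin@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urlrequest.Request(url, headers={"User-Agent": user_agent})


def describe_failure(error):
    """Turn a fetch error into a short message that keeps the HTTP status."""
    # HTTPError is a subclass of URLError, so check for it first.
    if isinstance(error, HTTPError):
        return "HTTP %d from server" % error.code
    return "network error: %s" % (error.reason,)


def fetch(url, user_agent=USER_AGENT):
    """Return (True, body) on success or (False, message) on failure."""
    try:
        response = urlrequest.urlopen(build_request(url, user_agent))
        return True, response.read()
    except URLError as e:
        return False, describe_failure(e)
```

In the question's handler, `fetch("http://en.wikipedia.org/")` could stand in for the bare `urllib2.urlopen(url)` call, so the template shows either the page body or the actual status code instead of a generic message.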

Bonus link:
High-level Python clients for Wikipedia: http://www.mediawiki.org/wiki/API:Client_code#Python


You can set your user agent to any string you want; App Engine will modify it by appending the string AppEngine-Google; (+http://code.google.com/appengine; appid: yourapp). In urllib2, you can set the User-Agent header as follows:

    req = urllib2.Request("http://en.wikipedia.org/", headers={"User-Agent": "Foo"})
    response = urllib2.urlopen(req)