Problem loading a mobile site with urllib2

I am trying to get some data from http://m.finnkino.fi/events/now_showing, but I am stuck because I cannot even load the page source with Python. I am currently using the following code:

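    import urllib2

    URL = "http://m.finnkino.fi/events/now_showing"
    req = urllib2.urlopen(URL, None, 2.5)  # no POST data, 2.5-second timeout
    page = req.read()
    print page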

Here is the traceback for the resulting timeout error:

    Traceback (most recent call last):
      File "user/src/finnkinoParser.py", line 26, in <module>
        main()
      File "user/src/finnkinoParser.py", line 13, in main
        getNowPlayingMovies()
      File "user/src/finnkinoParser.py", line 17, in getNowPlayingMovies
        req = urllib2.urlopen(baseURL,None,2.5)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 124, in urlopen
        return _opener.open(url, data, timeout)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 383, in open
        response = self._open(req, data)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 401, in _open
        '_open', req)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 361, in _call_chain
        result = func(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1130, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1105, in do_open
        raise URLError(err)
    urllib2.URLError: <urlopen error timed out>

If I open the URL in my browser, it works fine. Can someone tell me what makes this site so different that urllib2 cannot load the page? I suspect it is because the site is aimed at mobile devices. urllib2 works great with "regular" sites. Are there other kinds of sites for which a plain urlopen(URL) does not work?

Thanks for the help.

1 answer

The following snippet works fine.

    import httplib

    # Send a User-Agent header; urllib2 is bypassed in favour of httplib.
    headers = {"User-Agent": "Mozilla/5.0"}
    conn = httplib.HTTPConnection("m.finnkino.fi")
    conn.request("GET", "/events/now_showing", "", headers)
    response = conn.getresponse()
    print response.status, response.reason
    data = response.read()
    print data
    conn.close()

It seems their server performs some checks on incoming requests. After several attempts, here is what I found:

  • The HTTP protocol version must be HTTP/1.1.
  • If the request headers include a Connection field, its value must be keep-alive.
  • The request headers must include a User-Agent field, regardless of its value.
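
The three conditions above can be bundled into a small helper. This is only a minimal sketch, not code from the thread: the names `build_headers` and `fetch` are hypothetical, the `Connection: keep-alive` value is an assumption based on the checks listed above, and the import fallback is there only so the sketch also runs on Python 3, where `httplib` is called `http.client`.

```python
try:
    import httplib  # Python 2, as used in this thread
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def build_headers(user_agent="Mozilla/5.0"):
    """Return request headers meant to satisfy the checks listed above."""
    return {
        "User-Agent": user_agent,    # must be present; its value did not seem to matter
        "Connection": "keep-alive",  # assumption: the server rejects "close"
    }


def fetch(host, path, timeout=10):
    """GET http://<host><path>; httplib speaks HTTP/1.1 by default."""
    conn = httplib.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers=build_headers())
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `fetch("m.finnkino.fi", "/events/now_showing")` should then return the status code and the page body, provided the server still behaves as described here.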

In urllib2, however, HTTPHandler sets the Connection header to close by default (line 1127 of urllib2.py), which fails the second check. You can use urlgrabber or another HTTP handler that supports HTTP/1.1 and keep-alive.
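
To see which headers urllib2 actually puts on the wire, one option is to point it at a throwaway local server that records the request. The sketch below is an illustration under assumptions rather than part of the original answer: `headers_sent_by_urlopen` and `Recorder` are hypothetical names, and the import fallback lets it run on Python 3, where `urllib2` became `urllib.request`.

```python
import threading

try:  # Python 2 module names, as used in this thread
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    import urllib2
except ImportError:  # Python 3 equivalents
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import urllib.request as urllib2


def headers_sent_by_urlopen():
    """Serve one request locally and return what the urllib2 opener sent."""
    seen = {}

    class Recorder(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the protocol version and the two headers discussed above.
            seen["version"] = self.request_version
            seen["connection"] = self.headers.get("Connection")
            seen["user_agent"] = self.headers.get("User-Agent")
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the demo quiet

    server = HTTPServer(("127.0.0.1", 0), Recorder)  # port 0: any free port
    server.timeout = 5  # give up if no request arrives
    worker = threading.Thread(target=server.handle_request)  # one request only
    worker.start()
    try:
        # An empty ProxyHandler makes the request go straight to the local server.
        opener = urllib2.build_opener(urllib2.ProxyHandler({}))
        opener.open("http://127.0.0.1:%d/" % server.server_port, timeout=5).close()
    finally:
        worker.join()
        server.server_close()
    return seen
```

On a stock Python installation the recorded `connection` value should come back as `close`, while the protocol version and User-Agent are already acceptable, which is consistent with the Connection header being the check that urllib2 fails.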
