Python: urllib / urllib2 / httplib confusion

I am trying to test the functionality of a web application by writing a Python login sequence, but I am having problems.

Here is what I need to do:

  • Do a POST with a few options and headers.
  • Follow the redirect.
  • Get the body of the resulting HTML page.

Now, I'm relatively new to Python, but the two things I have tried so far have not worked. First I used httplib, with putrequest() (passing the parameters in the URL) and putheader(). This did not seem to follow the redirect.

Then I tried urllib and urllib2, passing both the headers and the parameters as dicts. This seems to return the login page rather than the page I am trying to log in to, presumably because of missing cookies or something similar.

Am I missing something simple?

Thanks.

python urllib2
Nov 19 '08 at 13:44
8 answers

Focus on urllib2 for this; it works quite well. Do not mess with httplib; it is not the top-level API.

What you are noticing is that urllib2 does not follow the redirect.

You need to add an instance of HTTPRedirectHandler that will catch and follow the redirects.

In addition, you might want to subclass the default HTTPRedirectHandler to capture information that you can then check as part of your unit testing.
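As a sketch of such a subclass (written against Python 3's urllib.request, where urllib2's handler classes now live; the RecordingRedirectHandler name is my own invention):

```python
import urllib.request  # in Python 3, urllib2's handlers live here


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects like the default handler, but records each hop."""

    def __init__(self):
        self.redirects = []  # list of (status code, from URL, to URL)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.redirects.append((code, req.full_url, newurl))
        return urllib.request.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


handler = RecordingRedirectHandler()
opener = urllib.request.build_opener(handler)
# After opener.open(...) calls, handler.redirects holds every hop that
# was followed, ready for assertions in a unit test.
```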

    cookie_handler = urllib2.HTTPCookieProcessor(self.cookies)
    redirect_handler = urllib2.HTTPRedirectHandler()
    opener = urllib2.build_opener(redirect_handler, cookie_handler)

You can then use this opener object for POST and GET, properly handling redirects and cookies.
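A hedged sketch of that flow in modern Python (urllib.request and http.cookiejar are the Python 3 homes of these classes; the URLs and credentials are placeholders):

```python
import http.cookiejar
import urllib.parse
import urllib.request

# One CookieJar shared by every request made through this opener.
cookies = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPCookieProcessor(cookies),
)

# Passing a data argument makes open() issue a POST instead of a GET.
form = urllib.parse.urlencode({"username": "alice", "passwd": "secret"})
post_body = form.encode("ascii")
# opener.open("https://example.com/login", post_body)  # POST, follows redirects
# opener.open("https://example.com/private")           # GET, reuses the cookies
```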

You might want to add your own subclass of HTTPHandler to log and track the various error codes, too.


Here is how I take care of this problem.

    #!/usr/bin/env python
    import urllib
    import urllib2

    class HttpBot:
        """An HttpBot represents one browser session, with cookies."""

        def __init__(self):
            cookie_handler = urllib2.HTTPCookieProcessor()
            redirect_handler = urllib2.HTTPRedirectHandler()
            self._opener = urllib2.build_opener(redirect_handler, cookie_handler)

        def GET(self, url):
            return self._opener.open(url).read()

        def POST(self, url, parameters):
            return self._opener.open(url, urllib.urlencode(parameters)).read()

    if __name__ == "__main__":
        bot = HttpBot()
        ignored_html = bot.POST('https://example.com/authenticator', {'passwd': 'foo'})
        print bot.GET('https://example.com/interesting/content')
        ignored_html = bot.POST('https://example.com/deauthenticator', {})

@S. Lott, thanks. Your suggestion worked for me, with some changes. Here is how I did it.

    import urllib
    import urllib2
    from cookielib import CookieJar

    data = urllib.urlencode(params)
    url = host + page
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)

    cookies = CookieJar()
    cookies.extract_cookies(response, request)
    cookie_handler = urllib2.HTTPCookieProcessor(cookies)
    redirect_handler = urllib2.HTTPRedirectHandler()
    opener = urllib2.build_opener(redirect_handler, cookie_handler)

    response = opener.open(request)

I had to do exactly this recently. I needed only classes from the standard library. Here is an excerpt from my code:

    from urllib import urlencode
    from urllib2 import urlopen, Request

    # encode my POST parameters for the login page
    login_qs = urlencode([("username", USERNAME), ("password", PASSWORD)])

    # extract my session id by loading a page from the site
    set_cookie = urlopen(URL_BASE).headers.getheader("Set-Cookie")
    sess_id = set_cookie[set_cookie.index("=") + 1:set_cookie.index(";")]

    # construct headers dictionary using the session id
    headers = {"Cookie": "session_id=" + sess_id}

    # perform the login and make sure it worked
    if "Announcements:" not in urlopen(Request(URL_BASE + "login", headers=headers), login_qs).read():
        print "Didn't log in properly"
        exit(1)

    # the function I used after this for loading pages
    def download(page=""):
        return urlopen(Request(URL_BASE + page, headers=headers)).read()

    # for example:
    print download("config")
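The manual index() slicing above breaks easily if the server reorders the cookie attributes; as a sketch, the standard library's http.cookies module (SimpleCookie, the Python 3 name of the Cookie module) can parse the header instead (the header value here is invented):

```python
from http.cookies import SimpleCookie

# An invented Set-Cookie header value of the kind the slicing above targets.
set_cookie = "session_id=abc123; Path=/; HttpOnly"

cookie = SimpleCookie()
cookie.load(set_cookie)
sess_id = cookie["session_id"].value  # attribute order no longer matters
headers = {"Cookie": "session_id=" + sess_id}
```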

I would give Mechanize ( http://wwwsearch.sourceforge.net/mechanize/ ) a shot. It handles your cookies and headers transparently.


Try twill , a simple scripting language that lets users browse the web from a command-line interface. With twill, you can navigate through websites that use forms, cookies, and most standard web features. More to the point, twill is written in Python and has a Python API, for example:

    from twill import get_browser

    b = get_browser()
    b.go("http://www.python.org/")
    b.showforms()

Besides possibly missing cookies, there may be field(s) in the form that you are not sending to the web server. The best approach is to capture the actual POST from a web browser. You can use LiveHTTPHeaders or Wireshark to snoop the traffic and mimic the same behaviour in your script.
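One way to catch such hidden fields programmatically is the standard library's HTMLParser; a minimal sketch (the form HTML and field names here are invented):

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collects the name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden":
            self.fields[attrs["name"]] = attrs.get("value", "")


finder = HiddenFieldFinder()
finder.feed('<form><input type="hidden" name="csrf_token" value="xyz"></form>')
# finder.fields now maps "csrf_token" to "xyz"; merge these into your
# POST parameters so the server sees the same form a browser would send.
```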


Funkload is a great web-application testing tool. It wraps webunit to handle the browser emulation, then gives you both functional- and load-testing features.



