httplib.InvalidURL: nonnumeric port

I am trying to write a script that checks whether each URL in a list is reachable:

    import httplib

    with open('urls.txt') as urls:
        for url in urls:
            connection = httplib.HTTPConnection(url)
            connection.request("GET")
            response = connection.getresponse()
            if response.status == 200:
                print '[{}]: '.format(url), "Up!"

But I got this error:

    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        connection = httplib.HTTPConnection(url)
      File "/usr/lib/python2.7/httplib.py", line 693, in __init__
        self._set_hostport(host, port)
      File "/usr/lib/python2.7/httplib.py", line 721, in _set_hostport
        raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
    httplib.InvalidURL: nonnumeric port: '//globo.com/galeria/amazonas/a.html

What's wrong?

+7
2 answers

httplib.HTTPConnection takes the host and port of the remote server in its constructor, not the entire URL.
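If you want to stay with httplib, one way to get the pieces it expects is to split each URL first. The following is only a sketch: the helper name host_port_path is made up for illustration, and the Python 3 import is an assumption added so the snippet also runs outside Python 2.

```python
try:
    from urllib.parse import urlsplit  # Python 3 (assumption, not in the question)
except ImportError:
    from urlparse import urlsplit      # Python 2, as used in the question


def host_port_path(url):
    """Split a full URL into (host, port, path) (hypothetical helper)."""
    parts = urlsplit(url.strip())      # strip() drops the newline read from the file
    path = parts.path or "/"           # request() needs a non-empty path
    if parts.query:
        path += "?" + parts.query
    # parts.port is None when the URL does not name a port explicitly
    return parts.hostname, parts.port, path


# Intended use with the question's code (Python 2):
#   host, port, path = host_port_path(url)
#   connection = httplib.HTTPConnection(host, port)
#   connection.request("GET", path)
```

Passing None as the port lets HTTPConnection fall back to its default port, so the helper does not have to pick one itself.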

For your use case, it is easier to use urllib2.urlopen, which accepts a full URL.

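    import urllib2

    with open('urls.txt') as urls:
        for url in urls:
            url = url.strip()  # drop the trailing newline read from the file
            try:
                r = urllib2.urlopen(url)
            except urllib2.HTTPError as e:
                r = e  # HTTPError carries the status in .code, like a response
            except urllib2.URLError as e:
                # No HTTP response at all (DNS failure, connection refused, ...)
                print '[{}]: '.format(url), "Unreachable:", e.reason
                continue
            if r.code in (200, 401):
                print '[{}]: '.format(url), "Up!"
            elif r.code == 404:
                print '[{}]: '.format(url), "Not Found!"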
+6

The fix may be simple. In this line:

 connection = httplib.HTTPConnection(url) 

you are using HTTPConnection, so you must pass only the host name, for example OSMQuote.com, rather than the full URL http://OSMQuote.com.

In short, remove http:// and https:// from your URL: httplib treats everything after the : as the port number, and the port number must be numeric.
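To make the failure concrete, here is a simplified illustration of host:port splitting. It is not httplib's actual code, only a sketch modelled on the host[i+1:] expression visible in the traceback; the function name split_host_port is hypothetical, and IPv6 literals and empty ports are ignored.

```python
def split_host_port(host, default_port=80):
    """Split 'host:port' the way the traceback suggests (illustration only)."""
    i = host.rfind(":")                # position of the last colon, or -1
    if i >= 0:
        port_text = host[i + 1:]       # everything after that colon
        if not port_text.isdigit():
            raise ValueError("nonnumeric port: '%s'" % port_text)
        return host[:i], int(port_text)
    return host, default_port          # no colon, so use the default port


# A bare host name, with or without a port, splits cleanly:
#   split_host_port("globo.com")       -> ("globo.com", 80)
#   split_host_port("globo.com:8080")  -> ("globo.com", 8080)
# A full URL fails, because the only colon is the one in "http:":
#   split_host_port("http://globo.com/galeria/amazonas/a.html")
#   raises ValueError: nonnumeric port: '//globo.com/galeria/amazonas/a.html'
```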

Hope this helps!

+18