No schema supplied and other errors when using requests.get()

I am learning Python by working through Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the comic images for offline viewing.

I am on Python 2.7 on a Mac.

For some reason, I keep getting errors like "No schema supplied" and other errors from requests.get() itself.

Here is my code:

# Saves the XKCD comic pages for offline reading

import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True:  # If xkcd folder already exists
    shutil.rmtree('xkcd')          # delete it
else:                              # otherwise
    os.makedirs('xkcd')            # Creates xkcd folder.

while not url.endswith('#'):  # When there are no more posts, the url will end with '#'; exit the while loop

    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url)  # Get the page
    res.raise_for_status()   # Check for errors

    soup = bs4.BeautifulSoup(res.text)  # Parse the page

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')  # Any #comic img it finds will be saved as a list in comicElem
    if comicElem == []:  # if the list is empty
        print 'Couldn\'t find the image!'
    else:
        comicUrl = comicElem[0].get('src')  # Get the first index in comicElem (the image) and save to comicUrl

        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
        res.raise_for_status()        # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev btn URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is the first <a rel="prev" href="/1535/" accesskey="p">&lt; Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')  # adds /1535/ to http://xkcd.com/

print 'Done!'

Here are the errors:

Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?

I have read the section of the book about this program many times, read the Requests documentation, and looked at other questions here, but my syntax looks right to me.

Thank you for your help!

Edit:

This did not work:

 comicUrl = ("http:"+comicElem[0].get('src')) 

I thought that adding http: in front of the src would get rid of the missing schema error.
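A quick way to check what the selector is actually returning is to print the raw src value. This is a minimal debugging sketch, assuming the same soup object built inside the loop above:

# Debugging sketch (assumes `soup` was parsed from an xkcd page as in the code above)
comicElem = soup.select('#comic img')
if comicElem:
    src = comicElem[0].get('src')
    print repr(src)            # e.g. '//imgs.xkcd.com/comics/the_martian.png' -- no scheme
    print repr('http:' + src)  # prepending the scheme should give a full URL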

python requests
4 answers

Change your comicUrl to:

comicUrl = comicElem[0].get('src').strip("http://")
comicUrl = "http://" + comicUrl
if 'xkcd' not in comicUrl:
    comicUrl = comicUrl[:7] + 'xkcd.com/' + comicUrl[7:]
print "comic url", comicUrl
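As an aside (not part of the original answer), urljoin from the standard library resolves scheme-relative URLs such as //imgs.xkcd.com/... against the page URL, which avoids the string slicing. A minimal sketch:

from urlparse import urljoin  # Python 2.7; in Python 3 it is urllib.parse.urljoin

src = comicElem[0].get('src')                # e.g. '//imgs.xkcd.com/comics/the_martian.png'
comicUrl = urljoin('http://xkcd.com/', src)  # -> 'http://imgs.xkcd.com/comics/the_martian.png'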

No schema means you have not supplied the http:// or https:// part; supplying it will do the trick.

Edit: take a close look at the URL in the error message:

URL '//imgs.xkcd.com/comics/the_martian.png':
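The src in the page is scheme-relative (it starts with //), so prepending the scheme before calling requests.get() is enough. A minimal sketch, assuming comicElem from the program above:

src = comicElem[0].get('src')  # '//imgs.xkcd.com/comics/the_martian.png'
if src.startswith('//'):
    comicUrl = 'http:' + src   # 'http://imgs.xkcd.com/comics/the_martian.png'
else:
    comicUrl = src             # already a full URL
res = requests.get(comicUrl)
res.raise_for_status()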


Explanation:

Some XKCD pages have special content that is not a simple image file. That's fine; you can just skip them. If your selector doesn't find any elements, soup.select('#comic img') will return an empty list.
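On its own, that check is just a test for an empty result (a minimal sketch; the full program below does this inside the download loop):

comicElem = soup.select('#comic img')  # empty list when the page has no plain comic image
if comicElem == []:
    print('could not find comic image')  # skip this comic and move on via the Prev link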

Working code:

import requests, os, bs4, shutil

url = 'http://xkcd.com'

# making new folder (recreate it if it already exists, so it is always there to write into)
if os.path.isdir('xkcd'):
    shutil.rmtree('xkcd')
os.makedirs('xkcd')

# scraping information
while not url.endswith('#'):
    print('Downloading Page %s.....' % (url))
    res = requests.get(url)  # getting page
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)

    comicElem = soup.select('#comic img')  # getting img tag under the comic division
    if comicElem == []:  # if not found print error
        print('could not find comic image')
    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')  # getting comic url and then downloading its image
            print('Downloading image %s.....' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:  # skip if not a normal image file
            prev = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prev.get('href')
            continue
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')  # write downloaded image to hard disk
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # get previous link and update url
    prev = soup.select('a[rel="prev"]')[0]
    url = "http://xkcd.com" + prev.get('href')

print('Done...')

I just wanted to chime in here: I had the same error and used @Ajay's recommended answer above, but even after adding that I was still having problems. Right after downloading the first image, the program would stop and return this error:

 ValueError: Unsupported or invalid CSS selector: "a[rel" 

This refers to one of the last lines in the program, where the "Prev" button is used to get the next image to download.

Anyway, going through the bs4 docs, I made a small change as follows, and now it works fine:

 prevLink = soup.select('a[rel^="prev"]')[0] 
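In context (same variable names as the book's program), the updated lookup would sit where the Prev link is read; a sketch of how it slots in:

# 'a[rel^="prev"]' matches <a> tags whose rel attribute starts with "prev";
# this form worked where the exact-match selector raised the ValueError above.
prevLink = soup.select('a[rel^="prev"]')[0]
url = 'http://xkcd.com' + prevLink.get('href')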

Someone else might run into the same problem, so I thought I'd add this comment.

