I am writing a small Python script to download images found through Google Images. I have gotten as far as having the URLs of the images I want in a list. Now I just need to download them.
For each image URL I do this:
    print("Retrieving:{0}".format(sFinalImageURL))
    sExt = sFinalImageURL.split('.')[-1]
    #u = urllib.request.urlopen(sFinalImageURL)
    try:
        u = urllib.request.urlopen(sFinalImageURL)
    except:
        print("error: cannot retrieve image")
        continue
    raw_data = u.read()
    print("read {0} bytes".format(len(raw_data)))
    u.close()
    global sImagesFolder
    try:
        f = open("{0}/{1}_{2}.{3}".format(sImagesFolder,sImage,i,sExt),'wb')
        f.write(raw_data)
        f.close()
    except:
        print("couldn't write to {0}/{1}_{2}.{3}".format(sImagesFolder,sImage,i,sExt))
    print()
Here are the issues I am facing:
Trying to open some of the URLs gives me a 403 error, although I can open the same URLs directly in my browser. So there must be something in the headers of the HTTP request that the image server does not like. Any ideas?
Here are some of the results:
    Retrieving:http://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Timba%2B1.jpg/220px-Timba%2B1.jpg
    error: cannot retrieve image

    Retrieving:http://upload.wikimedia.org/wikipedia/commons/thumb/2/26/YellowLabradorLooking_new.jpg/260px-YellowLabradorLooking_new.jpg
    error: cannot retrieve image

    Retrieving:http://1.bp.blogspot.com/-7SsJ1n3RdoA/Tf07NOgD5nI/AAAAAAAAABo/tl8qLLIU01Y/s1600/english-shepherd-dog-0003.jpg
    read 11123 bytes

    Retrieving:http://completedogfood.net/wp-content/uploads/2010/07/complete-dog-food.bmp
    read 419630 bytes
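One way to test the header theory is sketched below. It assumes the server may be rejecting the default `Python-urllib/x.y` User-Agent that `urllib.request.urlopen` sends, so it sends an explicit one instead and prints the actual HTTP status rather than a generic message. The names `BROWSER_UA`, `build_request` and `fetch_image` are made up for this illustration, and the User-Agent string is an arbitrary placeholder, not a value known to be accepted by any particular server.

```python
import urllib.error
import urllib.request

# Hypothetical identifier (an assumption for this sketch); the point is only
# that it differs from urllib's default "Python-urllib/x.y" value.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) image-fetch-test"


def build_request(url, user_agent=BROWSER_UA):
    """Wrap the URL in a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_image(url):
    """Return the image bytes, or None after printing why the fetch failed."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as u:
            return u.read()
    except urllib.error.HTTPError as e:
        # HTTPError carries the status code, e.g. 403.
        print("error: HTTP {0} for {1}".format(e.code, url))
    except urllib.error.URLError as e:
        # Network-level failures (DNS, refused connection, timeout).
        print("error: {0} for {1}".format(e.reason, url))
    return None
```

In the loop above, `raw_data = fetch_image(sFinalImageURL)` followed by `if raw_data is None: continue` would replace the `urlopen` / `read` / `close` sequence. If the 403 persists with a different User-Agent, the printed status at least confirms which requests fail and rules this header out as the cause.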