How to prevent being blacklisted while scraping Amazon

I am trying to scrape Amazon using Scrapy, but I get this error:

DEBUG: Retrying <GET http://www.amazon.fr/Amuses-bouche-Peuvent-b%C3%A9n%C3%A9ficier-dAmazon-Premium-Epicerie/s?ie=UTF8&page=1&rh=n%3A6356734031%2Cp_76%3A437878031> (failed 1 times): 503 Service Unavailable 

I think this is because Amazon detects bots very well. How can I prevent this?

I used time.sleep(6) before each request.

I do not want to use their API.

I tried to use Tor and Polipo.

+4
2 answers

You must be very careful with Amazon and abide by Amazon's Terms of Service and its policies on web scraping.

Amazon is pretty good at banning bot IPs. You will have to set DOWNLOAD_DELAY and CONCURRENT_REQUESTS to hit the website less often and be a good citizen. In addition, you will need to rotate IP addresses (for example, have a look at Crawlera) and user agents; see the sketch below.
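As a minimal sketch, these are the Scrapy settings the answer refers to; the delay, concurrency, and retry values are illustrative choices, not limits sanctioned by Amazon, and the User-Agent string is just an example of a realistic browser header.

```python
# settings.py -- illustrative politeness settings for a Scrapy project
# (values are examples, tune them for your own crawl)

# Slow down: wait between requests and keep concurrency low
DOWNLOAD_DELAY = 6                  # seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True     # vary the delay (0.5x-1.5x) to look less robotic
CONCURRENT_REQUESTS = 1             # only one request in flight at a time
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# AutoThrottle adapts the delay to the server's response times
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60

# Retry 503s a few times instead of giving up immediately
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]

# Send a realistic browser User-Agent instead of Scrapy's default
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36"
)
```

Setting DOWNLOAD_DELAY in the project settings is preferable to calling time.sleep() inside the spider, because Scrapy schedules requests asynchronously and a sleep in your callback does not reliably throttle the downloader.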

+6

You may also be interested in a basic Scrapy setup with two downloader middlewares, one for random proxy IP addresses and a second for random user agents; a rough sketch follows.
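Here is a rough sketch of those two middlewares, assuming you supply your own proxy list and user-agent list (the entries below are placeholders):

```python
# middlewares.py -- sketch of a random-proxy and a random-user-agent middleware
import random

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",   # placeholder proxies
    "http://user:pass@proxy2.example.com:8000",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/119.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
]

class RandomProxyMiddleware:
    """Route each outgoing request through a randomly chosen proxy."""
    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)

class RandomUserAgentMiddleware:
    """Attach a randomly chosen User-Agent header to each outgoing request."""
    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
```

To activate them, register both classes in DOWNLOADER_MIDDLEWARES in settings.py, e.g. {"myproject.middlewares.RandomProxyMiddleware": 350, "myproject.middlewares.RandomUserAgentMiddleware": 400} (the module path "myproject" is hypothetical).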

0
