How to prevent being blacklisted while scraping Amazon

I am trying to scrape Amazon using Scrapy, but I get this error:

DEBUG: Retrying <GET http://www.amazon.fr/Amuses-bouche-Peuvent-b%C3%A9n%C3%A9ficier-dAmazon-Premium-Epicerie/s?ie=UTF8&page=1&rh=n%3A6356734031%2Cp_76%3A437878031> (failed 1 times): 503 Service Unavailable 

I think this is because Amazon detects bots very well. How can I prevent this?

I used time.sleep(6) before each request.

I do not want to use their API.

I tried to use Tor and Polipo.

+4
2 answers

You must be very careful with Amazon and abide by Amazon's Terms of Service and its policies on web scraping.

Amazon is pretty good at banning bot IPs. You will have to set DOWNLOAD_DELAY and CONCURRENT_REQUESTS to hit the website less often and be a good citizen. In addition, you will need to rotate IP addresses (for example, have a look at Crawlera) and user agents; see the sketch below.
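As a minimal sketch, these are the Scrapy settings the answer refers to; the delay, concurrency, and retry values are illustrative choices, not limits sanctioned by Amazon, and the User-Agent string is just an example of a realistic browser header.

```python
# settings.py -- illustrative politeness settings for a Scrapy project
# (values are examples, tune them for your own crawl)

# Slow down: wait between requests and keep concurrency low
DOWNLOAD_DELAY = 6                  # seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True     # vary the delay (0.5x-1.5x) to look less robotic
CONCURRENT_REQUESTS = 1             # only one request in flight at a time
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# AutoThrottle adapts the delay to the server's response times
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60

# Retry 503s a few times instead of giving up immediately
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]

# Send a realistic browser User-Agent instead of Scrapy's default
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36"
)
```

Setting DOWNLOAD_DELAY in the project settings is preferable to calling time.sleep() inside the spider, because Scrapy schedules requests asynchronously and a sleep in your callback does not reliably throttle the downloader.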

+6

You may also be interested in a basic Scrapy setup with two downloader middlewares, one for random proxy IP addresses and a second for random user agents; a rough sketch follows.
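Here is a rough sketch of those two middlewares, assuming you supply your own proxy list and user-agent list (the entries below are placeholders):

```python
# middlewares.py -- sketch of a random-proxy and a random-user-agent middleware
import random

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",   # placeholder proxies
    "http://user:pass@proxy2.example.com:8000",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/119.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
]

class RandomProxyMiddleware:
    """Route each outgoing request through a randomly chosen proxy."""
    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)

class RandomUserAgentMiddleware:
    """Attach a randomly chosen User-Agent header to each outgoing request."""
    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
```

To activate them, register both classes in DOWNLOADER_MIDDLEWARES in settings.py, e.g. {"myproject.middlewares.RandomProxyMiddleware": 350, "myproject.middlewares.RandomUserAgentMiddleware": 400} (the module path "myproject" is hypothetical).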

0
