Python Scrapy does not always download data from a website

Question

I am using Scrapy to parse HTML. Sometimes my spider returns the data I want, but sometimes it returns nothing. Why does this happen? Is it my fault? Here is the spider with its parse function:

class AmazonSpider(BaseSpider):
    name = "amazon"
    allowed_domains = ["amazon.org"]
    start_urls = [
        "http://www.amazon.com/s?rh=n%3A283155%2Cp_n_feature_browse-bin%3A2656020011"
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[contains(@class, "result")]')
        items = []
        titles = {'titles': sites[0].xpath('//a[@class="title"]/text()').extract()}
        for title in titles['titles']:
            item = AmazonScrapyItem()
            item['title'] = title
            items.append(item)
        return items

python scrapy response request

Krasimir Nov 29 '13 at 15:55

1 answer

Gustavo Bezerra · Answer 1 · 2014-01-31T11:57:00+0000

I believe you are simply not using the most appropriate XPath expression.

Amazon's HTML looks messy and not very uniform, so it is not easy to parse. After some experimenting, though, I was able to extract all 12 titles from several search result pages with the following parse function:

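def parse(self, response):
    sel = Selector(response)
    # Each result title is a link inside <div class="data"><h3>.
    p = sel.xpath('//div[@class="data"]/h3/a')
    # Some titles sit in a <span> inside the link, others are the link's own text,
    # so collect both kinds.
    titles = p.xpath('span/text()').extract() + p.xpath('text()').extract()
    items = []
    for title in titles:
        item = AmazonScrapyItem()
        item['title'] = title
        items.append(item)
    return items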
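
To see why the function above reads both the span text and the link's own text, here is a minimal offline sketch. It does not use Scrapy: it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no text() step), so the text is read from the .text attribute instead. The SAMPLE markup and the extract_titles helper are made up for illustration and are not Amazon's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (an assumption, not Amazon's real page).
# One title is wrapped in a <span>, the other is plain text inside the link.
SAMPLE = """
<html><body>
  <div class="data"><h3><a href="/b1"><span>Book One</span></a></h3></div>
  <div class="data"><h3><a href="/b2">Book Two</a></h3></div>
  <div class="other"><h3><a href="/x">Not a result</a></h3></div>
</body></html>
"""


def extract_titles(markup):
    """Return the title of every link under <div class="data"><h3>."""
    root = ET.fromstring(markup.strip())
    # Same structure as the answer's expression, in ElementTree's XPath subset.
    anchors = root.findall(".//div[@class='data']/h3/a")
    titles = []
    for a in anchors:
        span = a.find("span")
        # Prefer the <span> text when present, else the link's own text.
        text = span.text if span is not None else a.text
        if text and text.strip():
            titles.append(text.strip())
    return titles


titles = extract_titles(SAMPLE)
print(titles)
```

Unlike the concatenation in the answer, which lists all span titles first and the plain ones after, this loop keeps the titles in page order. An expression that matches only one of the two layouts would silently miss the other, which may be one reason a spider returns results on some pages and nothing on others.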
