Scrapy Python Craigslist Scraper

Question

Scrapy Python Craigslist Scraper

I am trying to clear Craigslist ads using Scrapy to extract items that are for sale.

I can extract the date, post the header and send the url , but I am having trouble retrieving the price .

For some reason, the current code extracts all of the prices, but when I delete // before the price range is searched, the price field is returned as empty.

Can someone please review the code below and help me out?

from scrapy.spider import BaseSpider from scrapy.selector import HtmlXPathSelector from craigslist_sample.items import CraigslistSampleItem class MySpider(BaseSpider): name = "craig" allowed_domains = ["craigslist.org"] start_urls = ["http://longisland.craigslist.org/search/sss?sort=date&query=raptor%20660&srchType=T"] def parse(self, response): hxs = HtmlXPathSelector(response) titles = hxs.select("//p") items = [] for titles in titles: item = CraigslistSampleItem() item['date'] = titles.select('span[@class="itemdate"]/text()').extract() item ["title"] = titles.select("a/text()").extract() item ["link"] = titles.select("a/@href").extract() item ['price'] = titles.select('//span[@class="itempp"]/text()').extract() items.append(item) return items

+4

python scrapy craigslist scraper

Joe barreca Mar 17 '13 at 1:14

source share

1 answer

Eric Pauley · Accepted Answer · 2013-03-17T01:34:04+0000

itempp is inside another itempnr element. Perhaps this will work if you change //span[@class="itempp"]/text() to span[@class="itempnr"]/span[@class="itempp"]/text() .

Scrapy Python Craigslist Scraper

More articles: