below is my spider code,
class Blurb2Spider(BaseSpider): name = "blurb2" allowed_domains = ["www.domain.com"] def start_requests(self): yield self.make_requests_from_url("http://www.domain.com/bookstore/new") def parse(self, response): hxs = HtmlXPathSelector(response) urls = hxs.select('//div[@class="bookListingBookTitle"]/a/@href').extract() for i in urls: yield Request(urlparse.urljoin('www.domain.com/', i[1:]),callback=self.parse_url) def parse_url(self, response): hxs = HtmlXPathSelector(response) print response,'------->'
Here I try to combine the href link with the base link, but I get the following error,
exceptions.ValueError: Missing scheme in request url: www.domain.com//bookstore/detail/3271993?alt=Something+I+Had+To+Do
Can someone tell me why I get this error and how to join the base url with the href link and give a request
python url scrapy
shiva krishna
source share