Disclaimer: I'm pretty new to Scrapy.
My question: how can I scrape Item properties from a page behind a link and get the results back into the same item?
Given the following Spider example:
```python
# Place, SiteLoader, self.parent_domain and self.template are defined elsewhere in my project
from scrapy import Request, Selector, Spider


class SiteSpider(Spider):
    site_loader = SiteLoader
    ...

    def parse(self, response):
        item = Place()
        sel = Selector(response)
        bl = self.site_loader(item=item, selector=sel)
        bl.add_value('domain', self.parent_domain)
        bl.add_value('origin', response.url)
        for place_property in item.fields:
            parse_xpath = self.template.get(place_property)
```
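The snippet is cut off inside the loop. Roughly, the rest of it branches on whether the template entry is a plain xpath or a pointer to a sub-page; the dict keys (`url`, `xpath`) and the exact `meta` layout below are just how I happen to have it set up, shown as a sketch:

```python
            # parse_xpath is either a plain xpath string, or a dict that
            # names a sub-page URL plus an xpath to run against that page
            if isinstance(parse_xpath, dict):
                url = sel.xpath(parse_xpath['url']).extract()[0]
                yield Request(url,
                              callback=self.get_url_property,
                              meta={'loader': bl,
                                    'place_property': place_property,
                                    'parse_xpath': parse_xpath})
            else:
                bl.add_xpath(place_property, parse_xpath)
        yield bl.load_item()
```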
I run these spiders against several sites. Most of them have all the data I need on a single page, and everything works fine. However, some sites keep certain properties on a sub-page (for example, the "address" data lives behind a "Get Directions" link).
The "Request Request" line really is where I have the problem. I see that the elements are moving along the pipeline, but they lack the properties that are on other URLs (IOW, those properties that receive a "Request Request"). The get_url_property basically searches for xpath in the new response variable and adds this to the element loader instance.
Is there a way to do what I'm looking for, or is there a better approach? I'd like to avoid making a synchronous call to get the data I need (if that's even possible here), but if that is the cleanest way, then maybe it's the right approach. Thanks.