A good solution for this is Item Loaders. Item Loaders are objects that take data from responses, process it, and build items for you. Here is an example of an Item Loader that strips whitespace from each extracted value and returns the first value matched by the XPath, if any:
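    from scrapy.contrib.loader import XPathItemLoader
    from scrapy.contrib.loader.processor import MapCompose, TakeFirst

    class MyItemLoader(XPathItemLoader):
        # MyItem is your own Item subclass, defined elsewhere in the project
        default_item_class = MyItem
        # Input processor: strip whitespace from every extracted string
        default_input_processor = MapCompose(lambda string: string.strip())
        # Output processor: keep only the first non-empty value
        default_output_processor = TakeFirst()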
And you use it as follows:
    def parse(self, response):
        loader = MyItemLoader(response=response)
        loader.add_xpath('desc', 'a/text()')
        return loader.load_item()
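To make it clearer what the two processors do to the extracted values, here is a rough plain-Python sketch of the same idea. It does not use Scrapy at all: `strip_each` and `take_first` are hypothetical stand-ins that only approximate `MapCompose(lambda string: string.strip())` and `TakeFirst()`, and the `raw` list is made-up sample data standing in for what `a/text()` might return.

```python
def strip_each(values):
    # Approximates the input processor: strip whitespace from every value.
    return [value.strip() for value in values]


def take_first(values):
    # Approximates the output processor: return the first value that is
    # neither None nor an empty string, or None if there is no such value.
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Made-up sample of text nodes, including a whitespace-only node.
raw = ["  \n", "  Example link  ", "other"]
desc = take_first(strip_each(raw))
print(desc)
```

With the loader above, the `desc` field ends up as a single cleaned string instead of a list of raw strings, which is roughly what this sketch produces for the sample data.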
Capi etheel