Has anyone managed to fetch and follow the links in an RSS feed using SgmlLinkExtractor with a CrawlSpider? I cannot get it to work.
I use the following rule:
rules = (
    Rule(SgmlLinkExtractor(tags=('link',), attrs=False),
         follow=True,
         callback='parse_article'),
)
(bearing in mind that in an RSS feed the URL is the text content of the link element, not an attribute of it).
I am not sure how to tell SgmlLinkExtractor to take the URL from the element's text() instead of looking at its attributes.
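To illustrate what "the text() of the link element" means here, below is a minimal sketch that uses only Python's standard library rather than Scrapy. The sample feed and the helper name `extract_rss_links` are made up for illustration and are not part of any Scrapy API.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First article</title>
      <link>http://example.com/articles/1</link>
    </item>
    <item>
      <title>Second article</title>
      <link> http://example.com/articles/2 </link>
    </item>
  </channel>
</rss>
"""


def extract_rss_links(xml_text):
    """Return the URLs stored as text inside each <item>/<link> element.

    Hypothetical helper: it reads the element text, not an href attribute,
    which is the behaviour the rule above would need.
    """
    root = ET.fromstring(xml_text)
    links = []
    for item in root.iter("item"):
        link = item.find("link")
        # Skip items without a <link> or with an empty one.
        if link is not None and link.text:
            links.append(link.text.strip())
    return links


if __name__ == "__main__":
    for url in extract_rss_links(SAMPLE_RSS):
        print(url)
```

Inside a Scrapy callback, the same idea could probably be expressed with an XPath such as `//item/link/text()`, followed by one request per URL. That part is an assumption and is not shown here.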
Any help is appreciated. Thanks in advance.
python web-crawler scrapy
kal3v