Training Exceptions

I follow the Scrapy documentation documentation at http://media.readthedocs.org/pdf/scrapy/0.14/scrapy.pdf and I checked that items.py and dmoz_spider.py are typed (not cut and pasted).

The first part of "hmmm ..." for me was this instruction:

This is the code for our first Spider; save it in a file named dmoz_spider.py in the dmoz / spiders directory

I am using the latest version of Ubuntu and there was no dmoz folder created, so I put this code in ~ / tutorial / tutorial / spiders. (Was that my first mistake?)

So here is my dmoz_spider.py script:

from scrapy.spider import BaseSpider class DmozSpider(BaseSpider): name = "dmoz" allowed_domains = ["dmoz.org"] start_urls = [ "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/", "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/" ] def parse(self, response): filename = response.url.split("/")[-2] open(filename, 'wb').write(response.body) 

In my terminal I type

 scrapy crawl dmoz 

And I get this:

 2012-10-08 13:20:22-0700 [scrapy] INFO: Scrapy 0.12.0.2546 started (bot: tutorial) 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Enabled item pipelines: 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 2012-10-08 13:20:22-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 2012-10-08 13:20:22-0700 [dmoz] INFO: Spider opened 2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None) 2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: <None>) Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop self.runUntilCurrent() File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent call.func(*call.args, **call.kw) File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback self._startRunCallbacks(result) File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse raise NotImplementedError exceptions.NotImplementedError: 2012-10-08 13:20:22-0700 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None) 2012-10-08 13:20:22-0700 [dmoz] ERROR: Spider error processing <http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: <None>) Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop self.runUntilCurrent() File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent call.func(*call.args, **call.kw) File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback self._startRunCallbacks(result) File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/lib/python2.7/dist-packages/scrapy/spider.py", line 62, in parse raise NotImplementedError exceptions.NotImplementedError: 2012-10-08 13:20:22-0700 [dmoz] INFO: Closing spider (finished) 2012-10-08 13:20:22-0700 [dmoz] INFO: Spider closed (finished) 

In my search, I saw that someone else said that twisted was probably not installed ... but would it not be installed if I used the Ubuntu package installer for Scrapy?

Thanks in advance!

+8
scrapy
source share
1 answer

The parse method in BaseSpider receives a call instead of yours because you incorrectly redefined the analysis method. Your indentation is incorrect, so parsing is declared as a function outside the DmozSpider class. Welcome to python :)

This has nothing to do with twisted, I see that it is twisted in the trace, so it is clearly installed.

+15
source share

All Articles