Running Scrapy from a Django View

My experience with Scrapy is limited, and every time I use it I run it through terminal commands. How can I get my form data (the URL to be scraped) from my Django template over to Scrapy so it can start scraping that URL? So far my only idea has been to get the submitted form data in my Django view and then try to reach spider.py in the Scrapy project directory to add the form's URL to the spider's start_urls. From there, I really don't know how to trigger the actual crawl, since I am used to doing it strictly from my terminal with commands like "scrapy crawl dmoz". Thanks.
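(For context, the Django side I have in mind is roughly the sketch below; the view, template, and field names are just placeholders.)

    # views.py (rough sketch only; form/field names are placeholders)
    from django.shortcuts import render

    def submit_url(request):
        if request.method == 'POST':
            url = request.POST.get('url')  # the URL the user wants scraped
            # ... this is where the URL would need to be handed over to Scrapy ...
        return render(request, 'submit.html')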

Small edit: I just discovered scrapyd... I think that may point me in the right direction.

python django web-scraping scrapy
1 answer

You actually answered it yourself with your edit: the best option is to install the scrapyd service and make an API call to schedule.json to start a crawl.
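For reference, the raw HTTP call is roughly the sketch below, assuming scrapyd is running on its default port 6800; 'project_name' and 'spider_name' are placeholders for your own project and spider.

    import requests

    # POST to scrapyd's schedule.json endpoint to queue a crawl job
    response = requests.post(
        'http://localhost:6800/schedule.json',
        data={'project': 'project_name', 'spider': 'spider_name'},
    )
    print(response.json())  # e.g. {'status': 'ok', 'jobid': '...'}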

To make this HTTP API call you can use urllib2/requests directly (as in the sketch above), or use python-scrapyd-api, a Python wrapper around the scrapyd API:

    from scrapyd_api import ScrapydAPI

    scrapyd = ScrapydAPI('http://localhost:6800')
    scrapyd.schedule('project_name', 'spider_name')
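Since you want the form's URL to end up in the spider's start_urls, one common approach (again a sketch, using the same placeholder names) is to pass the URL as a spider argument when scheduling and to read it in the spider's __init__:

    # schedule the crawl, forwarding the URL as a spider argument
    scrapyd.schedule('project_name', 'spider_name', url='http://example.com')

    # spiders/spider_name.py (sketch of a spider that accepts that argument)
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'spider_name'

        def __init__(self, url=None, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # build start_urls from the argument passed in via schedule()
            self.start_urls = [url] if url else []

        def parse(self, response):
            # placeholder parsing logic
            yield {'title': response.css('title::text').extract_first()}

The same argument works from the terminal you are used to: scrapy crawl spider_name -a url=http://example.com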

If we skip scrapyd and try to run the spider directly from the view, it will block the request until the Twisted reactor stops, so that is not really an option.

You can, however, use Celery (in tandem with django_celery): define a task that launches your Scrapy spider and invoke that task from your Django view. That way you put the work on a queue and the user does not have to wait for the crawl to finish; see the sketch below.
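Here is a minimal sketch of that setup, assuming Celery is already configured for the Django project; the module layout, task name, and project path are placeholders. One simple way for the task to launch the spider is to shell out to the same scrapy crawl command you already use, so the view returns immediately while a worker runs the crawl:

    # tasks.py (sketch; names and paths are placeholders)
    import subprocess
    from celery import shared_task

    @shared_task
    def run_spider(url):
        # run the crawl exactly as from the terminal, passing the URL
        # along as a spider argument (-a url=...)
        subprocess.check_call(
            ['scrapy', 'crawl', 'spider_name', '-a', 'url={}'.format(url)],
            cwd='/path/to/scrapy/project',  # directory containing scrapy.cfg
        )

    # views.py (the view only queues the task and returns right away)
    from django.http import JsonResponse
    from .tasks import run_spider

    def start_crawl(request):
        run_spider.delay(request.POST.get('url'))  # does not block the request
        return JsonResponse({'status': 'queued'})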


Also consider the django-dynamic-scraper package:

Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface.
