How to import Django models into a Scrapy pipelines.py file

I am trying to import the models of a single Django application into my pipelines.py so I can save scraped data via the Django ORM. I created the Scrapy project scrapy_project inside the first participating Django application, "app1" (by the way, is this a good choice?). I added these lines to my Scrapy settings file:

    import imp
    import os

    from django.core.management import setup_environ

    def setup_django_env(path):
        f, filename, desc = imp.find_module('settings', [path])
        project = imp.load_module('settings', f, filename, desc)
        setup_environ(project)

    # Resolve the Django project directory relative to this settings file
    current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
    setup_django_env(os.path.join(current_dir, '../../d_project1'))

When I try to import django app1 app models, I get the following error message:

    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 4, in <module>
        execute()
      File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in _run_print_help
        func(*a, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in _run_command
        cmd.run(args, opts)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
        spider = self.crawler.spiders.create(spname, **opts.spargs)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
        self._crawler.configure()
      File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure
        self.engine = ExecutionEngine(self, self._spider_closed)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in __init__
        self.scraper = Scraper(crawler)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
        self.itemproc = itemproc_cls.from_crawler(crawler)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
        return cls.from_settings(crawler.settings, crawler)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
        mwcls = load_object(clspath)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in load_object
        raise ImportError, "Error loading object '%s': %s" % (path, e)
    ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module named dydict.models

Why can't I access the Django application's models from the scraper (given that app1 is in INSTALLED_APPS)?

2 answers

Try:

    from ..models import MyModel

OR

    from ...models import MyModel

Each dot steps one level up the package hierarchy.
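For example, with a hypothetical layout like the one below (the names are illustrative only), each extra dot climbs one package level:

    # Hypothetical layout:
    #   d_project1/
    #       app1/
    #           models.py
    #           scrapy_project/
    #               pipelines.py
    #
    # Inside scrapy_project/pipelines.py:
    from ..models import MyModel        # two dots: parent package (app1)
    from ...app1.models import MyModel  # three dots: one level above app1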


In the pipelines you do not import the Django models directly; you use Scrapy items that are bound to the Django models. And you have to set up the Django settings inside the Scrapy settings file, not later on.

To use Django models in a Scrapy project you should use DjangoItem from scrapy-djangoitem: https://github.com/scrapy-plugins/scrapy-djangoitem (and your Django project must be importable from your PYTHONPATH).

My recommended file structure:

    Projects
    |- DjangoScrapy
       |- DjangoProject
       |  |- Djangoproject
       |  |- DjangoAPP
       |- ScrapyProject
          |- ScrapyProject
             |- Spiders

Then in your Scrapy project you need to add the full path of the Django project to the PYTHONPATH:

    # Setting up django project full path.
    import sys
    sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject')

    # Setting up django settings module name.
    import os
    os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'
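Note that on Django 1.7 and later you must also initialise the app registry before importing any models; a minimal addition to the snippet above, assuming a recent Django version:

    import django
    django.setup()  # required on Django >= 1.7 before models can be imported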

Then in your items.py you link your Django models to Scrapy items:

    from DjangoProject.models import Person, Job
    from scrapy_djangoitem import DjangoItem

    # Give the items their own names so they don't shadow the imported models
    class PersonItem(DjangoItem):
        django_model = Person

    class JobItem(DjangoItem):
        django_model = Job
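DjangoItem generates the item's fields from the model's fields, so the item behaves like a dict; a small sketch, assuming the Person model has a name field:

    item = PersonItem()
    item['name'] = 'zartch'  # 'name' is taken from the Person model's fields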

Then you can call the item's .save() method in the pipeline for each yielded item:

spider.py

    from scrapy.spider import BaseSpider
    from mybot.items import PersonItem

    class ExampleSpider(BaseSpider):
        name = "example"
        allowed_domains = ["dmoz.org"]
        start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']

        def parse(self, response):
            # do stuff
            return PersonItem(name='zartch')

pipelines.py

    from myapp.models import Person

    class MybotPipeline(object):
        def process_item(self, item, spider):
            # get_or_create returns an (object, created) tuple
            obj, created = Person.objects.get_or_create(name=item['name'])
            # pipelines should return the item for further processing
            return item
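Alternatively, since the items are DjangoItems, the pipeline can rely on the .save() method mentioned above instead of querying the model directly; a minimal sketch:

    class MybotPipeline(object):
        def process_item(self, item, spider):
            item.save()  # creates and saves the linked Django model instance
            return item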

I have a repository with minimal working code (you just need to set the path to your Django project in the Scrapy settings): https://github.com/Zartch/Scrapy-Django-Minimal

In https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py you must change my Django project path to your own DjangoProject path:

 sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb') 
