HTTP POST and JSON parsing with Scrapy

I have a website from which I want to extract data. Data retrieval is very simple.

It takes parameters via HTTP POST and returns a JSON object. I have a list of queries that I want to run, and then repeat at certain intervals to keep a database up to date. Is Scrapy suitable for this, or should I use something else?

I don't really need to follow links, but I do need to send multiple requests at the same time.

3 answers

What does the POST request look like? There are several possibilities: plain query parameters (?a=1&b=2), a form-encoded payload (the body contains a=1&b=2), or some other kind of payload (the body contains a string in some format, for example JSON or XML).
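To make that concrete, here is a minimal sketch of how each of those shapes would be built as a Scrapy Request; the endpoint http://example.com/api and the a/b parameters are made up:

    # Hypothetical examples; http://example.com/api is a placeholder endpoint.
    import json
    from urllib.parse import urlencode

    from scrapy import Request

    # 1. Query parameters: a plain GET, the data travels in the URL.
    req1 = Request("http://example.com/api?a=1&b=2")

    # 2. Form payload: a POST with a urlencoded body.
    req2 = Request("http://example.com/api", method="POST",
                   body=urlencode({"a": 1, "b": 2}),
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

    # 3. JSON payload: a POST whose body is a JSON string.
    req3 = Request("http://example.com/api", method="POST",
                   body=json.dumps({"a": 1, "b": 2}),
                   headers={"Content-Type": "application/json"})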

In Scrapy it's pretty easy to make POST requests; see http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples

For example, you might need something like this:

    # Warning: take care of the undefined variables (url) and modules!
    import json
    from urllib.parse import urlencode  # on Python 2: from urllib import urlencode

    from scrapy import Request

    def start_requests(self):
        payload = {"a": 1, "b": 2}
        yield Request(url, callback=self.parse_data, method="POST",
                      body=urlencode(payload),
                      headers={"Content-Type": "application/x-www-form-urlencoded"})

    def parse_data(self, response):
        # the endpoint returns JSON, so decode the response body
        data = json.loads(response.body)
        # do stuff with data...
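If the site expects an ordinary form POST, Scrapy also ships a FormRequest class that does the urlencoding and sets the Content-Type header for you. A minimal sketch (url is still a placeholder, and formdata values must be strings):

    from scrapy import FormRequest

    def start_requests(self):
        # FormRequest defaults to method="POST" and urlencodes formdata
        yield FormRequest(url, callback=self.parse_data,
                          formdata={"a": "1", "b": "2"})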

For sending the requests and receiving the responses, Scrapy is more than enough. To parse the JSON, just use the json module from the standard library:

    import json

    data = ...  # the raw JSON string (for example, the body of the response)
    json_data = json.loads(data)
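Inside a Scrapy spider, that step would typically live in the callback. A hedged sketch, where the "results" and "name" keys are invented stand-ins for whatever the site actually returns:

    import json

    def parse_data(self, response):
        json_data = json.loads(response.body)
        # "results" and "name" are made-up keys; adapt them to the real payload
        for record in json_data.get("results", []):
            yield {"name": record.get("name")}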

Hope this helps!


If you are not crawling pages or following links, you do not really need a full framework like Scrapy.

If you just want to make HTTP requests, consider the Python requests library.
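For example, here is a minimal sketch of the "query repeatedly at an interval" loop from the question; the URL, the queries, and the interval are all placeholders:

    # Sketch only: URL, queries, and interval are placeholders.
    import time

    import requests

    queries = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

    while True:
        for payload in queries:
            data = requests.post("http://example.com/api", data=payload).json()
            # ...update the database with data...
        time.sleep(600)  # wait 10 minutes before the next round

Note that requests is synchronous, so sending multiple requests at the same time, as the question asks, would need threads (for example via concurrent.futures); Scrapy gives you that concurrency for free.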
