Saving cookies between scrapy runs

I collect data from the site daily. Every day I run scrapy, and the first request is always redirected to the site's home page, apparently because scrapy does not have any cookies yet. After that first request, however, scrapy receives a cookie, and from then on everything works just fine.

This, however, makes it very difficult to use tools such as "scrapy view" with any specific URL, because the request will always be redirected to the home page, and that is what opens in my browser.

Can scrapy save the cookies so that I can reuse them on all subsequent scrapes? Can I also pass them when using scrapy view, etc.?

1 answer

There is no built-in mechanism for persisting cookies between scrapy runs, but you can build one yourself (the code below demonstrates the idea and is not tested):

Step 1: Writing Cookies to a File

Get the cookies from the Set-Cookie response header in your parse method. Then serialize them to a file.

There are several ways to do this, discussed here: Accessing the session cookie in scrapy spiders

I prefer a direct approach:

# in your parse method ...
# get cookies from the Set-Cookie response headers (scrapy returns them as bytes)
cookies = ";".join(h.decode("utf-8") for h in response.headers.getlist('Set-Cookie'))
cookies = cookies.split(";")
# split each "name=value" pair on the first "=" only, skipping attribute-only parts
cookies = { c.split("=", 1)[0].strip(): c.split("=", 1)[1] for c in cookies if "=" in c }
# serialize cookies
# ... 

Ideally, this should be done with the last response your scraper receives. Serialize the cookies that come with each response into the same file, overwriting the cookies you saved while processing the previous responses.

Step 2: Reading and Using Cookies from a File

When the spider starts, read the cookies back from the file and pass them to your requests via the "cookies" argument:

def start_requests(self):
    old_cookies = deserialize_cookies(xyz)
    yield Request(url, cookies=old_cookies, ...)