Force Python Scrapy not to encode URLs

I have several URLs that contain [], for example:

 http://www.website.com/CN.html?value_ids[]=33&value_ids[]=5007 

But when I try to crawl this URL with Scrapy, it requests this URL instead:

 http://www.website.com/CN.html?value_ids%5B%5D=33&value_ids%5B%5D=5007 

How can I stop Scrapy from URL-encoding these URLs?

python scrapy scrapy-spider

When Scrapy creates a Request object, it percent-encodes the URL, which is why [ and ] become %5B and %5D. To undo this, you can write your own downloader middleware that rewrites the URL as needed.
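To see that %5B and %5D are simply the percent-encoded forms of [ and ], here is a small sketch using only the standard library (not Scrapy's own escaping code). It assumes that keeping `:/?&=` literal is enough to reproduce the URL from the question; everything else outside the unreserved set, including the brackets, gets encoded:

```python
from urllib.parse import quote, unquote

url = "http://www.website.com/CN.html?value_ids[]=33&value_ids[]=5007"

# Keep the characters that give the URL its structure literal; quote()
# percent-encodes the rest of the non-unreserved characters, so
# "[" (0x5B) becomes %5B and "]" (0x5D) becomes %5D.
encoded = quote(url, safe=":/?&=")
print(encoded)
# http://www.website.com/CN.html?value_ids%5B%5D=33&value_ids%5B%5D=5007

# Decoding restores the original URL.
print(unquote(encoded) == url)  # True
```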

A downloader middleware for this could look like the following:

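```python
class MyCustomDownloaderMiddleware:
    def process_request(self, request, spider):
        # Turn the %5B / %5D escapes back into literal brackets. The public
        # request.url attribute is read-only, so the private _url is set.
        request._url = request.url.replace("%5B", "[").replace("%5D", "]")
        # Returning None tells Scrapy to keep processing this request.
        return None
```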
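If plain string replacement feels too blunt, the rewrite can be factored into a small function that also handles lowercase escapes (%5b, %5d) while leaving every other escape, such as %26 inside a value, untouched. Decoding the whole URL with unquote() would not be safe here, because it would also turn %26 into a literal & and change the query. The name unescape_brackets is hypothetical; this is a sketch using only the standard library:

```python
import re

# Hypothetical helper: decode only percent-encoded square brackets,
# accepting upper- or lowercase hex (%5B, %5b, %5D, %5d).
_BRACKET_ESCAPES = re.compile(r"%5([bBdD])")


def unescape_brackets(url: str) -> str:
    # "b"/"B" (0x5B) maps to "[", "d"/"D" (0x5D) maps to "]".
    return _BRACKET_ESCAPES.sub(
        lambda m: "[" if m.group(1) in "bB" else "]", url
    )


print(unescape_brackets(
    "http://www.website.com/CN.html?value_ids%5B%5D=33&value_ids%5b%5d=5007"
))
# http://www.website.com/CN.html?value_ids[]=33&value_ids[]=5007
```

Inside process_request you would then assign request._url = unescape_brackets(request.url).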

Remember to activate the middleware in settings.py:

```python
DOWNLOADER_MIDDLEWARES = {
    'so.middlewares.MyCustomDownloaderMiddleware': 900,
}
```

My project is named `so` and the class lives in its middlewares.py file; change the module path to match your own project layout.
