Python requests. Is it possible to get a partial response after an HTTP POST?

I am using the Python Requests module to collect data from a site. As part of the data collection process, I have to submit a form via HTTP POST and then check whether it succeeded by looking at the resulting URL. My question is: after the POST, is it possible to ask the server not to send the whole page? I only need to check the URL, but my program downloads the whole page and consumes unnecessary bandwidth. The code is very simple.

    import requests

    # URL and payload are defined elsewhere
    r = requests.post(URL, payload)
    if 'keyword' in r.url:
        print('success')
    else:
        print('fail')
3 answers

A simple solution, if it is feasible for you: go low-level and use the socket library. For example, say you need to send a POST with some data in its body. I used this in my crawler for a single site.

    import socket
    from urllib.parse import urlencode

    req_header = (
        "POST /{0} HTTP/1.1\r\n"
        "Host: www.yourtarget.com\r\n"
        "User-Agent: For the lulz..\r\n"
        "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
        "Content-Length: {1}"
    )
    # The POST body must be escaped; urlencode escapes the values
    # but keeps the "=" and "&" separators intact
    req_body = urlencode({"data1": "yourtestdata", "data2": "foo", "data3": "bar="})
    req_url = "test.php"
    # plug in req_url as {0} and the length of req_body as Content-Length
    header = req_header.format(req_url, len(req_body))

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a socket
    s.connect(("www.yourtarget.com", 80))  # connect it
    # send header + two CRLFs + body; the body length must match Content-Length
    s.sendall((header + "\r\n\r\n" + req_body).encode("ascii"))

    page = b""
    # stop as soon as the whole header (ending with 2x CRLF) has arrived
    while b"\r\n\r\n" not in page:
        buf = s.recv(1024)  # up to 1024 bytes; usually enough for the header in one try
        if not buf:
            break
        page += buf
    s.close()  # closes the TCP connection even if data is still flowing in

    # This should leave you with a header in which you should find a 302 redirect
    # and your target URL in the "Location:" header.
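To read the result out of the raw header collected in `page`, something like the following could work. This is only a minimal sketch: `parse_redirect` is a hypothetical helper name, and the sample response (including the `done.php?keyword=1` URL) is made up for illustration.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from raw HTTP response bytes.

    location is None when the response has no Location header.
    """
    # Keep only the header section, which ends at the first blank line.
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # The status line looks like: HTTP/1.1 302 Found
    status_code = int(lines[0].split(" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":  # header names are case-insensitive
            location = value.strip()
            break
    return status_code, location


# Made-up sample response, standing in for the bytes collected in `page`.
sample = (
    b"HTTP/1.1 302 Found\r\n"
    b"Location: http://www.yourtarget.com/done.php?keyword=1\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
status, location = parse_redirect(sample)
print(status, location)
```

The check from the question then becomes `'keyword' in location` whenever `location` is not `None`.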

It would help if you provided some data, for example a sample URL you want to request. That said, it seems to me that, in general, you are checking whether you ended up at the correct URL after your POST request, using the following algorithm based on redirects or HTTP 404 errors:

    if original_url == returned request url:
        correct url to a correctly made request
    else:
        wrong url and a wrongly made request

If so, then you can use an HTTP HEAD request (another type of HTTP request, like GET, POST, etc.) in the Python requests library to get only the headers, not the page body. Then you check the response code and the redirect URL (if any) to see whether your request went to a valid URL.

For instance:

    import requests

    def attempt_url(url):
        '''Checks the url to see if it is valid, or returns a redirect or error.
        Returns True if valid, False otherwise.'''
        r = requests.head(url)  # HEAD does not follow redirects by default
        if r.status_code == 200:
            return True
        elif r.status_code in (301, 302):
            # a redirect counts as valid only if it points back to the same url
            return r.headers['location'] == url
        elif r.status_code == 404:
            return False
        else:
            raise Exception("A status code we haven't prepared for has arisen!")

If this is not exactly what you are looking for, additional information about your requirements will help. At the very least, it gives you a status code and headers without pulling out all the page data.


There is a chance that the site uses Post/Redirect/Get (PRG). If so, it is enough to not follow the redirect and read the Location header from the response.

Example

    >>> import requests
    >>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
    >>> response.status_code
    302
    >>> response.headers['location']
    'http://httpbin.org/get'
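Applied to the POST from the question, the same idea might look like the sketch below. It assumes the site really does answer the POST with a redirect. `post_and_check` is a hypothetical helper name, and `session` can be a `requests.Session` or the `requests` module itself, since both expose `post`. The redirect response is still downloaded, but its body is normally tiny compared with the target page.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def post_and_check(session, url, payload, keyword):
    """POST without following the redirect and test the redirect target.

    Returns True if the server redirects to a URL containing keyword.
    """
    response = session.post(url, data=payload, allow_redirects=False)
    location = response.headers.get('location')
    if response.status_code in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the request URL.
        return keyword in urljoin(url, location)
    # No redirect: the site does not use PRG here, so nothing can be concluded
    # from the headers alone; treat it as a failure.
    return False


# Usage (performs a real request; URL and payload as in the question):
# import requests
# print('success' if post_and_check(requests, URL, payload, 'keyword') else 'fail')
```

Taking the session as a parameter is a design choice made for this sketch: it keeps the check independent of how the connection is set up and makes it easy to try against a stub.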

If you need more information about what you would receive if you followed the redirect, you can send a HEAD request to the URL specified in the Location header.

Example

    >>> import requests
    >>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
    >>> response.status_code
    302
    >>> response.headers['location']
    'http://httpbin.org/get'
    >>> response2 = requests.head(response.headers['location'])
    >>> response2.status_code
    200
    >>> response2.headers
    {'date': 'Wed, 07 Nov 2012 20:04:16 GMT', 'content-length': '352', 'content-type': 'application/json', 'connection': 'keep-alive', 'server': 'gunicorn/0.13.4'}