Parsing form inputs with BeautifulSoup in Python

My goal is to capture a list of all input names and values, pair them up, and submit the form. The names and values are randomized.

    from bs4 import BeautifulSoup  # parsing

    html = """
    <html>
    <head id="Head1"><title>Title Page</title></head>
    <body>
    <form id="formS" action="login.asp?dx=" method="post">
    <input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
    <input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
    <input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
    </form>
    </body>
    </html>
    """
    html_proc = BeautifulSoup(html)

This bit works fine:

    print html_proc.find("input", value=True)["value"]
    > gDcZHY+nV

However, the following statements either fail or do not behave as I hoped:

    print html_proc.find("input", name=True)["name"]
    > TypeError: find() got multiple values for keyword argument 'name'

    print html_proc.findAll("input", value=True, attrs={'value'})
    > []

    print html_proc.findAll('input', value=True)
    > <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV">
    > <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
    > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
    > </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
    > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
    > </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"></input>
2 answers

You cannot submit a form with BeautifulSoup, but here is how to get a list of (name, value) pairs:

 print [(element['name'], element['value']) for element in html_proc.find_all('input')] 

prints:

    [('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'), ('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'), ('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]

    d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
    print(d)

prints:

 {'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n', 'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV', 'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'} 

Building on @alecxe's answer, this avoids KeyErrors and parses the form into a dictionary that is ready to pass to requests.

    import requests

    # Build the target URL from the form's action attribute and post the fields.
    url = 'http://example.com/' + html_proc.form['action']
    requests.post(url, data=d)

If it gets more complicated (cookies, scripts), you could use Mechanize instead.
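
To see what the URL construction and the form-encoded body above amount to, here is a minimal standard-library sketch. It assumes the page lives at the placeholder host http://example.com/ and uses only one of the name/value pairs from the question. requests form-encodes a dict passed as data, but whether its output matches this byte for byte is not something this sketch verifies.

```python
from urllib.parse import urljoin, urlencode

base_url = 'http://example.com/'   # placeholder host (assumption)
action = 'login.asp?dx='           # the form's action attribute from the question

# urljoin resolves a relative action against the page URL,
# which is safer than plain string concatenation.
url = urljoin(base_url, action)

# One name/value pair taken from the question's form.
fields = {'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV'}

# urlencode builds an application/x-www-form-urlencoded body;
# characters such as '/', '=' and '+' are percent-escaped.
body = urlencode(fields)

print(url)
print(body)
```

Escaping matters here because the randomized names and values contain '+', '/' and '=', all of which have a special meaning in a form-encoded body.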


The TypeError occurs because the first positional parameter of find() is itself called name, so passing name=True supplies that parameter a second time. Use html_proc.find("input", attrs={'name': True}) instead. Also, for the attrs parameter, pass the dictionary {'value': True} rather than the set {'value'}.
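
The keyword clash can be reproduced without BeautifulSoup. The find function below is a hypothetical stand-in, not the library's implementation. It only mimics the relevant shape of the signature: a first parameter called name, an attrs dictionary, and extra keyword arguments treated as attribute filters.

```python
def find(name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for a find() whose first parameter is `name`."""
    filters = dict(attrs or {})
    filters.update(kwargs)  # extra keywords become attribute filters
    return name, filters

# value=True is not a named parameter, so it ends up in **kwargs.
print(find("input", value=True))

# name=True collides with the positional "input", which already filled `name`.
try:
    find("input", name=True)
except TypeError as exc:
    print("TypeError:", exc)

# Routing the filter through attrs avoids the collision.
print(find("input", attrs={"name": True}))
```

The same reasoning applies to any attribute whose name matches one of the function's own parameters: pass it through attrs.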
