Find input field value in html doc using python

Question


I am trying to get input values from an HTML document; specifically, I want to parse the values of hidden input fields. For example, how can I extract only the values from the snippet below using Python?

    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />

The Python function should return something like:

    post_form_id : d619a1eb3becdc05a3ebea530396782f
    fb_dtsg : AQCYsohu



Vlad Sep 19 '11 at 16:24


2 answers

Or with lxml:

    import lxml.html

    htmlstr = '''
    <input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
    '''

    # Parse the string and turn it into a tree of elements
    htmltree = lxml.html.fromstring(htmlstr)

    # Iterate over each input element in the tree and print the relevant attributes
    for input_el in htmltree.xpath('//input'):
        name = input_el.attrib['name']
        value = input_el.attrib['value']
        print("%s : %s" % (name, value))

It gives:

    post_form_id : d619a1eb3becdc05a3ebea530396782f
    fb_dtsg : AQCYsohu
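The question asks specifically for hidden fields, but `//input` matches every input on the page, and `attrib['name']` raises `KeyError` when an input has no `name`. Below is a minimal sketch of a narrower variant, assuming lxml is installed; `hidden_fields` is a hypothetical helper name, and the extra `type="text"` input in the sample is added only to show the filtering.

```python
import lxml.html


def hidden_fields(html):
    """Return {name: value} for every <input type="hidden"> that has a name (hypothetical helper)."""
    tree = lxml.html.fromstring(html)
    fields = {}
    # Restrict the XPath to hidden inputs that carry a name attribute
    for input_el in tree.xpath('//input[@type="hidden"][@name]'):
        # .get() returns the default instead of raising KeyError
        fields[input_el.get("name")] = input_el.get("value", "")
    return fields


htmlstr = '''
<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
<input type="text" name="q" value="search" />
<input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />
'''

for name, value in hidden_fields(htmlstr).items():
    print("%s : %s" % (name, value))
```

Only `post_form_id` and `fb_dtsg` are printed; the text input is skipped by the `@type="hidden"` predicate.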


Acorn Sep 19 '11 at 17:16


jterrace · Accepted Answer · 2011-09-19T16:34:48+0000

You can use BeautifulSoup:

    >>> from bs4 import BeautifulSoup
    >>> htmlstr = """<input type="hidden" autocomplete="off" id="post_form_id" name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" />
    ... <input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />"""
    >>> soup = BeautifulSoup(htmlstr, "html.parser")
    >>> [(n['name'], n['value']) for n in soup.find_all('input')]
    [('post_form_id', 'd619a1eb3becdc05a3ebea530396782f'), ('fb_dtsg', 'AQCYsohu')]
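If neither lxml nor BeautifulSoup is available, the standard library's `html.parser.HTMLParser` can do the same job. This is a minimal sketch, assuming Python 3.7+ (for insertion-ordered dicts); `HiddenInputParser` and `hidden_inputs` are hypothetical names, not part of any library. It keeps only `type="hidden"` inputs and prints them in the `name : value` format the question asks for.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) tuples, where value is None for bare attributes.
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if tag == "input" and input_type == "hidden" and name is not None:
            # A missing value attribute is recorded as an empty string
            self.fields[name] = attributes.get("value") or ""

    # Self-closing tags such as <input ... /> go through the default
    # handle_startendtag, which calls handle_starttag, so no override is needed.


def hidden_inputs(html):
    """Return {name: value} for every hidden input in the HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


snippet = (
    '<input type="hidden" autocomplete="off" id="post_form_id" '
    'name="post_form_id" value="d619a1eb3becdc05a3ebea530396782f" /> '
    '<input type="hidden" name="fb_dtsg" value="AQCYsohu" autocomplete="off" />'
)

for name, value in hidden_inputs(snippet).items():
    print(f"{name} : {value}")
```

Running it prints `post_form_id : d619a1eb3becdc05a3ebea530396782f` and `fb_dtsg : AQCYsohu`, one per line. The trade-off is that `HTMLParser` is event-based, so there is no tree to query; for anything beyond flat attribute extraction, the lxml or BeautifulSoup approaches are more convenient.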
