Customizing HTML textarea content with python, BeautifulSoup, mechanize (no forms, just divs)

Question

I am trying to fill out a form containing a textarea element. I am using Python with the BeautifulSoup and mechanize modules (stuck on Python 2.6.5 on FreeBSD 8.1 with the latest modules in the FreeBSD repository: BeautifulSoup 3.1.0.1 and mechanize 0.2.1).

The problem with BeautifulSoup is that it does not set the contents of the textarea correctly. I can try soup.textarea.insert(0, "FOO") or even soup.textarea.contents = "FOO", but when I check the current value with soup.textarea, I still see the original tags with nothing in between:

<textarea name="classified_description" class="classified_textarea_text"></textarea>

The problem with mechanize is that it only works on real forms. In the HTML I am parsing (below), there is no form element, just a set of divs with input elements inside.

How can I use Python or any of these modules to set the value of this textarea element?

    <div class="classified_field">
      <div class="classified_input_label">Description</div>
      <div class="classified_textarea_div">
        <textarea name="classified_description" id="classified_description" class="classified_textarea_text"></textarea>
      </div>
      <div class="site_clear"></div>
    </div>

I tried Vladimir's method below, and although it works with his example, for some reason it does not work in my production code. I can use .find() to get the textarea, but .insert() gives me grief. Here is what I have so far:

    >>> soup.find('textarea', {'name': 'classified_description'})
    <textarea name="classified_description" class="classified_textarea_text"></textarea>
    >>> soup.find('textarea', {'name': 'classified_description'}).insert(0, "some text here")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.6/site-packages/BeautifulSoup.py", line 233, in insert
        newChild.nextSibling.previousSibling = newChild
    AttributeError: 'unicode' object has no attribute 'previousSibling'
    >>>

Does anyone know why this fails with an AttributeError on a unicode object? My soup object is clearly not just a unicode string, because .find() works on it.

SOLUTION: Vladimir is correct, but real-world HTML can trigger a "malformed start tag" error in BeautifulSoup 3.1 (the BeautifulSoup project gives an official explanation). After switching to BeautifulSoup 3.0.8, everything worked fine. When I posted the original question, I had jury-rigged the output of mechanize's read() before passing it to BeautifulSoup in order to avoid the malformed start tag error. That workaround produced a unicode string instead of a BeautifulSoup object. Fixing my mechanize code and using the older BeautifulSoup gave the desired behavior.

+4

python forms beautifulsoup textarea mechanize

hamx0r Aug 12 '12 at 5:22

1 answer

Vladimir · Accepted Answer · 2012-08-12T06:35:18+0000

Here is an example using BeautifulSoup:

    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup('<textarea name="classified_description"></textarea>')
    # Find the textarea by its name attribute and insert the text as its first child
    soup.find('textarea', {'name': 'classified_description'}).insert(0, 'value')
    assert str(soup) == '<textarea name="classified_description">value</textarea>'

The section on modifying the parse tree in the BeautifulSoup documentation describes these operations in detail.
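For anyone reading this on Python 3 without BeautifulSoup available, here is a rough sketch of the same idea using only the standard library's html.parser. This is not the BeautifulSoup approach shown above, just an alternative under stated assumptions: the names TextareaFiller and fill_textarea are made up for this example, and the code re-emits the markup more or less as it was read, replacing only the body of the textarea whose name attribute matches. It has not been tested against messy real-world pages.

```python
from html import escape
from html.parser import HTMLParser


class TextareaFiller(HTMLParser):
    """Re-emit HTML as read, replacing the body of one named <textarea>.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, target_name, new_value):
        # convert_charrefs=False makes the parser report entity and
        # character references separately, so they can be copied verbatim.
        super().__init__(convert_charrefs=False)
        self.target_name = target_name
        self.new_value = new_value
        self.out = []
        self.inside_target = False

    def handle_starttag(self, tag, attrs):
        # Copy the start tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())
        if tag == "textarea" and dict(attrs).get("name") == self.target_name:
            # Write the new (escaped) content right after the opening tag.
            self.out.append(escape(self.new_value, quote=False))
            self.inside_target = True

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "textarea":
            self.inside_target = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Drop the old textarea content; keep all other text.
        if not self.inside_target:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.inside_target:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.inside_target:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def fill_textarea(html, target_name, new_value):
    """Return html with the named textarea's content set to new_value."""
    parser = TextareaFiller(target_name, new_value)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Usage with a trimmed-down version of the markup from the question.
page = (
    '<div class="classified_textarea_div"> '
    '<textarea name="classified_description" '
    'class="classified_textarea_text"></textarea> </div>'
)
print(fill_textarea(page, "classified_description", "some text here"))
```

Because it copies tags through as text instead of building a tree, this only suits simple one-off substitutions. For anything that needs to navigate or restructure the document, a real parse tree as in the answer above is the better tool.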
