Python lxml: wrapping elements

I was wondering what the easiest way is to wrap an element in another element using lxml and Python. For example, if I have an HTML fragment:

<h1>The cool title</h1>
<p>Something Neat</p>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
<p>The end of the snippet</p>

And I want to wrap the table element in a section element as follows:

<h1>The cool title</h1>
<p>Something Neat</p>
<section>
<table>
<tr>
<td>aaa</td>
<td>bbb</td>
</tr>
</table>
</section>
<p>The end of the snippet</p>

Another thing I would like to do is scan the XML document for h1s with a specific attribute and then wrap each one, together with all the elements up to the next h1 tag, in a section element, for example:

<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
<h1 class='neat'>Subject 2</h1>
<p>And Even More</p>

Converted to:

<section>
<h1 class='neat'>Subject 1</h1>
<p>Here is a bunch of boring text</p>
<h2>Minor Heading</h2>
<p>Here is some more</p>
</section>
<section>
<h1 class='neat'>Subject 2</h1>
<p>And Even More</p>
</section>

Thanks for the help, Chris.

2 answers

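lxml is awesome for parsing well-formed XML, but it is not as good a fit if you have non-XHTML HTML. In that case, go with BeautifulSoup, as the other answer suggests.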

With lxml, this is a fairly easy way to wrap every table in the document in a section:

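import lxml.etree as ET

# The fragment from the question, wrapped in a single root element
TEST = ("<html><h1>The cool title</h1><p>Something Neat</p>"
        "<table><tr><td>aaa</td><td>bbb</td></tr></table>"
        "<p>The end of the snippet</p></html>")

def insert_section(root):
    """Wrap every <table> under root in a new <section> element."""
    tables = root.findall(".//table")
    for table in tables:
        section = ET.Element("section")
        table.addprevious(section)  # put an empty <section> where the table is
        section.insert(0, table)    # this moves the table into the section

root = ET.fromstring(TEST)
insert_section(root)
print(ET.tostring(root, encoding="unicode"))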

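You could do something similar to wrap the headings, but you would need to walk through all the elements that belong in each section yourself. element.index() may help with finding where each group starts.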
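A minimal sketch of that heading approach with lxml, assuming the h1s to group are all siblings under one parent and are matched on class='neat'. The function name wrap_sections and the DOC sample (the question's second example with a root element added) are hypothetical. Instead of element.index(), it collects each heading's following siblings with itersiblings() before moving anything, then uses addprevious() and append() the same way the table example does.

```python
import lxml.etree as ET

# Hypothetical input: the question's second example under a <body> root
DOC = (
    "<body>"
    "<h1 class='neat'>Subject 1</h1>"
    "<p>Here is a bunch of boring text</p>"
    "<h2>Minor Heading</h2>"
    "<p>Here is some more</p>"
    "<h1 class='neat'>Subject 2</h1>"
    "<p>And Even More</p>"
    "</body>"
)

def wrap_sections(root, css_class="neat"):
    """Wrap each <h1 class=css_class> and its following siblings,
    up to the next <h1>, in a new <section> element (hypothetical helper)."""
    # XPath variable $cls is passed as a keyword argument
    for h1 in root.xpath(".//h1[@class=$cls]", cls=css_class):
        # Snapshot the group first so moving elements can't disturb iteration
        members = [h1]
        for sib in h1.itersiblings():
            if sib.tag == "h1":
                break
            members.append(sib)
        section = ET.Element("section")
        h1.addprevious(section)   # the section takes the heading's place
        for el in members:
            section.append(el)    # appending an existing element moves it
    return root

root = wrap_sections(ET.fromstring(DOC))
print(ET.tostring(root, encoding="unicode"))
```

Stopping at any h1, not only ones with the matching class, follows the question's "until the next h1 tag"; change the break condition if only matching headings should start a new section.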


For XML manipulation, I would use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

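Beautiful Soup is an HTML/XML parser for Python. It gives you Pythonic ways to navigate, search, and modify the parse tree. You could write a function such as is_h1 to find the matching headings in the XML, and then wrap each heading, together with the elements that follow it, in a new section.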
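A minimal sketch of that idea, assuming the current Beautiful Soup 4 package (imported as bs4) and Python's built-in "html.parser". Only the name is_h1 comes from the answer; its body, the hypothetical wrap_sections helper, and the HTML sample are illustrative. find_all() accepts a function as a filter, Tag.wrap() replaces a tag with a new wrapper and returns it, and append() moves a node out of its old position.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the question's second example
HTML = ("<h1 class='neat'>Subject 1</h1><p>Here is a bunch of boring text</p>"
        "<h2>Minor Heading</h2><p>Here is some more</p>"
        "<h1 class='neat'>Subject 2</h1><p>And Even More</p>")

def is_h1(tag):
    """Filter for find_all: <h1> tags whose class list contains 'neat'."""
    return tag.name == "h1" and "neat" in tag.get("class", [])

def wrap_sections(soup):
    """Hypothetical helper: wrap each matching <h1> and its following
    siblings, up to the next <h1>, in a <section>."""
    for h1 in soup.find_all(is_h1):
        # Snapshot the siblings before anything is moved
        members = []
        for sib in h1.next_siblings:
            if getattr(sib, "name", None) == "h1":
                break
            members.append(sib)
        section = h1.wrap(soup.new_tag("section"))  # wrap() returns the wrapper
        for el in members:
            section.append(el)  # append() extracts the node from its old place
    return soup

soup = wrap_sections(BeautifulSoup(HTML, "html.parser"))
print(soup)
```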

Then, if you need to, you can return the result in an HttpResponse as an XML document.

