Why does this element in lxml include the tail?

Consider this Python script:

from lxml import etree

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
  <body>
    <p>This is some text followed with 2 citations.<span class="footnote">1</span>
       <span class="footnote">2</span>This is some more text.</p>
  </body>
</html>'''

tree = etree.fromstring(html)

for element in tree.findall(".//{*}span"):
    if element.get("class") == 'footnote':
        print(etree.tostring(element, encoding="unicode", pretty_print=True))

The desired result would consist of the 2 span elements; instead I get:

<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">1</span>
<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">2</span>This is some more text.

Why does it include the text after the element, up to the end of the parent element?

I am trying to use lxml to link footnotes, and when I use a.insert() to put the span element into the a element that I create for it, the text after the span is included too, so large amounts of text that I don't want linked end up inside the link.

2 answers

Specifying with_tail=False will leave out the tail text.

print(etree.tostring(element, encoding="unicode", pretty_print=True, with_tail=False))

See the lxml.etree.tostring documentation.


To leave the tail out of the serialized XML, pass with_tail=False to etree.tostring().
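
For the linking use case in the question, with_tail=False only affects serialization; the tail still travels with the span when the span is moved into another element. One possible way to handle that, sketched here with simplified markup and a hypothetical href value (neither is from the question), is to hand the tail over to the new a element before moving the span:

```python
from lxml import etree

# Simplified markup; the href below is a made-up placeholder.
p = etree.fromstring(
    '<p>Some text.<span class="footnote">1</span>This is some more text.</p>'
)
span = p.find("span")

# Hypothetical link target, for illustration only.
a = etree.Element("a", href="#fn1")

# Hand the tail over to the wrapper so it stays outside the link.
a.tail = span.tail
span.tail = None

# Put <a> where the span was, then move the span inside it.
p.replace(span, a)
a.append(span)

result = etree.tostring(p, encoding="unicode")
print(result)
```

With the tail reassigned first, the link should wrap only the span, and the following text remains a sibling of the link inside the paragraph.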
