Printing HTML entities using lxml in Python

I am trying to build a div element from the string below using lxml. The string contains HTML entities, and the reserved character & in each entity is escaped to &amp; in the output, so the entities are displayed as plain text. How can I avoid this so that the entities render correctly?

from lxml import etree
import lxml.html

s = 'Actress Adamari L&#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&#8482; Website And Resources'

div = etree.Element("div")
div.text = s

lxml.html.tostring(div)

output:
<div>Actress Adamari L&amp;#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&amp;#8482; Website And Resources</div>
1 answer

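Assigning the string to div.text treats it as literal text, so every & is escaped to &amp;. Parse the string as HTML instead, and specify the encoding when calling tostring():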

>>> from lxml.html import fromstring, tostring
>>> s = 'Actress Adamari L&#243;pez And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts&#8482; Website And Resources'
>>> div = fromstring(s)
>>> print(tostring(div, encoding='unicode'))
<p>Actress Adamari López And Amgen Launch Spanish-Language Chemotherapy: Myths Or Facts™ Website And Resources</p>
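
Note that fromstring() wraps the bare text in a <p> element rather than the <div> the question asks for. A minimal sketch of one way to keep the div, assuming Python 3: decode the entities first with the standard library's html.unescape() (not part of the original answer), then assign the decoded text to the element.

```python
from html import unescape  # standard library, Python 3.4+

from lxml import etree
import lxml.html

s = ('Actress Adamari L&#243;pez And Amgen Launch Spanish-Language '
     'Chemotherapy: Myths Or Facts&#8482; Website And Resources')

div = etree.Element("div")
# Decode the character references before assigning, so the element holds
# the real characters instead of the literal text "&#243;".
div.text = unescape(s)

# encoding='unicode' returns a str and leaves non-ASCII characters as-is.
rendered = lxml.html.tostring(div, encoding='unicode')
print(rendered)
```

With the default encoding, tostring() returns ASCII bytes and writes the non-ASCII characters back out as numeric character references, which a browser renders the same way.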

As a note, you should definitely use lxml.html.tostring() when working with HTML data:

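Note that you should use lxml.html.tostring and not lxml.tostring. lxml.tostring(doc) will return the XML representation of the document, which is not valid HTML. In particular, things like <script src="..."></script> will be serialized as <script src="..." />, which completely confuses browsers.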
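
The difference between the two serializers can be seen on an empty script element. This is only a small sketch; the file name app.js is a made-up placeholder.

```python
from lxml import etree
import lxml.html

# An empty <script> element; "app.js" is a hypothetical file name.
script = etree.Element("script", src="app.js")

# XML serialization collapses the empty element into a self-closing tag.
as_xml = etree.tostring(script)

# HTML serialization keeps the explicit closing tag that browsers expect.
as_html = lxml.html.tostring(script)

print(as_xml)
print(as_html)
```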
