< and > changed to &lt; and &gt; when parsing HTML using BeautifulSoup in Python

When processing HTML with BeautifulSoup, < and > are converted to &lt; and &gt;. Since the tag brackets have been converted, the whole soup has lost its structure. Any suggestions?

+6
2 answers

Setting formatter=None may help (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters), but the escaping may also be a sign that your HTML is not valid.
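A minimal sketch of what formatter=None changes, assuming bs4 is installed and using the built-in "html.parser"; the sample markup is made up for illustration. Note that the formatter only affects how the soup is written back out: text that was parsed as text stays text in the tree.

```python
from bs4 import BeautifulSoup

# Made-up input: the paragraph contains an escaped "<" as plain text.
soup = BeautifulSoup("<p>a &lt; b</p>", "html.parser")

# Default ("minimal") formatter: special characters in text are escaped on output.
default_out = soup.decode()

# formatter=None: text is written out exactly as stored, with no escaping.
raw_out = soup.decode(formatter=None)

print(default_out)  # <p>a &lt; b</p>
print(raw_out)      # <p>a < b</p>
```

If the input itself already contains &lt; and &gt; where tags should be, the parser never sees those tags, so changing the formatter will probably not bring the structure back.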

If this does not work, can you provide sample code and HTML that reproduce the problem?

+4

This may be caused by an invalid character (introduced by an encoding or decoding step), which makes it hard for BeautifulSoup to parse the input. I solved this by passing my string directly to BeautifulSoup without any encoding or decoding. In my case, I had been trying to convert UTF-16 to UTF-8 myself.
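A rough sketch of that approach, assuming bs4 is installed; the sample document and variable names are made up. Instead of converting UTF-16 to UTF-8 by hand, the raw bytes go straight to BeautifulSoup, which tries to detect the encoding itself (here the byte order mark gives it away).

```python
from bs4 import BeautifulSoup

# Made-up document, encoded as UTF-16 (Python's "utf-16" codec prepends a BOM).
html_text = "<html><body><p>caf\u00e9</p></body></html>"
raw_bytes = html_text.encode("utf-16")

# Pass the bytes as-is and let BeautifulSoup work out the encoding.
soup = BeautifulSoup(raw_bytes, "html.parser")

paragraph_text = soup.p.get_text()
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
print(paragraph_text)
```

If the detection guesses wrong for your data, the from_encoding argument of the BeautifulSoup constructor lets you name the encoding explicitly.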

0