Scraping with BeautifulSoup while preserving &nbsp; entities

I would like to scrape a table from the web and keep the &nbsp; entities so that I can republish it as HTML later. BeautifulSoup seems to convert them to spaces. Example:

    from bs4 import BeautifulSoup

    html = "<html><body><table><tr>"
    html += "<td>&nbsp;hello&nbsp;</td>"
    html += "</tr></table></body></html>"

    soup = BeautifulSoup(html)
    table = soup.find_all('table')[0]
    row = table.find_all('tr')[0]
    cell = row.find_all('td')[0]
    print(cell)

observed result:

 <td> hello </td> 

desired result:

 <td>&nbsp;hello&nbsp;</td> 
1 answer

In bs4, the convertEntities parameter of the BeautifulSoup constructor is no longer supported. HTML entities are always converted to the corresponding Unicode characters (see the docs).
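A minimal sketch of what that conversion means in practice, assuming the built-in "html.parser" backend (the question does not say which parser was used): the "spaces" in the observed output are really no-break space characters (U+00A0), not ordinary spaces. The variable names here are only illustrative.

```python
from bs4 import BeautifulSoup

html = "<html><body><table><tr><td>&nbsp;hello&nbsp;</td></tr></table></body></html>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

# The text node of the first <td> as stored in the parse tree.
text = soup.find("td").string

# The entity text is gone; a real U+00A0 character is stored instead.
has_nbsp_char = u"\u00a0" in text
has_entity_text = "&nbsp;" in text
print(has_nbsp_char, has_entity_text)
```

Because the character is kept in the tree, the entity can be restored at output time instead of being preserved during parsing.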

According to the docs, you need to use an output formatter, for example:

 print(soup.find_all('td')[0].prettify(formatter="html"))
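Note that prettify() also adds line breaks and indentation. If the cell should come back on a single line, as in the desired result, one option is decode() with the same formatter. This is a sketch assuming the "html.parser" backend; it is not the only way to do this.

```python
from bs4 import BeautifulSoup

html = "<html><body><table><tr><td>&nbsp;hello&nbsp;</td></tr></table></body></html>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
cell = soup.find("td")

# Default ("minimal") formatter: the no-break space stays a raw U+00A0 character.
default_markup = cell.decode()

# "html" formatter: characters with a named HTML entity are written as entities.
entity_markup = cell.decode(formatter="html")
print(entity_markup)
```

The same formatter="html" argument is also accepted by encode(), which returns bytes rather than a Unicode string.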