How to render tag contents in unicode in BeautifulSoup?

This is the soup from the WordPress post page:

content = soup.body.find('div', id=re.compile('post')) title = content.h2.extract() item['title'] = unicode(title.string) item['content'] = u''.join(map(unicode, content.contents)) 

I want to omit the attached div tag when assigning item['content'] . Is there a way to render all child tag tags in Unicode? Sort of:

 item['content'] = content.contents.__unicode__() 

which will give me one unicode line instead of a list.

+2
source share
1 answer

You tried:

 unicode(content) 

It converts content markup to a single Unicode string.

Edit: If you do not want the tag containing the tags, try:

 content.renderContents() 
+6
source

Source: https://habr.com/ru/post/1314922/


All Articles