How to convert BeautifulSoup.ResultSet to string

Question

So, I parsed an HTML page with BeautifulSoup and stored the output of .findAll in a variable named result. If I type result in the Python shell and press Enter, I see plain text as expected. But I would like to process this result as a string object, and I noticed that str(result) returns garbage, like this sample:

\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />\n<hr />\n</div>

The HTML page source is UTF-8 encoded.

How can I handle this?

The code is basically this, in case it matters:

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)

Python 2.7

+5

python unicode beautifulsoup

theta Oct 16 '11 at 6:33

4 answers

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen(url).read())
# findAll returns a ResultSet holding every matching element
result = soup.findAll(something)
# iterate over the matches
for line in result:
    # render each element as a byte string; replace 'utf-8' with whatever charset you need
    print line.__str__('utf-8')

BTW: my BeautifulSoup version is beautifulsoup-3.2.1.

+3

ChangePicture 22 . '13 15:30

That output is not garbage, it is UTF-8 encoded bytes. Work with Unicode instead.
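
As an illustration, the escaped bytes at the start of the question's sample decode cleanly as UTF-8. This is only a minimal sketch using the standard library; the names raw and text are made up for the example and do not come from the original post:

```python
# -*- coding: utf-8 -*-
# The escaped byte sequence copied from the sample in the question.
raw = b'\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0'

# Decoding as UTF-8 turns the 14 bytes into 7 Cyrillic characters.
text = raw.decode('utf-8')

print(len(raw))   # 14
print(len(text))  # 7
```

The "garbage" is therefore just how a byte string is displayed; the underlying data is intact.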

0

Ignacio Vazquez-Abrams 16 . '11 6:43

Try this:

import unicodedata
unicodedata.normalize('NFKC', p.decode('utf-8')).encode('ascii','ignore')

BeautifulSoup stores the parsed text as Unicode.
The encoding it detected for the source document is available in originalEncoding.
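
One caveat worth seeing in action: encoding to ASCII with 'ignore' silently drops every non-ASCII character, so Cyrillic text disappears entirely. The sketch below assumes the input bytes are UTF-8, as in the question; to_ascii and sample are hypothetical names used only for this illustration:

```python
import unicodedata

def to_ascii(raw_bytes, encoding='utf-8'):
    """Decode bytes, apply NFKC normalization, then drop every non-ASCII character."""
    text = raw_bytes.decode(encoding)
    return unicodedata.normalize('NFKC', text).encode('ascii', 'ignore')

# ASCII markup survives; the Cyrillic word from the question's sample does not.
sample = b'\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />'
stripped = to_ascii(sample)
print(stripped == b'</a><br />')  # True
```

If the non-ASCII text matters, this approach is probably not what you want; keeping the data as Unicode avoids the loss.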

0

Lelouch Lamperouge 16 . '11 6:43

Johnny Brown · Accepted Answer · 2012-03-26T01:15:41+0000

Python 2.6.7, BeautifulSoup version 3.2.0.

This worked for me:

unicode.join(u'\n',map(unicode,result))

Here result is a BeautifulSoup.ResultSet, which appears to be a subclass of the Python list.
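
The same join can be wrapped in a small helper that runs on both Python 2 and Python 3. This is a sketch, not part of BeautifulSoup: join_results, text_type and fake_result are hypothetical names, and plain strings stand in for the tags that findAll would return:

```python
try:
    text_type = unicode  # Python 2: the unicode built-in exists
except NameError:
    text_type = str      # Python 3: str is already Unicode text

def join_results(items, sep=u'\n'):
    """Convert each item to Unicode text and join the pieces with sep."""
    return sep.join(text_type(item) for item in items)

# Plain strings stand in for the tags a findAll call would return.
fake_result = [u'<a href="/1">one</a>', u'<a href="/2">two</a>']
joined = join_results(fake_result)
print(joined)
```

With a real ResultSet the call would look the same, join_results(result), because the helper only needs something it can iterate over.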
