So, I parsed an HTML page with BeautifulSoup and stored the output of .findAll() in a variable named result. If I type result in the Python shell and press Enter, I see plain text as expected, but since I would like to process this result as a string object, I tried str(result) and noticed that it returns garbage, like this sample:
\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />\n<hr />\n</div>
The HTML page source is UTF-8 encoded.
How can I handle this?
The code is basically this, in case it matters:
import urllib
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)
Python 2.7
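For reference, the sample above looks less like corrupted data and more like the escaped representation of UTF-8 bytes. Here is a minimal sketch of that idea, assuming the bytes in the sample really are UTF-8; the variable names raw, escaped, and text are made up for illustration, and the snippet does not touch the actual page or BeautifulSoup:

```python
# Byte sequence copied from the sample output above (assumed to be UTF-8).
raw = b"\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0"

# repr() renders non-ASCII bytes as \x escapes; the underlying data is intact.
escaped = repr(raw)

# Decoding interprets the bytes as UTF-8 and yields a text (unicode) object.
text = raw.decode("utf-8")

print(escaped)
print("%d bytes, %d characters" % (len(raw), len(text)))
```

If that assumption holds, the string returned by str(result) would need to be decoded as UTF-8 (or printed rather than echoed in the shell) before it reads as normal text; this has not been checked against the real page.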