Extraction of value in Beautifulsoup

Question

I have the following code:

 f = open(path, 'r')
 html = f.read()  # no parameters => reads to EOF and returns a string
 soup = BeautifulSoup(html)
 schoolname = soup.findAll(attrs={'id': 'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
 print schoolname

which gives:

 [<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">AB Paterson College, Arundel, QLD</span>]

When I try to access the value (i.e. "AB Paterson College, Arundel, QLD") using schoolname['value'], I get the following error:

 print schoolname['value']
 TypeError: list indices must be integers, not str

What am I doing wrong to get this value?

python beautifulsoup

Seth Apr 11 '10 at

2 answers

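findAll returns a list of matching tags (a ResultSet), not a single tag, so indexing it with a string raises the exception. Using find instead of findAll gives you the first matching tag directly. Note that subscripting a tag, as in schoolname['value'], looks up an HTML attribute, and this span has no value attribute; the text you want is available through the tag's string attribute:

 schoolname.string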

This only works if you need a single match, because find returns only the first matching element.
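
As a rough sketch of the difference between the two calls, here is a self-contained version. It assumes Python 3 with the newer bs4 package rather than the BeautifulSoup 3 used in the question (so findAll is spelled find_all), and it parses an inline copy of the span instead of reading a file:

```python
# Sketch only: assumes Python 3 with the bs4 package installed
# (pip install beautifulsoup4); the question itself uses BeautifulSoup 3.
from bs4 import BeautifulSoup

label_id = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
html = '<span id="%s">AB Paterson College, Arundel, QLD</span>' % label_id
soup = BeautifulSoup(html, "html.parser")

# find_all (findAll in BeautifulSoup 3) returns a list-like ResultSet of tags,
# so it must be indexed with an integer, not a string.
matches = soup.find_all(attrs={"id": label_id})
first_from_list = matches[0]

# find returns the first matching tag directly (or None if nothing matches).
tag = soup.find(attrs={"id": label_id})

school_text = tag.string  # the text inside the span
tag_id = tag["id"]        # subscripting a tag reads an HTML attribute
print(school_text)
```

If several elements can match, loop over the result of find_all and read string from each tag instead.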

kabp Aug 08 '12 at 12:13

Mark Byers · Accepted Answer · 2010-04-11 10:17

You can use contents to navigate the tree:

 >>> for x in schoolname:
 ...     print x.contents
 ...
 [u'A B Paterson College, Arundel, QLD']

Note that the content does not have to be a string - in general, it can also be more tags or a mixture of strings and tags.
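
To illustrate the mixed case, here is a small sketch. It assumes Python 3 with the bs4 package (not the BeautifulSoup 3 from the question), and the markup with a nested b tag is made up for the example:

```python
# Sketch only: assumes Python 3 with bs4 installed; the markup is hypothetical.
from bs4 import BeautifulSoup, NavigableString

html = '<span id="school">AB Paterson College, <b>Arundel</b>, QLD</span>'
span = BeautifulSoup(html, "html.parser").find("span")

# contents lists the tag's direct children: here a string, a tag, a string.
kinds = ["string" if isinstance(child, NavigableString) else "tag"
         for child in span.contents]

# get_text() (bs4 only) joins every nested string into one piece of text.
full_text = span.get_text()
print(kinds, full_text)
```

When the element may contain nested tags, get_text() is usually the safer way to pull out the visible text, since contents then holds more than one item.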
