Extracting a value with BeautifulSoup

I have the following code:

f = open(path, 'r')
html = f.read()  # no parameters => reads to EOF and returns a string
soup = BeautifulSoup(html)
schoolname = soup.findAll(attrs={'id': 'ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel'})
print schoolname

which gives:

 [<span id="ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel">AB Paterson College, Arundel, QLD</span>] 

When I try to access the value (i.e. "AB Paterson College, Arundel, QLD") using schoolname['value'], I get the following error:

 print schoolname['value']
 TypeError: list indices must be integers, not str

What am I doing wrong to get this value?

python beautifulsoup
Apr 11 '10 at
2 answers

You can use contents to navigate the tree:

 >>> for x in schoolname:
 ...     print x.contents
 ...
 [u'AB Paterson College, Arundel, QLD']

Note that contents is a list of the tag's children, and those children do not have to be strings: in general, they can also be tags, or a mixture of strings and tags.
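
To illustrate the mixed case, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) on Python 3 rather than the older version used in the question, and the HTML snippet and the id "name" are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a span holding a string, a nested tag, and another string.
html = '<span id="name">AB Paterson College, <b>Arundel</b>, QLD</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find(id="name")

# .contents lists the direct children: strings and tags side by side.
kinds = [type(child).__name__ for child in span.contents]
print(kinds)

# get_text() joins every string in the subtree, whatever the nesting.
text = span.get_text()
print(text)
```

If the tag may contain nested markup, get_text() is usually the safer way to pull out the visible text, since contents would have to be walked by hand.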

Apr 11 '10 at

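findAll returns a list of matching tags, not a single tag, so indexing the result with a string raises the exception. Using find instead of findAll gives you a single tag (or None if nothing matches). A tag can be indexed by attribute name, e.g. schoolname['id'], but the span in the question has no value attribute, so schoolname['value'] would still fail with a KeyError. The text between the tags is available as:

 schoolname.string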

Obviously, this only "works" if you need only one specific value.
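
The difference between the two calls can be sketched as follows. This assumes BeautifulSoup 4 (the bs4 package, where the methods are spelled find_all and find) on Python 3, which is not the version used in the question; the HTML string is rebuilt from the output shown in the question, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

label_id = "ctl00_ContentPlaceHolder1_SchoolProfileUserControl_SchoolHeaderLabel"
# Markup reconstructed from the printed result in the question.
html = '<span id="%s">AB Paterson College, Arundel, QLD</span>' % label_id
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so it takes integer indices.
matches = soup.find_all(id=label_id)
name_from_list = matches[0].string

# find returns the first matching Tag, or None when nothing matches.
first = soup.find(id=label_id)
name_from_tag = first.string if first is not None else None

# Indexing a Tag looks up an HTML attribute, not the text content.
tag_id = first["id"]
# The span has no "value" attribute; .get() returns None instead of raising.
missing = first.get("value")

print(name_from_list)
print(name_from_tag)
```

Checking the result of find against None before using it avoids an AttributeError on pages where the element is absent.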

Aug 08 '12 at 12:13