Retrieving attribute value with beautifulsoup

Question

I am trying to extract the content of the value attribute of a particular input tag on a web page. I am using the following code:

    import urllib
    f = urllib.urlopen("http://58.68.130.147")
    s = f.read()
    f.close()

    from BeautifulSoup import BeautifulStoneSoup
    soup = BeautifulStoneSoup(s)

    inputTag = soup.findAll(attrs={"name" : "stainfo"})
    output = inputTag['value']
    print str(output)

I get TypeError: list indices must be integers, not str

although from the BeautifulSoup documentation I understand that strings should not be a problem here... but I am not a specialist and I may have misunderstood.

Any suggestion is much appreciated! Thanks in advance.

+50

python parsing attributes beautifulsoup

Barnabe Apr 10 '10 at 6:53

5 answers

If you want to get the attribute values of several matching tags from the source above, you can use findAll and a list comprehension to collect them all:

    import urllib
    f = urllib.urlopen("http://58.68.130.147")
    s = f.read()
    f.close()

    from BeautifulSoup import BeautifulStoneSoup
    soup = BeautifulStoneSoup(s)

    ### You may be able to do findAll("input", attrs={"name" : "stainfo"})
    inputTags = soup.findAll(attrs={"name" : "stainfo"})

    ### Read the "value" attribute of every matching tag.
    output = [x["value"] for x in inputTags]

    ### This will print a list of the values.
    print output
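One thing the list comprehension does not cover is a matching tag that has no value attribute at all, in which case indexing raises KeyError. A minimal sketch of guarding against that, assuming the newer bs4 package (BeautifulSoup 4) and a made-up HTML string in place of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second input deliberately lacks a "value" attribute.
html = (
    '<input name="stainfo" value="a">'
    '<input name="stainfo">'
    '<input name="stainfo" value="b">'
)
soup = BeautifulSoup(html, "html.parser")

# All input tags whose name attribute is "stainfo".
tags = soup.find_all("input", attrs={"name": "stainfo"})

# Keep only the tags that actually carry a "value" attribute.
values = [t["value"] for t in tags if t.has_attr("value")]
print(values)
```

Whether skipping such tags is the right behaviour depends on the page; recording None for them instead may be preferable.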

+3

Margath Aug 28 '12 at 15:35

You can save time if you already know which tags carry the attribute.

Suppose the xyz tag has an attribute named "staininfo":

    full_tag = soup.findAll("xyz")

Note that full_tag is a list, so iterate over it:

    for each_tag in full_tag:
        staininfo_attrb_value = each_tag["staininfo"]
        print staininfo_attrb_value

This way you get the staininfo attribute value of every xyz tag.

+1

b1tchacked Jul 08 '12 at 12:20

In Python 3.x, call get(attr_name) on each tag object returned by find_all:

    from bs4 import BeautifulSoup

    # Read the whole XML file into a string.
    with open('conf//test1.xml', 'r') as xmlFile:
        xmlData = xmlFile.read()

    # html.parser lowercases tag names, hence 'repeatingelement' below.
    xmlSoup = BeautifulSoup(xmlData, 'html.parser')
    repElemList = xmlSoup.find_all('repeatingelement')

    for repElem in repElemList:
        print("Processing repElem...")
        repElemID = repElem.get('id')
        repElemName = repElem.get('name')
        print("Attribute id = %s" % repElemID)
        print("Attribute name = %s" % repElemName)

For the XML file conf//test1.xml, which looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
        <singleElement>
            <subElementX>XYZ</subElementX>
        </singleElement>
        <repeatingElement id="11" name="Joe"/>
        <repeatingElement id="12" name="Mary"/>
    </root>

it prints:

    Processing repElem...
    Attribute id = 11
    Attribute name = Joe
    Processing repElem...
    Attribute id = 12
    Attribute name = Mary
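The practical difference between get() and square-bracket indexing shows up when an attribute is absent. The sketch below uses an inline XML string (a hypothetical variant of the file above in which the second element has no name) rather than a file on disk:

```python
from bs4 import BeautifulSoup

# Hypothetical data: the second element has no "name" attribute.
xml = '<root><repeatingElement id="11" name="Joe"/><repeatingElement id="12"/></root>'
soup = BeautifulSoup(xml, "html.parser")
elems = soup.find_all("repeatingelement")

# get() returns a default (None unless one is given) for a missing attribute.
names = [elem.get("name", "unknown") for elem in elems]

# Indexing the same missing attribute raises KeyError instead.
try:
    elems[1]["name"]
    raised = False
except KeyError:
    raised = True

print(names, raised)
```

So get() suits optional attributes, while indexing is a reasonable choice when a missing attribute should be treated as an error.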

+1

amphibient Nov 16 '16 at 19:36

You can also fetch the page with requests and parse it with BeautifulSoup 4:

    import requests
    from bs4 import BeautifulSoup

    url = "http://58.68.130.147/"
    r = requests.get(url)
    data = r.text

    soup = BeautifulSoup(data, "html.parser")
    get_details = soup.find_all("input", attrs={"name": "stainfo"})

    # Print the value attribute of every matching input tag.
    for val in get_details:
        get_val = val["value"]
        print(get_val)

0

Mr.Bones Oct 18 '17 at 18:40

Łukasz · Accepted Answer · 2010-04-10 07:06

.findAll() returns a list of all elements found, so:

    inputTag = soup.findAll(attrs={"name" : "stainfo"})

inputTag is a list (possibly containing only one item). Depending on what you want, you should either do:

    output = inputTag[0]['value']

or use the .find() method, which returns only one (first) element found:

    inputTag = soup.find(attrs={"name": "stainfo"})
    output = inputTag['value']
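For readers on the newer bs4 package, where the methods are spelled find_all() and find(), the two options can be tried side by side. This is only a sketch: the HTML string and the "station-42" value are made up, since the real page content is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page in the question.
html = (
    '<form>'
    '<input name="stainfo" value="station-42">'
    '<input name="other" value="x">'
    '</form>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like result, so index it before reading an attribute.
matches = soup.find_all(attrs={"name": "stainfo"})
first_value = matches[0]["value"]

# find() returns the first matching tag directly, or None when nothing matches.
tag = soup.find(attrs={"name": "stainfo"})
value = tag["value"] if tag is not None else None

# Looking up a name that does not occur yields None rather than an exception.
missing = soup.find(attrs={"name": "no-such-name"})

print(first_value, value, missing)
```

Checking the result of find() for None before indexing avoids a different TypeError when the tag is not on the page.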
