Parse HTML with Beautiful Soup. Return text from a specific tag

Question

I can get the full HTML tag, addressing it from a Unix shell script, as follows:

#!/usr/bin/python3

# import the module
from bs4 import BeautifulSoup

# define your object
soup = BeautifulSoup(open("test.html"))

# get the tag
print(soup(itemprop="name"))

where itemprop="name" uniquely identifies the desired tag.

The output is similar to:

[<span itemprop="name">
                    Blabla &amp; Bloblo</span>]

Now I would like to return only the text part, Blabla & Bloblo.

My attempt was:

print(soup(itemprop="name").getText())

but I get an error like AttributeError: 'ResultSet' object has no attribute 'getText'

The same call did work in other contexts, such as:

print(soup.find('span').getText())

So what am I doing wrong?

python html beautifulsoup

joaoal Aug 12 '14 at 15:10

1 answer

Martijn Pieters · Accepted Answer · 2014-08-12T15:13:36+0000

Calling the soup object as a function returns a list of results, as if you had used soup.find_all(). See the documentation:

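Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it's the same as calling find_all() on that object.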

Use soup.find() to find just the first match:

soup.find(itemprop="name").get_text()

or index into the result set:

soup(itemprop="name")[0].get_text()
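
To see the difference between the two forms in one place, here is a minimal sketch. It assumes the markup from the question is supplied as an inline string rather than read from test.html, and it names the built-in "html.parser" explicitly; both choices are illustrative and not part of the original script. Passing strip=True to get_text() is also an addition here, used to drop the whitespace around the tag's text.

```python
from bs4 import BeautifulSoup

# Inline stand-in for test.html (assumption: same markup as in the question)
html = """<html><body>
<span itemprop="name">
                    Blabla &amp; Bloblo</span>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for find_all():
# the result is a list-like ResultSet, which has no get_text() of its own
matches = soup(itemprop="name")
print(type(matches).__name__)

# find() returns only the first matching Tag (or None when nothing matches)
tag = soup.find(itemprop="name")

# strip=True trims surrounding whitespace; the &amp; entity is decoded to &
name = tag.get_text(strip=True)
print(name)

# Equivalent: index into the ResultSet, then ask that Tag for its text
first = matches[0].get_text(strip=True)
print(first)
```

If no tag matches, find() returns None and indexing the empty ResultSet raises IndexError, so real code may want to check for that before calling get_text().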
