Finding the next tag and its attached text with Beautiful Soup

Question


I am trying to parse the text inside <blockquote> tags. When I call soup.blockquote.get_text(), I get the result I want for the first blockquote in the HTML file. How do I find the next and subsequent <blockquote> elements in the file? Maybe I'm just tired and can't find it in the documentation.

Example HTML file:

    <html>
    <head>header </head>
    <blockquote>I can get this text </blockquote>
    <p>eiaoiefj</p>
    <blockquote>trying to capture this next </blockquote>
    <p></p><strong>do not capture this</strong>
    <blockquote> capture this too but separately after "capture this next" </blockquote>
    </html>

Simple Python code:

    from bs4 import BeautifulSoup

    html_doc = open("example.html")
    soup = BeautifulSoup(html_doc)
    print(soup.blockquote.get_text())
    # how to get the next blockquote???

+8

python html python-2.7 beautifulsoup

PSeUdocode Feb 17 '14 at 7:34


1 answer

falsetru · Accepted Answer · 2014-02-17T07:39:07+0000

Use find_next_sibling. (If the next tag is not a sibling of the current one, use find_next instead.)

    >>> html = '''
    ... <html>
    ... <head>header
    ... </head>
    ... <blockquote>blah blah
    ... </blockquote>
    ... <p>eiaoiefj</p>
    ... <blockquote>capture this next
    ... </blockquote>
    ... <p></p><strong>don'tcapturethis</strong>
    ... <blockquote>
    ... capture this too but separately after "capture this next"
    ... </blockquote>
    ... </html>
    ... '''
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(html)
    >>> quote1 = soup.blockquote
    >>> quote1.text
    u'blah blah\n'
    >>> # find_next_sibling (singular) returns one tag; find_next_siblings returns a list
    >>> quote2 = quote1.find_next_sibling('blockquote')
    >>> quote2.text
    u'capture this next\n'
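To capture every blockquote rather than just the second, something along these lines should work. This is a minimal sketch, not part of the original answer: it reuses the question's HTML, adds a hypothetical nested `<div><blockquote>` to show the non-sibling case, and passes "html.parser" explicitly as an assumed parser choice.

```python
from bs4 import BeautifulSoup

# The question's HTML, plus one hypothetical nested blockquote (inside a
# <div>) added only to illustrate the "not a sibling" case.
html = """<html>
<head>header</head>
<blockquote>I can get this text</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>capture this too</blockquote>
<div><blockquote>nested, not a sibling</blockquote></div>
</html>"""

# Assumption: the built-in "html.parser" is used so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")
first = soup.blockquote

# find_next_siblings (plural) returns a list of the later sibling tags,
# so the nested blockquote inside the <div> is not included.
sibling_texts = [first.get_text(strip=True)] + [
    q.get_text(strip=True) for q in first.find_next_siblings("blockquote")
]

# find_all returns every blockquote in document order, siblings or not.
all_texts = [q.get_text(strip=True) for q in soup.find_all("blockquote")]

# Walking one tag at a time with find_next, which returns None at the end.
walked = []
quote = soup.find("blockquote")
while quote is not None:
    walked.append(quote.get_text(strip=True))
    quote = quote.find_next("blockquote")

print(sibling_texts)
print(all_texts)
```

If all you need is the text of each blockquote separately, find_all is probably the simplest option; find_next_sibling and find_next are more useful when you need to step forward from a tag you already hold.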
