Use BeautifulSoup to retrieve sibling nodes between two nodes

I have a document like this:

    <p class="top">I don't want this</p>
    <p>I want this</p>
    <table>
       <!-- ... -->
    </table>
    <img ... />
    <p> and all that stuff too</p>
    <p class="end">But not this and nothing after it</p>

I want to extract everything between <p class="top"> and <p class="end">, excluding both marker paragraphs.

Is there a good way to do this with BeautifulSoup?

python beautifulsoup
1 answer

The node.nextSibling attribute is what you need (it is spelled next_sibling in BeautifulSoup 4):

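    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path

    soup = BeautifulSoup(html)  # html holds the document source

    # Start at the first sibling after the "top" paragraph.
    nextNode = soup.find('p', {'class': 'top'}).nextSibling
    while nextNode is not None:  # also stops if the "end" paragraph is missing
        # Stop at <p class="end">; getattr() guards against string nodes.
        if getattr(nextNode, 'name', None) == 'p' and nextNode.get('class', None) == 'end':
            break
        # process nextNode here
        nextNode = nextNode.nextSibling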

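The condition looks tricky because the siblings can be string nodes as well as tags. String nodes have no name, so the getattr() check makes sure that attributes are only read from HTML tags, never from strings.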
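
For anyone on BeautifulSoup 4 (the bs4 package), here is a minimal sketch of the same idea. The helper names siblings_between and is_end_marker are made up for this example, and the sample HTML is a filled-in version of the document from the question. Note that bs4 treats class as a multi-valued attribute, so get('class') returns a list and a comparison like == 'end' would not match.

```python
from bs4 import BeautifulSoup, Tag

# Sample document modelled on the question (table contents are placeholders).
html = """
<p class="top">I don't want this</p>
<p>I want this</p>
<table><tr><td>cell</td></tr></table>
<p>and all that stuff too</p>
<p class="end">But not this and nothing after it</p>
<p>nor this</p>
"""


def siblings_between(start, is_end):
    """Yield the siblings that follow `start` until `is_end(node)` is true."""
    for node in start.next_siblings:
        if is_end(node):
            return
        yield node


def is_end_marker(node):
    # Only Tag objects have attributes; whitespace strings are skipped here.
    # In bs4, get("class") returns a list such as ["end"].
    return (
        isinstance(node, Tag)
        and node.name == "p"
        and "end" in node.get("class", [])
    )


soup = BeautifulSoup(html, "html.parser")
start = soup.find("p", class_="top")

between = list(siblings_between(start, is_end_marker))

# The result also contains the whitespace strings between tags; keep tags only.
tags = [node for node in between if isinstance(node, Tag)]
print([tag.name for tag in tags])  # ['p', 'table', 'p']
```

The generator simply stops when the siblings run out, so a missing end marker gives you everything after the start node instead of an error.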
