Using nextSibling in BeautifulSoup doesn't display anything

I am trying to use BeautifulSoup to parse the following:

<h4>Hello<br /></h4> <p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p> 

In this example, let's say I have the <h4> tag stored in a variable named tag. When I type print tag.text, the output is Hello, as expected.

However, when I use print tag.nextSibling, nothing is printed. When I type print tag.nextSibling.nextSibling, the output is <p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>. What's happening? Why do I need to use .nextSibling twice to reach the <p> in my example? This trips me up every time.

2 answers

Apparently .nextSibling also returns whitespace text nodes. The actual page I'm working with has whitespace between the <h4> and <p> tags, so the first .nextSibling returns that whitespace and I have to call it a second time to reach the <p>.

Proof:

Code:

 print tag.__class__
 print tag.nextSibling.__class__
 print tag.nextSibling.nextSibling.__class__

Output:

 <class 'BeautifulSoup.Tag'>
 <class 'BeautifulSoup.NavigableString'>
 <class 'BeautifulSoup.Tag'>
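If the goal is just to reach the <p>, one way to avoid the double call is to ask for the next sibling that is a tag. The snippet below is a minimal sketch, assuming the newer bs4 package on Python 3 with the built-in html.parser (the code above uses the older BeautifulSoup 3 module); the html string is copied from the question.

```python
from bs4 import BeautifulSoup

html = ('<h4>Hello<br /></h4> <p><img src="http://url.goes.here" '
        'alt="hiya" class="img" />May 28, 1996</p>')
soup = BeautifulSoup(html, "html.parser")
tag = soup.h4

# The immediate sibling is the whitespace text node between the two tags.
whitespace = tag.next_sibling
print(repr(whitespace))

# find_next_sibling() only considers tags, so it skips the text node.
paragraph = tag.find_next_sibling("p")
print(paragraph.get_text())
```

Here find_next_sibling("p") walks the same sibling chain as .next_sibling but ignores strings, so it should keep working whether or not the page has whitespace between the tags.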

Here is what the official documentation says: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#going-down

In real documents, the .next_sibling or .previous_sibling of a tag will usually be a string containing whitespace. Going back to the "three sisters" document:

 <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>

You might think that the .next_sibling of the first <a> tag would be the second <a> tag. But actually, it's a string: the comma and newline that separate the first <a> tag from the second:

 link = soup.a
 link
 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
 link.next_sibling
 # u',\n'

The second <a> tag is actually the .next_sibling of the comma:

 link.next_sibling.next_sibling
 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
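Rather than chaining .next_sibling a fixed number of times, you can loop over the siblings and stop at the first one that is a tag. This is only a sketch, assuming bs4 on Python 3 with the built-in html.parser; next_tag_sibling is a hypothetical helper name, not part of the library, and the html string is a shortened version of the three sisters markup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = (
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,\n'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and\n'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
)
soup = BeautifulSoup(html, "html.parser")
link = soup.a


def next_tag_sibling(node):
    """Hypothetical helper: first following sibling that is a Tag, else None."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


second = next_tag_sibling(link)
print(second["id"])
```

The isinstance check is what filters out the NavigableString nodes (the comma and newline), so the helper does not depend on how many text nodes sit between the tags.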