Get text using the BeautifulSoup CSS selector

Question

HTML example

<h2 id="name"> ABC <span class="numbers">123</span> <span class="lower">abc</span> </h2>

I can get the numbers with something like:

 soup.select('#name > span.numbers')[0].text

How can I get the ABC text using BeautifulSoup and the select function?

And what about this case?

 <div id="name"> <div id="numbers">123</div> ABC </div>


slaw Jun 17 '16 at 4:18

1 answer

alecxe · Accepted Answer · 2016-06-17T04:22:39+0000

In the first case, get the previous sibling:

 soup.select_one('#name > span.numbers').previous_sibling

In the second case, get the next sibling:

 soup.select_one('#name > #numbers').next_sibling
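As a runnable sketch of the two sibling lookups above, here they are applied to the snippets from the question. The variable names and the choice of `html.parser` are my assumptions, not something the question specifies. The siblings come back with their surrounding whitespace, so I strip them.

```python
from bs4 import BeautifulSoup

# The two snippets from the question.
first_html = '<h2 id="name"> ABC <span class="numbers">123</span> <span class="lower">abc</span> </h2>'
second_html = '<div id="name"> <div id="numbers">123</div> ABC </div>'

# Assumption: the built-in html.parser backend, which keeps the text nodes as written.
first = BeautifulSoup(first_html, "html.parser")
second = BeautifulSoup(second_html, "html.parser")

# Each sibling is a NavigableString (a str subclass), so strip() is available.
before = first.select_one("#name > span.numbers").previous_sibling.strip()
after = second.select_one("#name > #numbers").next_sibling.strip()

print(before)  # ABC
print(after)   # ABC
```

Keep in mind that `previous_sibling` and `next_sibling` return whatever node is adjacent, so if the markup has another tag or only whitespace in that position, you will not get the text you expect.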

Note that I assume the second snippet really has numbers as the id value and uses a div tag instead of span, so I adjusted the CSS selector accordingly.

To cover both cases, you can go to the parent tag and find the non-empty node text in non-recursive mode:

 parent = soup.select_one('#name > .numbers,#numbers').parent
 print(parent.find(text=lambda text: text and text.strip(), recursive=False).strip())

Pay attention to the change in the selector: it now matches an element with either the numbers class or the numbers id.

That said, I doubt this universal solution is reliable enough, mainly because I do not know what your real source data looks like.
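To make the parent-based lookup a bit more defensive, it could be wrapped in a small function that returns None instead of raising when nothing matches. This is only a sketch: `direct_text` and `SELECTOR` are names I made up for illustration, and I use `string=`, the newer spelling of the `text=` argument (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

SELECTOR = "#name > .numbers,#numbers"  # same selector as in the answer


def direct_text(html):
    """Hypothetical helper: first non-blank text node beside the numbers element."""
    soup = BeautifulSoup(html, "html.parser")
    match = soup.select_one(SELECTOR)
    if match is None:  # neither layout is present
        return None
    # recursive=False limits the search to direct children of the parent,
    # so text nested inside the child tags (123, abc) is ignored.
    node = match.parent.find(string=lambda s: s and s.strip(), recursive=False)
    return node.strip() if node is not None else None


print(direct_text('<h2 id="name"> ABC <span class="numbers">123</span> <span class="lower">abc</span> </h2>'))
print(direct_text('<div id="name"> <div id="numbers">123</div> ABC </div>'))
print(direct_text('<p>no match here</p>'))
```

The first two calls cover the two layouts from the question; the third shows the fallback when the selector finds nothing.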