BeautifulSoup parsing - working with superscript?

This is the HTML segment that I am trying to extract from:

<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>

The web page looks like:

Market Capitalization (intraday) 5 : 33.57B

What I have (doesn't work):

    HTML_MarketCap = soup.find('sup', text='5').find_next_sibling('span').text

How can I extract the string 33.57B?

2 answers

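The span is not a sibling of the sup; it is a child of the sup's grandparent's sibling (thanks, 1.618). You have to go up from sup to font to td, move to the next td, and then look for the span inside it.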

from bs4 import BeautifulSoup as bs
soup = bs("""<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)
<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1">
<span id="yfs_j10_aal">33.57B</span></td></tr>""")

soup.find("sup", text="5").parent.parent.find_next_sibling("td").find("span").text
# u'33.57B'
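
To make the relationship concrete, here is a small sketch that first shows why the original attempt fails and then takes the same path step by step. It assumes a bs4 version that accepts `string=` (the newer spelling of `text=`) and names the `html.parser` parser explicitly; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>'''

soup = BeautifulSoup(html, "html.parser")
sup = soup.find("sup", string="5")

# <sup> is the only child of <font>, so it has no siblings at all.
# This is why the attempt in the question ends up calling .text on None.
no_sibling = sup.find_next_sibling("span")
print(no_sibling)

# Walk up to the enclosing label cell (sup -> font -> td),
# step sideways to the value cell, then down to the span.
label_cell = sup.find_parent("td")
value_cell = label_cell.find_next_sibling("td")
market_cap = value_cell.find("span").get_text(strip=True)
print(market_cap)
```

Using `find_parent("td")` instead of `.parent.parent` keeps working if another wrapper tag is ever added around the sup.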

Since you are having problems with this, here is my full test script (using python-requests), which works reliably for me:

import requests
from bs4 import BeautifulSoup as bs

url = "https://finance.yahoo.com/q/ks?s=AAL+Key+Statistics"

r = requests.get(url)

soup = bs(r.text)

# sup -> font -> td (label cell), then the next td holds the value span
HTML_MarketCap = soup.find("sup", text="5").parent.parent.find_next_sibling("td").find("span").text

print(HTML_MarketCap)

You can also locate <sup>5</sup> and then call find_next() on it:

from bs4 import BeautifulSoup

s = '''<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>'''

soup = BeautifulSoup(s)

sup = soup.find('sup', text='5')

sup.find_next('span')
Out[5]: <span id="yfs_j10_aal">33.57B</span>

sup.find_next('span').text
Out[6]: u'33.57B'


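>>> help(sup.find_next)
Help on method find_next in module bs4.element:

find_next(self, name=None, attrs={}, text=None, **kwargs) method of bs4.element.Tag instance
    Returns the first item that matches the given criteria and
    appears after this Tag in the document.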
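
find_next() searches forward in document order rather than only among siblings, which is why it reaches the span in the next cell. Below is a minimal sketch that wraps the lookup in a helper. The function name `value_after_footnote` and the None guards are my own additions for illustration, and it assumes a bs4 version that accepts `string=`.

```python
from bs4 import BeautifulSoup

def value_after_footnote(soup, marker, tag="span"):
    """Return the text of the first `tag` after <sup>marker</sup>, or None.

    Hypothetical helper, not part of bs4.
    """
    sup = soup.find("sup", string=marker)
    if sup is None:
        # No footnote marker with that text on the page.
        return None
    target = sup.find_next(tag)
    return target.get_text(strip=True) if target is not None else None

s = '''<td class="yfnc_tablehead1" width="74%">Market Cap (intraday)<font size="-1"><sup>5</sup></font>:</td><td class="yfnc_tabledata1"><span id="yfs_j10_aal">33.57B</span></td></tr>'''

soup = BeautifulSoup(s, "html.parser")

print(value_after_footnote(soup, "5"))  # 33.57B
print(value_after_footnote(soup, "9"))  # None
```

Keep in mind that find_next() returns the first matching tag anywhere later in the document, so if the layout changes it may pick up a span from an unrelated cell.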
