BeautifulSoup: find the th with the text "price", then get the price from the next th
My HTML looks like this:
<td>
<table ..>
<tr>
<th ..>price</th>
<th>$99.99</th>
</tr>
</table>
</td>
So, given that I'm already in the current cell of the table, how would I get the value 99.99?
So far I have:
td[3].findChild('th')
But what I need is to find the th with the text "price", then get the string value of the next tag.
Think about it in "steps"... given that some x is the root of the subtree in question,
x.findAll(text='price')
is the list of all the strings in this subtree whose text is exactly 'price'. Their parents, of course, are:
[t.parent for t in x.findAll(text='price')]
and if you only want to keep those whose "name" (tag) is 'th', then of course
[t.parent for t in x.findAll(text='price') if t.parent.name=='th']
and you want the "next siblings" of those (but only if they are also 'th's), so
[t.parent.nextSibling for t in x.findAll(text='price')
if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
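Running those steps on the question's markup shows why the last comprehension can come back empty. The node right after the first th is the newline text between the two tags, not the second th. This sketch assumes BeautifulSoup 4 with Python's built-in "html.parser". It uses the bs4 spellings find_all(string=...) and next_sibling; the camelCase names used above still work as aliases.

```python
from bs4 import BeautifulSoup

# The question's markup, with the ".." attribute placeholders dropped.
html = """<td>
<table>
<tr>
<th>price</th>
<th>$99.99</th>
</tr>
</table>
</td>"""

x = BeautifulSoup(html, "html.parser")

# Step 1: strings in the subtree whose text is exactly 'price'
strings = x.find_all(string="price")
# Step 2: their parents, keeping only <th> tags
ths = [t.parent for t in strings if t.parent.name == "th"]
# Step 3: each <th>'s immediate next sibling, kept only if it is also a <th>
next_ths = [th.next_sibling for th in ths
            if th.next_sibling and th.next_sibling.name == "th"]

print(ths)                        # [<th>price</th>]
print(ths[0].next_sibling.name)   # None: the sibling is the "\n" text node
print(next_ths)                   # []
```

The whitespace text node between the cells is exactly what the loop version below has to step over.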
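This is getting repetitive: t.parent.nextSibling appears three times, because a list comprehension can't bind intermediate results to names. So let's switch to a good old loop: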
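Edit: per the OP's comment, the loop tolerates a whitespace string between the parent th and its "next sibling". It also accepts a td as that sibling instead of a th.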
for t in x.findAll(text='price'):
    p = t.parent
    if p.name != 'th': continue
    ns = p.nextSibling
    # skip one text node (e.g. the newline between the two cells)
    if ns and not ns.name: ns = ns.nextSibling
    if not ns or ns.name not in ('td', 'th'): continue
    print(ns.string)
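I've used ns.string, which gives the next sibling's contents when they are plain text (with several nested nodes it returns None). Of course, at this point you can analyze the sibling further, depending on your application's needs! Similarly, I imagine you won't just print the value but will do something smarter with it. I'm just giving you the structure.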
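Speaking of structure, notice that I use if ...: continue twice. This keeps the loop flatter than inverting each if's condition and indenting all the following statements. "Flat is better than nested" is one of the koans in the Zen of Python (run import this at an interactive prompt to see them all, and meditate ;-).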
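For reference, here is the same loop packaged as a self-contained function that collects the values instead of printing them. The name prices_after_label, its label parameter, and the class="label" attribute standing in for the elided ".." are hypothetical. So is the use of BeautifulSoup 4 with "html.parser". The filtering logic mirrors the loop above step for step.

```python
from bs4 import BeautifulSoup


def prices_after_label(x, label="price"):
    """Hypothetical helper: the loop above, returning values instead of printing.

    For each string exactly equal to `label` whose parent is a <th>, collect
    the .string of the following <td>/<th> sibling, stepping over at most one
    text node (such as a newline) in between.
    """
    results = []
    for t in x.find_all(string=label):
        p = t.parent
        if p.name != "th":
            continue
        ns = p.next_sibling
        # A text node has .name == None; skip one of them.
        if ns is not None and ns.name is None:
            ns = ns.next_sibling
        if ns is None or ns.name not in ("td", "th"):
            continue
        # .string may be None, e.g. when the cell has several child nodes.
        results.append(ns.string)
    return results


# The question's markup; class="label" is a stand-in for the elided attributes.
html = """<td>
<table>
<tr>
<th class="label">price</th>
<th>$99.99</th>
</tr>
</table>
</td>"""

soup = BeautifulSoup(html, "html.parser")
values = prices_after_label(soup)
# Strip the currency sign to get the number the question asks for.
print(float(values[0].lstrip("$")))  # 99.99
```

Returning a list rather than printing leaves the caller to decide how to handle zero, one, or several matches.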
With pyparsing, it's easy to reach into the middle of some HTML and match a tag pattern like this one:
from pyparsing import makeHTMLTags, Combine, Word, nums
th,thEnd = makeHTMLTags("TH")
floatnum = Combine(Word(nums) + "." + Word(nums))
priceEntry = (th + "price" + thEnd +
th + "$" + floatnum("price") + thEnd)
tokens, startloc, endloc = next(priceEntry.scanString(html))
print(tokens.price)
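Pyparsing's makeHTMLTags helper returns a pair of pyparsing expressions, one for the opening tag and one for the closing tag. The opening-tag expression does much more than wrap the given name in "< >". It also allows for extra whitespace, any mix of upper and lower case, and the presence or absence of tag attributes. For instance, even though I specified "TH" as the tag name, it will also match "th", "Th", "tH" and "TH". Pyparsing also skips over whitespace by default, so extra spaces around "$" etc. are tolerated. Finally, floatnum("price") inside priceEntry gives the number the results name "price". You can then read the matched value as tokens.price after scanning with priceEntry.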
(Note: Combine is what makes floatnum return the single string "99.99" instead of the separate tokens ["99", ".", "99"].)
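To check the claims above about case, attributes, whitespace and Combine, here is the same grammar run on a variant of the question's markup. The markup is an assumption: it has mixed-case tags, an align attribute and extra spaces around "$". The sketch keeps the camelCase pyparsing names used above. They are the pyparsing 2 API and remain available as compatibility aliases in pyparsing 3, where the snake_case forms such as scan_string are preferred.

```python
from pyparsing import Combine, Word, makeHTMLTags, nums

# Assumed variant of the question's markup: mixed-case tags, an attribute,
# and spaces around "$".
html = """<td>
<table class="info">
<tr>
<Th align="left">price</Th>
<tH> $ 99.99 </TH>
</tr>
</table>
</td>"""

th, thEnd = makeHTMLTags("TH")
floatnum = Combine(Word(nums) + "." + Word(nums))
priceEntry = (th + "price" + thEnd +
              th + "$" + floatnum("price") + thEnd)

# scanString yields (tokens, start, end) for each match; take the first one.
tokens, startloc, endloc = next(priceEntry.scanString(html))
print(tokens.price)  # 99.99

# Without Combine, the same number comes back as three separate tokens.
plain = Word(nums) + "." + Word(nums)
print(plain.parseString("99.99").asList())     # ['99', '.', '99']
print(floatnum.parseString("99.99").asList())  # ['99.99']
```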