BeautifulSoup: find the th with the text "price", then get the price from the next th

My html looks like this:

<td>
   <table ..>
      <tr>
         <th ..>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>

So, I'm in the current cell of the table, how would I get the value 99.99?

So far I have:

td[3].findChild('th')

But I need to do:

Find the th with the text "price", then get the string value of the next tag.

+5
2 answers

Think about it in "steps"... given that some x is the root of the subtree in question,

x.findAll(text='price')

is a list of all the items in that subtree containing the text 'price'. The parents of those items will then, of course, be:

[t.parent for t in x.findAll(text='price')]

and if you only want to keep those whose name (tag) is 'th', then of course

[t.parent for t in x.findAll(text='price') if t.parent.name=='th']

and you want the "next siblings" of those (but only if they are also 'th's), so

[t.parent.nextSibling for t in x.findAll(text='price')
 if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
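
To see these steps run on something concrete, here is a minimal sketch. It assumes the bs4 package (BeautifulSoup 4) and is written with its method names find_all, string= and next_sibling, which correspond to findAll, text= and nextSibling above. The sample markup is a compact copy of the question's table, and the names html, texts, headers and cells are only for illustration.

```python
from bs4 import BeautifulSoup

# Sample data (an assumption): the question's table with no whitespace between tags.
html = "<td><table><tr><th>price</th><th>$99.99</th></tr></table></td>"
x = BeautifulSoup(html, "html.parser")

# Step 1: the text nodes that are exactly 'price'.
texts = x.find_all(string='price')

# Step 2: their parents, keeping only the <th> ones.
headers = [t.parent for t in texts if t.parent.name == 'th']

# Step 3: the next sibling of each such <th>, if it is also a <th>.
cells = [t.parent.next_sibling for t in texts
         if t.parent.name == 'th' and t.parent.next_sibling
         and t.parent.next_sibling.name == 'th']

print([str(c.string) for c in cells])
```

With indented markup like the question's, the node right after the first th is a whitespace text node rather than the second th, so this exact comprehension would not find the price.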

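The trouble with that list comprehension is the amount of repetition, because intermediate results cannot be assigned to simple names. So let's switch to a good old loop: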

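Edit: added tolerance for a text string between the parent th and the "next sibling", and for the latter being a td instead, per the OP's comment.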

for t in x.findAll(text='price'):
  p = t.parent
  if p.name != 'th': continue                  # keep only text that sits directly in a th
  ns = p.nextSibling
  if ns and not ns.name: ns = ns.nextSibling   # skip a text node between the two tags
  if not ns or ns.name not in ('td', 'th'): continue
  print(ns.string)

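I don't know what you want to do with ns.string. It is the string content of the next sibling (or None if the sibling holds more than a single string, for example text mixed with nested tags), and you could of course process it further. For simplicity I have just used print, which will hopefully be enough to get you going.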

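Wherever a check rules a candidate out, I have used the if...: continue idiom instead of nesting if blocks: "flat is better than nested" is one of the lines in the Zen of Python (type import this at an interactive prompt to read them all ;-).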
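
For completeness, here is the loop as a self-contained sketch run against indented markup like the question's. It assumes the bs4 package (BeautifulSoup 4) and uses its method names find_all, string= and next_sibling. The function name find_prices, the class attribute in the sample markup, and the final step that strips a leading "$" are illustrative assumptions, not part of the question.

```python
from bs4 import BeautifulSoup, Tag

def find_prices(root, label='price'):
    """Return the string of the cell that follows each <th> whose text is `label`."""
    found = []
    for t in root.find_all(string=label):
        p = t.parent
        if p.name != 'th':
            continue
        ns = p.next_sibling
        # Skip one text node (typically whitespace) between the two tags.
        if ns is not None and not isinstance(ns, Tag):
            ns = ns.next_sibling
        if not isinstance(ns, Tag) or ns.name not in ('td', 'th'):
            continue
        found.append(ns.string)
    return found

# Sample data (an assumption): the question's table, indented, with one attribute.
html = """
<td>
   <table>
      <tr>
         <th class="label">price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>
"""
soup = BeautifulSoup(html, "html.parser")
prices = find_prices(soup)
print([str(s) for s in prices])

# Turn the strings into numbers; assumes each one starts with "$".
amounts = [float(s.lstrip('$')) for s in prices if s is not None]
print(amounts)
```

Collecting the strings in a list rather than printing inside the loop leaves the caller free to decide what to do with a None entry.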

+8

With pyparsing, picking a tag pattern out of the middle of some HTML is fairly straightforward:

from pyparsing import makeHTMLTags, Combine, Word, nums

th,thEnd = makeHTMLTags("TH")
floatnum = Combine(Word(nums) + "." + Word(nums))
priceEntry = (th + "price" + thEnd + 
              th + "$" + floatnum("price") + thEnd)

# html is assumed to hold the page source as a string
tokens, startloc, endloc = next(priceEntry.scanString(html))

print(tokens.price)

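Pyparsing's makeHTMLTags helper returns a pair of pyparsing expressions, one for the opening tag and one for the closing tag. The opening-tag expression does much more than wrap the given string in "<" and ">": it also allows for extra whitespace, for attributes, and for upper or lower case. For example, even though I passed "TH" as the tag name, it will also match "th", "Th", "tH" and "TH". Pyparsing's default whitespace skipping also handles extra spaces between the "$" and the number, between a closing tag and the next opening tag, and so on, without having to say explicitly "whitespace may appear here". Finally, assigning a results name (the "price" that follows floatnum in the definition of priceEntry) makes it easy to pull that specific value out of the full list of tokens matched by priceEntry.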

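(Combine does two things: it disallows whitespace between the tokens it joins, and it returns them as a single token, so here you get "99.99" instead of ["99", ".", "99"].)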
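
As a quick check of the points above, here is a minimal sketch that runs the same expression over sample markup with an upper-case tag, an attribute, and a space after the "$". It assumes the pyparsing package is installed; the markup and the names html and prices are made up for illustration.

```python
from pyparsing import makeHTMLTags, Combine, Word, nums

# Sample data (an assumption): mixed-case tags, an attribute, a space after "$".
html = """
<tr>
   <TH class="label">price</TH>
   <th>$ 99.99</th>
</tr>
"""

th, thEnd = makeHTMLTags("TH")
floatnum = Combine(Word(nums) + "." + Word(nums))
priceEntry = (th + "price" + thEnd +
              th + "$" + floatnum("price") + thEnd)

# scanString yields (tokens, start, end) for every match found in the text.
prices = [tokens.price for tokens, start, end in priceEntry.scanString(html)]
print(prices)
```

Iterating over scanString instead of taking only the first result collects every matching price on the page, and gives an empty list rather than an exception when nothing matches.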

0
