Beautifulsoup, find th with the text "price", then get the price from the next

Question

My html looks like this:

<td>
   <table ..>
      <tr>
         <th ..>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>

So, I'm in the current cell of the table, how would I get the value 99.99?

So far I have:

td[3].findChild('th')

But I need to do:

Find th with the text "price", then get the next tag string value.


python beautifulsoup

Blankman Jul 31 '10 at 4:30


2 answers

With pyparsing, it is easy to step through HTML like this:

from pyparsing import makeHTMLTags, Combine, Word, nums

th,thEnd = makeHTMLTags("TH")
floatnum = Combine(Word(nums) + "." + Word(nums))
priceEntry = (th + "price" + thEnd + 
              th + "$" + floatnum("price") + thEnd)

# html is assumed to hold the markup shown in the question
tokens, startloc, endloc = next(priceEntry.scanString(html))

print(tokens.price)

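Pyparsing's makeHTMLTags helper returns a pair of pyparsing expressions, one for the start tag and one for the end tag. The start-tag pattern does much more than wrap "<" and ">" around the given string: it allows for embedded attributes, extra whitespace, and varying upper/lower case. For instance, although I passed "TH" as the tag string, it will also match "th", "Th", "tH" and "TH". Pyparsing's default whitespace skipping also lets me keep "$" separate from the number without having to say "whitespace may go here". Finally, by attaching the results name "price" (floatnum("price") in the definition of priceEntry), I can access that specific value from the full list of tokens matched by priceEntry.

(Combine does two jobs: it disallows whitespace between the tokens, and it concatenates them into a single string, in this case "99.99" instead of ["99", ".", "99"].)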
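The claims about Combine and case-insensitive tag matching can be checked with a short script. This is only a sketch: the sample strings and the variable names (split_float, joined_float, tag_matches) are made up for illustration, and it assumes a pyparsing version that still accepts the older camelCase method names used in this answer.

```python
from pyparsing import Combine, ParseException, Word, makeHTMLTags, nums

# The same number pattern with and without Combine.
split_float = Word(nums) + "." + Word(nums)
joined_float = Combine(split_float)

split_tokens = split_float.parseString("99.99").asList()
joined_tokens = joined_float.parseString("99.99").asList()
print(split_tokens)   # ['99', '.', '99']
print(joined_tokens)  # ['99.99']

# Whitespace inside the number: skipped without Combine, rejected with it.
spaced_tokens = split_float.parseString("99 . 99").asList()
try:
    joined_float.parseString("99 . 99")
    combine_rejects_spaces = False
except ParseException:
    combine_rejects_spaces = True
print(combine_rejects_spaces)

# The start-tag expression built from "TH" matches any letter case,
# with or without attributes (sample tags are illustrative).
th, th_end = makeHTMLTags("TH")
tag_matches = th.searchString('<th> <Th class="a"> <tH> <TH>')
print(len(tag_matches))
```

Combine matters here because without it the price would arrive as three separate tokens that need joining by hand.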


PaulMcG Jul 31 '10 at 4:54

Alex Martelli · Accepted Answer · 2010-07-31T05:08:07+0000

Think about it in "steps"... given that some x is the root of the subtree you're considering,

x.findAll(text='price')

is a list of all the items in that subtree containing the text 'price'. The parents of those items will then, of course, be:

[t.parent for t in x.findAll(text='price')]

and if you want to keep only those whose "name" (tag) is 'th', then of course

[t.parent for t in x.findAll(text='price') if t.parent.name=='th']

and you want the "next siblings" of those (but only if they are also 'th's), so

[t.parent.nextSibling for t in x.findAll(text='price')
 if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']

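Here you can see the problem with a list comprehension: too much repetition, since intermediate results cannot be assigned to simple names. So let's switch to a good old loop...:

Edit: added tolerance for a text string between the parent th and its "next sibling", and for the latter being a td instead, per the OP's comment.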

for t in x.findAll(text='price'):
  p = t.parent
  if p.name != 'th': continue
  ns = p.nextSibling
  if ns and not ns.name: ns = ns.nextSibling
  if not ns or ns.name not in ('td', 'th'): continue
  print(ns.string)

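I've added ns.string, which gives the next sibling's contents if and only if they are just text (no further nested tags); of course you can instead analyze further at this point, depending on your application's needs. Similarly, I imagine you won't just be doing print but something smarter; I'm only giving you the structure.

Speaking of structure, notice that I use if...: continue twice: this reduces nesting compared with the alternative of inverting the if's condition and indenting all the following statements in the loop, and "flat is better than nested" is one of the koans in the Zen of Python (run import this at an interactive prompt to see them all and meditate ;-).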
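For readers on the newer bs4 package, the same steps can be sketched with its snake_case API. This is not the code from the answer above: it swaps findAll/nextSibling for find_all and find_next_sibling, and it assumes a bs4 release that supports the string= argument. The helper name prices_after_label and the class="label" attribute in the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the attribute is illustrative.
html = """
<td>
   <table>
      <tr>
         <th class="label">price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>
"""


def prices_after_label(root, label="price"):
    """Collect the text of the th/td cell following each <th> whose text is `label`."""
    results = []
    for th in root.find_all("th", string=label):
        # find_next_sibling skips whitespace text nodes between the cells
        # and accepts either a th or a td as the following cell.
        cell = th.find_next_sibling(["th", "td"])
        if cell is None:
            continue
        results.append(cell.get_text(strip=True))
    return results


soup = BeautifulSoup(html, "html.parser")
found = prices_after_label(soup)
print(found)
```

Returning a list rather than printing leaves it to the caller to strip the "$" or convert the value to a number.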
