Lxml - ignore tag in html

Question

I wrote a tiny HTML parser in Python using lxml. It is very useful, but I have a problem.

I have the following code:

    tags = doc.xpath('//table//tr/td[@align="right"]/b')
    for x in tags:
        print(x.text.strip())

It works great. But if there is a <br> tag in the <b> element, for example:

 <b> first-half <br> second-half </b>

this code will only print first-half, the text before the <br>.

How can I get all the text in <b> even if there is a <br> tag?

Thanks.


shau-kote Feb 28 '13 at 9:03

1 answer

Anorov · Accepted Answer · 2013-02-28T21:12:35+0000

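Use text_content() to extract all the text inside the tag, with the markup removed. x.text holds only the text that precedes the first child element, which is why everything after the <br> is dropped. Replace x.text with x.text_content().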
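
A minimal sketch of the difference, assuming the document is parsed with lxml.html (text_content() is a method of lxml.html elements). The SAMPLE markup and the bold_texts helper are hypothetical, made up to mirror the question's case; they are not from the original code.

```python
import lxml.html

# Hypothetical sample markup modeled on the question's <b> ... <br> ... </b> case.
SAMPLE = (
    '<table><tr><td align="right">'
    '<b> first-half <br> second-half </b>'
    '</td></tr></table>'
)


def bold_texts(html):
    """Return the whitespace-normalized text of each matching <b> element."""
    doc = lxml.html.fromstring(html)
    tags = doc.xpath('//table//tr/td[@align="right"]/b')
    # text_content() joins the element's own text with the text and tails of
    # all its descendants, so the part after <br> is included.
    # split()/join() collapses the whitespace left around the <br>.
    return [" ".join(x.text_content().split()) for x in tags]


first = lxml.html.fromstring(SAMPLE).xpath('//b')[0]
print(repr(first.text))    # only the text before <br>
print(bold_texts(SAMPLE))  # text from both sides of <br>
```

If the tree was built with lxml.etree instead, the elements have no text_content(); joining the pieces yielded by x.itertext() should give the same text.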