lxml: ignore a tag in HTML

I wrote a tiny HTML parser in Python using lxml. It is very useful, but I have a problem.

I have the following code:

    tags = doc.xpath('//table//tr/td[@align="right"]/b')
    for x in tags:
        print(x.text.strip())

It works great. But if there is a <br> tag in the <b> element, for example:

    <b> first-half <br> second-half </b>

this code prints only first-half; everything after the <br> is lost.
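A short sketch of why this happens, assuming the fragment is parsed with lxml.html (the question does not say how doc was created): in lxml, an element's .text holds only the text before its first child element, and the text that follows <br> is stored on the <br> element itself as its .tail.

```python
from lxml import html

# Parse only the <b> fragment from the question (an assumption for illustration).
b = html.fromstring("<b> first-half <br> second-half </b>")

# .text holds only the text that comes before the first child element (<br>).
print(repr(b.text))

# The text after <br> belongs to the <br> element itself, as its .tail.
br = b[0]
print(br.tag, repr(br.tail))
```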

How can I get all the text in <b> even if there is a <br> tag?

Thanks.

1 answer

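Use text_content() to extract all the text inside the tag, with the markup stripped. x.text only holds the text before the first child element; the text after the <br> is stored as that <br> element's .tail, and text_content() includes it. Replace x.text with x.text_content().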
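A minimal, self-contained sketch of the fix, run against a made-up sample table (SAMPLE and the normalize helper are hypothetical; only the XPath comes from the question). text_content() is a method of lxml.html elements; for a tree built with plain lxml.etree, which lacks it, joining itertext() gives the same text. The whitespace normalization is an optional extra, not part of the answer above.

```python
from lxml import etree, html

# Hypothetical sample page: one <b> contains a <br>, one does not.
SAMPLE = """
<html><body>
  <table>
    <tr><td align="right"><b> first-half <br> second-half </b></td></tr>
    <tr><td align="right"><b>plain</b></td></tr>
  </table>
</body></html>
"""
XPATH = '//table//tr/td[@align="right"]/b'  # the XPath from the question


def normalize(text):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


# lxml.html elements provide text_content(): all text of the element and its
# descendants (including the <br>'s tail), with the markup removed.
html_doc = html.fromstring(SAMPLE)
html_results = [normalize(x.text_content()) for x in html_doc.xpath(XPATH)]

# Plain lxml.etree elements have no text_content(); itertext() yields the
# element's text plus its descendants' text and tails, in document order.
etree_doc = etree.fromstring(SAMPLE, etree.HTMLParser())
etree_results = [normalize("".join(x.itertext())) for x in etree_doc.xpath(XPATH)]

for text in html_results:
    print(text)
```

Note that text_content() puts nothing in place of the <br>, so <b>first-half<br>second-half</b> (no spaces) would come out as first-halfsecond-half. If you need a separator there, you have to add one yourself.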
