I have an HTML file:
<html> <p>somestr <sup>1</sup> anotherstr </p> </html>
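For reference, here is a minimal sketch of how this snippet splits into text pieces. It uses the standard library's xml.etree.ElementTree, which is an assumption on my part; lxml.etree exposes the same text/tail attributes. The key point is that " anotherstr " is stored as the tail of <sup>, not as text of <p>:

```python
import xml.etree.ElementTree as ET

html = "<html> <p>somestr <sup>1</sup> anotherstr </p> </html>"
root = ET.fromstring(html)  # root is the <html> element
p = root.find("p")
sup = p.find("sup")

# Text before the first child belongs to p.text;
# text after </sup> belongs to sup.tail, not to p.
print(repr(p.text))    # 'somestr '
print(repr(sup.text))  # '1'
print(repr(sup.tail))  # ' anotherstr '
```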
I would like to extract the text as:
somestr ¹ anotherstr
but I can't figure out how to do this. I wrote a to_sup() function that converts numeric strings to superscript, so the closest I get is something like:
    for i in doc.xpath('.//p/text()|.//sup/text()'):
        if i.tag == 'sup':
            print to_sup(i),
        else:
            print i,
But ElementStringResult doesn't seem to have a method to get the tag name (i.tag fails), so I'm a bit lost. Any ideas how to solve this?
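One possible approach, sketched under a few assumptions: in lxml, the strings returned by text() in an XPath query are "smart strings". They provide getparent(), plus is_text and is_tail flags. A tail string such as " anotherstr " reports the element it follows (<sup>) as its parent, so checking the parent's tag alone is not enough. The is_text flag is what separates the superscript's own text from its tail. The to_sup() below is a hypothetical stand-in for the asker's function, and the sketch uses Python 3 with lxml.etree on the snippet above:

```python
from lxml import etree

# Hypothetical stand-in for the asker's to_sup(): maps ASCII digits
# to their Unicode superscript forms.
SUP_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_sup(s):
    return s.translate(SUP_DIGITS)


def extract(doc):
    parts = []
    # text() results are lxml smart strings that remember their origin.
    for s in doc.xpath(".//p/text() | .//sup/text()"):
        parent = s.getparent()
        # " anotherstr " is the tail of <sup>, so getparent() returns <sup>
        # for it too; only the sup's own .text (is_text) becomes superscript.
        if parent.tag == "sup" and s.is_text:
            parts.append(to_sup(s))
        else:
            parts.append(s)
    return "".join(parts)


doc = etree.fromstring("<html> <p>somestr <sup>1</sup> anotherstr </p> </html>")
print(extract(doc))
```

On this input extract(doc) returns "somestr ¹ anotherstr " (the trailing space comes from the source text). An alternative that avoids smart strings entirely is to walk the elements: take p.text, then for each child append to_sup(child.text) followed by child.tail.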