The most reliable way to do this is probably with lxml.
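    from lxml import etree
    ...
    tree = etree.parse('somefile.xml')
    # method='text' serializes only the text content and drops the tags;
    # encoding='unicode' returns a str instead of bytes, so print() shows plain text
    notags = etree.tostring(tree, encoding='unicode', method='text')
    print(notags)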
This avoids the problems that come with "parsing" XML using regular expressions, and it should correctly handle escaping: entities such as &amp; come back as the plain characters they stand for.
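If you want to try the idea without installing anything, here is a minimal sketch that swaps in the standard library's xml.etree.ElementTree, which also accepts method='text'. The sample document and the strip_tags helper are made up for illustration, and lxml may behave differently in edge cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
SAMPLE = "<doc><title>Fish &amp; Chips</title><p>Hello <b>world</b>!</p></doc>"


def strip_tags(xml_source):
    """Return only the text content of an XML string (illustrative helper)."""
    root = ET.fromstring(xml_source)
    # method="text" emits the text nodes only; encoding="unicode" yields a str.
    return ET.tostring(root, encoding="unicode", method="text")


print(strip_tags(SAMPLE))
```

Note that the text of adjacent elements is concatenated with no separator, so you may want to insert whitespace yourself if the elements are block-like.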
Jeremiah