How can I get text between tags using the SAX Python parser?

I just need to get the text of a particular tag and save it in a database. Because the XML file is large (4.5 GB), I am using SAX. I use the characters method to get the text and store it in a dictionary. However, when I print the text in the endElement method, I get a newline instead of the text.

Here is my code:

    # unescape is presumably xml.sax.saxutils.unescape
    def characters(self, content):
        text = unescape(content)
        self.map[self.tag] = text

    def startElement(self, name, attrs):
        self.tag = name

    def endElement(self, name):
        if name == "sometag":
            print(self.map[name])

Thanks in advance.

python xml sax
1 answer

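The SAX parser may deliver the text of a single element in several pieces, so characters can be called multiple times for one element. Your handler overwrites the stored value on every call, so only the last piece survives, and that piece is often just a newline.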
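To see the fragmentation for yourself, a small sketch like the following records each chunk separately. It assumes Python 3; ChunkRecorder and record_chunks are names made up for this illustration, and how the text is split depends on the underlying parser, so no particular chunk count is guaranteed.

```python
import xml.sax


class ChunkRecorder(xml.sax.ContentHandler):
    """Records every chunk passed to characters(), without joining them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Each call may carry only part of the element's text.
        self.chunks.append(content)


def record_chunks(xml_bytes):
    """Parse xml_bytes and return the list of chunks as they arrived."""
    handler = ChunkRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.chunks


chunks = record_chunks(b"<sometag>fish &amp; chips\nsecond line</sometag>")
print(chunks)           # how the text was split is up to the parser
print("".join(chunks))  # the joined text is the full element content
```

Only the joined result is stable, which is why keeping just the most recent chunk loses text.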

You need to do something like:

    def startElement(self, name, attrs):
        self.map[name] = ''
        self.tag = name

    def characters(self, content):
        self.map[self.tag] += content

    def endElement(self, name):
        print(self.map[name])
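For completeness, here is a self-contained variant of the same idea that can be run as-is. It is only a sketch under a few assumptions: Python 3, a tag called sometag as in the question, and hypothetical names (TextCollector, collect_text, wanted) that are not part of xml.sax. It buffers chunks only while inside the wanted element, so whitespace between other tags is ignored.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the complete text of every element named `wanted`."""

    def __init__(self, wanted="sometag"):
        super().__init__()
        self.wanted = wanted
        self.buffer = None  # list of chunks while inside a wanted element
        self.results = []

    def startElement(self, name, attrs):
        if name == self.wanted:
            self.buffer = []  # start a fresh buffer for this element

    def characters(self, content):
        # May be called several times per element: append, never overwrite.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.wanted and self.buffer is not None:
            self.results.append("".join(self.buffer))
            self.buffer = None


def collect_text(xml_bytes, wanted="sometag"):
    """Parse xml_bytes and return the text of each `wanted` element."""
    handler = TextCollector(wanted)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


sample = (
    b"<root>\n"
    b"  <sometag>fish &amp; chips</sometag>\n"
    b"  <other>skip</other>\n"
    b"  <sometag>line one\nline two</sometag>\n"
    b"</root>"
)
for text in collect_text(sample):
    print(repr(text))
```

For a file as large as the one in the question, you would probably call xml.sax.parse(path, handler) instead of parseString, and write each value to the database inside endElement rather than keeping everything in results. Note that this sketch does not handle a sometag nested inside another sometag.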
