SAX parsing and encoding

I have a contact who is having problems with SAX while parsing RSS and Atom files. According to him, the text coming from the elements is truncated at an apostrophe or, sometimes, at an accented character. There also seems to be a problem with the encoding.

I tried SAX myself and I also see the truncation, but I have not yet been able to dig into it. I would appreciate any suggestions if anyone has already solved this.

This is the code used in the ContentHandler:

    public void characters(char[] ch, int start, int length) throws SAXException {
        // link = new String(ch, start, length);
    }

Edit: The encoding problem may be related to storing information in a byte array, since I know that Java works in Unicode.

java parsing atom-feed rss sax
3 answers

The characters() method is not guaranteed to deliver the full text content of an element in one call: the text can span buffer boundaries, so the parser may report it in several chunks. You must buffer the characters between the element's start and end events yourself.

E.g.:

    StringBuilder builder;

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        builder = new StringBuilder();
    }

    public void characters(char[] ch, int start, int length) {
        builder.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        String theFullText = builder.toString();
    }
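
For anyone who wants to try the buffering idea end to end, here is a minimal, self-contained sketch. The class name TitleCollector, the helper titlesOf and the sample feed are made up for illustration; it assumes the default (non-namespace-aware) JDK parser, so element names arrive in qName.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append instead of assigning.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && current != null) {
            titles.add(current.toString()); // the text is complete only here
            current = null;
        }
    }

    // Hypothetical helper: parses an XML string and returns the collected titles.
    static List<String> titlesOf(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String feed = "<feed><title>Tom's caf\u00e9 &amp; bar</title>"
                + "<entry><title>Second</title></entry></feed>";
        System.out.println(titlesOf(feed));
    }
}
```

The important part is that the string is only built in endElement; however the parser chooses to split the text, the chunks end up in the same StringBuilder.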

XML entities generate special events in SAX. You can catch them with a LexicalHandler, although that is usually not necessary. They do explain, however, why you cannot assume that you will receive exactly one characters event per element. Use a buffer as described in the other answers.

For example, hello&amp;world will generate the sequence

  • startElement
  • characters hello
  • startEntity
  • characters &
  • endEntity
  • characters world

Check out the auxiliary SAX interfaces (the org.xml.sax.ext package) if you want some more examples. Other special events cover external entities, comments, CDATA sections, etc.
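
If you want to watch these events yourself, something along these lines should work. It is only a sketch: the class name EntityEventDemo and the helper run are invented for the example, and it declares its own entity w in an internal DTD because LexicalHandler support is optional and parsers may differ in whether they report predefined entities such as &amp;. It uses DefaultHandler2, which implements both ContentHandler and LexicalHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo: logs character and entity events in the order they arrive.
class EntityEventDemo extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        events.add("characters \"" + chunk + "\"");
        text.append(chunk); // buffering still yields the complete text
    }

    @Override
    public void startEntity(String name) {
        events.add("startEntity " + name);
    }

    @Override
    public void endEntity(String name) {
        events.add("endEntity " + name);
    }

    // Hypothetical helper: parses the XML with this object registered as both
    // the content handler and the lexical handler.
    static EntityEventDemo run(String xml) {
        try {
            EntityEventDemo handler = new EntityEventDemo();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property id for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE r [<!ENTITY w \"world\">]><r>hello &w;</r>";
        EntityEventDemo demo = run(xml);
        demo.events.forEach(System.out::println);
        System.out.println("buffered text: " + demo.text);
    }
}
```

Whatever the exact event sequence on your parser, the buffered text comes out whole, which is the point of the other answers.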


How are you passing the input to SAX? As an InputStream (recommended) or as a Reader? So, starting with your byte[], try using a ByteArrayInputStream.
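
A rough sketch of what I mean, assuming the feed is already in memory as a byte[]. The class name ByteInputDemo and the helper textOf are hypothetical. Handing the raw bytes to the parser lets it pick the charset from the XML declaration instead of relying on however the bytes were turned into a String earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: parses raw bytes and returns all character data.
class ByteInputDemo {
    static String textOf(byte[] xmlBytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // buffer every chunk
            }
        };
        try {
            // The parser decodes the bytes itself, using the declared encoding.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xmlBytes), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Sample document encoded as ISO-8859-1, as its declaration says.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\u00e9</title>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(textOf(latin1));
    }
}
```

If you decode the same bytes to a String with the wrong charset first and then feed a Reader to the parser, the accented characters are already damaged before SAX ever sees them.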
