Removing invalid characters from String while parsing XML in Java

I searched and read Stack Overflow, but nothing worked. I have a problem with characters in an XML feed. I save the value of each tag in a String, but when a certain escaped character occurs in the text, it just stops. I only get the first 4-5 words in a tag or so.

So can someone help me with a method that can remove it? Or maybe the text in the tags in the XML feed is too long for the string?

Thanks!

Code example:

    public void characters(char[] ch, int start, int length) throws SAXException {
        if (currentElement) {
            currentValue = new String(ch, start, length);
            currentElement = false;
        }
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        currentElement = false;

        /** set value */
        if (localName.equalsIgnoreCase("title"))
            sitesList.setTitle(currentValue);
        else if (localName.equalsIgnoreCase("id"))
            sitesList.setId(currentValue);
        else if (localName.equalsIgnoreCase("description"))
            sitesList.setDescription(currentValue);
    }

The text in the description tag is quite long, but I only get the first five words before the offending character.

1 answer

You are using a SAX parser to parse the XML string.

The characters() method can be called several times while reading a single XML element. This happens when the parser encounters something like <desc>blabla bla &#39; bla bla la.</desc>. Because your handler clears currentElement after the first call, it keeps only the first chunk and the rest of the text is lost.
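To see the chunking for yourself, here is a minimal sketch that records every chunk handed to characters(). The class name ChunkDemo and the method chunks() are made up for this illustration, and the number of chunks you get depends on the parser implementation, so the code does not assume a particular count.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records every chunk passed to characters().
class ChunkDemo {

    /** Parses the XML string and returns each characters() chunk in call order. */
    static List<String> chunks(String xml) {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chunks.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> parts = chunks("<desc>blabla bla &#39; bla bla la.</desc>");
        // How many chunks arrive is up to the parser implementation.
        System.out.println(parts.size() + " chunk(s): " + parts);
        // Joining all chunks yields the complete element text.
        System.out.println(String.join("", parts));
    }
}
```

Whatever the chunk count turns out to be on your platform, only the concatenation of all chunks is guaranteed to be the full text of the element.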

The solution is to use a StringBuilder, append the characters read in the characters() method, and then reset the StringBuilder in the endElement() method:

    private class Handler extends DefaultHandler {
        private StringBuilder temp_val;

        public Handler() {
            this.temp_val = new StringBuilder();
        }

        public void characters(char[] ch, int start, int length) {
            temp_val.append(ch, start, length);
        }

        public void endElement(String uri, String localName, String qName) {
            System.out.println("Output: " + temp_val.toString());
            // ... Do your stuff
            temp_val.setLength(0); // Reset the StringBuilder
        }
    }

The above code works for me, given this XML file:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <test>This is some &#13; example-text.</test>

Output:

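    Output: This is some example-text.

(The carriage return produced by &#13; sits between "some" and "example-text." but is not visible here.)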
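Applied to the title, id and description elements from the question, the accumulate-and-reset pattern could look roughly like the sketch below. The class FeedTextReader, the method readFields() and the sample feed are invented for this example. It also matches on qName rather than localName, on the assumption that the parser comes from a default SAXParserFactory, which is not namespace-aware and may leave localName empty.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the full text of selected elements.
class FeedTextReader {

    /** Returns the text of the title, id and description elements, keyed by element name. */
    static Map<String, String> readFields(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0); // start collecting fresh for every element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName) || "id".equals(qName) || "description".equals(qName)) {
                    fields.put(qName, text.toString());
                }
                text.setLength(0); // reset the StringBuilder
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String xml = "<item><title>Hello</title><id>42</id>"
                + "<description>blabla bla &#39; bla bla la.</description></item>";
        // prints {title=Hello, id=42, description=blabla bla ' bla bla la.}
        System.out.println(readFields(xml));
    }
}
```

In the question's handler, the fields.put(...) call would be replaced by the matching sitesList.setTitle(), setId() or setDescription() call.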
