I'm trying to use @XmlAnyElement with a DomHandler to capture unparsed text in a particular field, as in this example from Blaise Doughan. But when I try to parse multiple customers, the contents of the bio field from all previous records are still sent to my DomHandler!
Here is an example of a document I'm trying to parse:
    <?xml version="1.0" encoding="UTF-8"?>
    <customers>
        <customer>
            <name>Jane Doe</name>
            <bio>
                <html>Jane bio</html>
            </bio>
        </customer>
        <customer>
            <name>John Doe</name>
            <bio>
                <html>John bio</html>
            </bio>
        </customer>
    </customers>
But the output is:

    Name: Jane Doe
    Bio: <html>Jane bio</html>
    Name: John Doe
    Bio: <html>Jane bio</html>
BioHandler (unchanged from the previous example)
    package blog.domhandler;

    import java.io.StringReader;
    import java.io.StringWriter;

    import javax.xml.bind.ValidationEventHandler;
    import javax.xml.bind.annotation.DomHandler;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class BioHandler implements DomHandler<String, StreamResult> {

        private static final String BIO_START_TAG = "<bio>";
        private static final String BIO_END_TAG = "</bio>";

        private StringWriter xmlWriter = new StringWriter();

        public StreamResult createUnmarshaller(ValidationEventHandler errorHandler) {
            return new StreamResult(xmlWriter);
        }

        public String getElement(StreamResult rt) {
            String xml = rt.getWriter().toString();
            int beginIndex = xml.indexOf(BIO_START_TAG) + BIO_START_TAG.length();
            int endIndex = xml.indexOf(BIO_END_TAG);
            return xml.substring(beginIndex, endIndex);
        }

        public Source marshal(String n, ValidationEventHandler errorHandler) {
            try {
                String xml = BIO_START_TAG + n.trim() + BIO_END_TAG;
                StringReader xmlReader = new StringReader(xml);
                return new StreamSource(xmlReader);
            } catch(Exception e) {
                throw new RuntimeException(e);
            }
        }

    }
Customer (unchanged from the previous example)
    package blog.domhandler;

    import javax.xml.bind.annotation.XmlAnyElement;
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.bind.annotation.XmlType;

    @XmlRootElement
    @XmlType(propOrder={"name", "bio"})
    public class Customer {

        private String name;
        private String bio;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @XmlAnyElement(BioHandler.class)
        public String getBio() {
            return bio;
        }

        public void setBio(String bio) {
            this.bio = bio;
        }

    }
Customers
    package blog.domhandler;

    import java.util.List;

    import javax.xml.bind.annotation.XmlAnyElement;
    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.bind.annotation.XmlType;

    @XmlRootElement
    public class Customers {

        private List<Customer> customers;

        public List<Customer> getCustomer() {
            return customers;
        }

        public void setCustomer(List<Customer> c) {
            this.customers = c;
        }

    }
Demo (driver)
    package blog.domhandler;

    import java.io.File;

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;
    import javax.xml.bind.Unmarshaller;

    public class Demo {

        public static void main(String[] args) throws Exception {
            JAXBContext jc = JAXBContext.newInstance(Customers.class);

            Unmarshaller unmarshaller = jc.createUnmarshaller();
            Customers customers = (Customers) unmarshaller.unmarshal(new File("src/blog/domhandler/input.xml"));

            for (Customer customer : customers.getCustomer()) {
                System.out.println("Name: " + customer.getName());
                System.out.println("Bio: " + customer.getBio());
            }
        }

    }
When I put a breakpoint in BioHandler.getElement(), I see that the first time it is called, String xml has the value
<?xml version="1.0" encoding="UTF-8"?><bio><html>Jane bio</html> </bio>
and the second time it is called, String xml has the value
<?xml version="1.0" encoding="UTF-8"?><bio><html>Jane bio</html> </bio><?xml version="1.0" encoding="UTF-8"?><bio><html>John bio</html> </bio>
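To narrow this down, here is a minimal sketch without JAXB that mimics what I think is happening. The class name BioAccumulationSketch and the method extractBio are hypothetical names made up for this illustration; extractBio copies the logic of BioHandler.getElement(), and the two input strings are the values I saw in the debugger. It assumes the same handler instance, and therefore the same xmlWriter field, is reused for every customer.

```java
import java.io.StringWriter;

// Hypothetical stand-alone sketch; it does not use JAXB at all.
public class BioAccumulationSketch {

    private static final String BIO_START_TAG = "<bio>";
    private static final String BIO_END_TAG = "</bio>";

    // Same extraction logic as BioHandler.getElement():
    // takes the text between the FIRST <bio> and the FIRST </bio>.
    public static String extractBio(String xml) {
        int beginIndex = xml.indexOf(BIO_START_TAG) + BIO_START_TAG.length();
        int endIndex = xml.indexOf(BIO_END_TAG);
        return xml.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String jane = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><bio><html>Jane bio</html> </bio>";
        String john = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><bio><html>John bio</html> </bio>";

        // One writer shared by both calls, like the xmlWriter field in BioHandler.
        StringWriter shared = new StringWriter();
        shared.write(jane);
        System.out.println("Shared, 1st call: " + extractBio(shared.toString()));
        shared.write(john); // appended after Jane's fragment, nothing is cleared
        System.out.println("Shared, 2nd call: " + extractBio(shared.toString()));

        // For contrast: a writer that only ever receives the second fragment.
        StringWriter fresh = new StringWriter();
        fresh.write(john);
        System.out.println("Fresh writer:     " + extractBio(fresh.toString()));
    }
}
```

With the shared writer, the second call still returns Jane's bio, because indexOf finds the first `<bio>` in the accumulated buffer. That matches the output above and suggests the shared xmlWriter field is involved, though I don't know whether giving each call its own writer is how a DomHandler is meant to be used.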
Is there a way to tell the parser that this content should be discarded after each call to BioHandler.getElement()?