For illegal characters, I would recommend implementing a Reader filter: just replace them with a space (provided they are control characters) or strip them out.
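A minimal sketch of such a Reader filter, assuming the offending characters are the control characters that XML 1.0 forbids (everything below U+0020 except tab, line feed and carriage return). The class name `XmlControlCharFilterReader` and the `clean` helper are made up for illustration; only `java.io.FilterReader` comes from the JDK.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical filter: replaces control characters that XML 1.0 does not allow with a space.
final class XmlControlCharFilterReader extends FilterReader {

    XmlControlCharFilterReader(Reader in) {
        super(in);
    }

    // True for characters below U+0020 other than tab, LF and CR.
    // A negative value (end of stream) is never treated as illegal.
    static boolean isIllegalControl(int c) {
        return c >= 0 && c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return isIllegalControl(c) ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (isIllegalControl(buf[i])) {
                buf[i] = ' ';
            }
        }
        return n;
    }

    // Convenience helper for trying the filter on a String.
    static String clean(String s) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader r = new XmlControlCharFilterReader(new StringReader(s))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }
}
```

In practice you would wrap the Reader you already hand to the parser, e.g. `new XmlControlCharFilterReader(originalReader)`. To strip the characters instead of replacing them, `read(char[], int, int)` would have to compact the buffer and return the reduced count, which is a bit more work.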
Undeclared entities are trickier. Some XML parsers let you supply an alternative DTD to use (Woodstox does, at least). If yours does, you can feed it a DTD that declares the entities you need.
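As a rough illustration of the "feed the parser a different DTD" idea, here is a sketch that uses the JDK's own DOM parser and a SAX `EntityResolver` rather than Woodstox's DTD override, so it is a related technique, not the Woodstox mechanism itself. It assumes the document carries a DOCTYPE with an external identifier (otherwise the resolver is never asked for a DTD). The class name `SubstituteDtdDemo`, the `parseText` helper and the one-line replacement DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: answer every external DTD request with our own declarations.
final class SubstituteDtdDemo {

    // Assumed replacement DTD; declare whatever entities the documents actually use.
    static final String REPLACEMENT_DTD = "<!ENTITY nbsp \"&#160;\">";

    static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Ignore the requested public/system id and return the replacement DTD instead.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader(REPLACEMENT_DTD)));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

For example, `SubstituteDtdDemo.parseText("<!DOCTYPE root SYSTEM \"placeholder.dtd\"><root>a&nbsp;b</root>")` should parse without an undeclared-entity error, with `&nbsp;` expanded to U+00A0. With Woodstox you would use its DTD override facility instead; check its documentation for the exact property and schema classes.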
Staxman