I have an application in which I need to parse or tokenize XML in a Java program while preserving the source text exactly (for example: do not resolve entities, do not normalize whitespace in attributes, keep the order of attributes, etc.).
Today I spent several hours trying to use StAX, SAX, XSLT, TagSoup, etc., before realizing that none of them do this. I cannot afford to spend much more time on this problem, and parsing the text manually seems quite non-trivial. Is there any Java library that can help me tokenize XML?
Edit: why am I doing this? I have a large XML file to which I want to make a small number of localized changes programmatically, and those changes need to be reviewed. Being able to use a diff tool for that is very useful. If the parser/filter normalizes the XML, all I see in the diff tool is red ink. The application that creates the XML in the first place is not something I can easily modify to emit "canonical XML", if there is such a thing.
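To make the requirement concrete, here is a minimal sketch of the kind of lossless tokenizer I have in mind, using only `java.util.regex`. The class name `RawXmlTokenizer` and its `tokenize` method are hypothetical, not from any library. It assumes the input is reasonably well-formed and has no DOCTYPE with an internal subset. It does not validate anything; it only splits the text so that joining the tokens gives back the original input byte for byte.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits XML text into raw tokens without altering them.
final class RawXmlTokenizer {

    // Alternatives are tried in order: comment, CDATA section, processing
    // instruction, tag (quoted attribute values may contain '>'), plain text.
    private static final Pattern TOKEN = Pattern.compile(
            "<!--.*?-->"
            + "|<!\\[CDATA\\[.*?\\]\\]>"
            + "|<\\?.*?\\?>"
            + "|<(?:[^<>\"']|\"[^\"]*\"|'[^']*')*>"
            + "|[^<]+",
            Pattern.DOTALL);

    private RawXmlTokenizer() {
    }

    // Returns the tokens in document order. Any span the pattern skips
    // (for example a stray '<') is kept as its own token, so that
    // String.join("", tokens) always equals the input.
    static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(xml);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                tokens.add(xml.substring(pos, m.start()));
            }
            tokens.add(m.group());
            pos = m.end();
        }
        if (pos < xml.length()) {
            tokens.add(xml.substring(pos));
        }
        return tokens;
    }
}
```

With something like this, a localized change would mean replacing a single token and writing `String.join("", tokens)` back out, so the diff shows only the edited spot. Entities, attribute order, and whitespace are never interpreted, which is the behavior I could not get from StAX or SAX.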