How to ignore whitespace when reading a file to create an XML DOM

I am trying to read a file to create a DOM document, but the file contains spaces and newlines. I tried to ignore them but could not:

    DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
    docfactory.setIgnoringElementContentWhitespace(true);

I see in the Javadoc that setIgnoringElementContentWhitespace only works when the validating flag is on, but I don't have a DTD or XML schema for the document.

What can I do?

Update

I don't like the idea of adding <!ELEMENT ...> declarations myself, and I tried the solution proposed in the forum thread that Tomalak pointed to, but it does not work. I am using Java 1.6 on Linux. If nothing else is suggested, I will write a few methods of my own to skip whitespace text nodes.

+6
java xml whitespace
4 answers

IgnoringElementContentWhitespace does not remove all whitespace-only text nodes. It removes only those whitespace nodes whose parent is declared in the schema as having ELEMENT content, that is, a parent that contains only other elements and never text.

If no schema (DTD or XSD) is in use, element content defaults to MIXED, so this setting never has any effect. (Unless the parser provides a non-standard DOM extension for treating all unknown elements as having ELEMENT content, which, as far as I know, none of the parsers available for Java do.)

You can hack the document on its way into the parser so that it includes schema information, for example by adding an internal subset to the <!DOCTYPE ... [...]> containing <!ELEMENT ...> declarations, and then use the IgnoringElementContentWhitespace setting.

Or, perhaps more simply, you can just remove the whitespace nodes yourself, either in a post-processing step or as they come in using an LSParserFilter.
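A minimal sketch of the internal-subset approach, assuming a hypothetical document whose root element `root` contains only `item` children and which has no XML declaration or DOCTYPE of its own. The class name `InternalSubsetDemo`, the helper `parseIgnoringWhitespace`, and the DTD text are illustrative assumptions and do not come from the question. Validation is switched on because the Javadoc ties the whitespace setting to validating mode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InternalSubsetDemo {

    // Hypothetical DTD for the sample document: root has ELEMENT content
    // (only item children), item has text content.
    static final String DOCTYPE =
            "<!DOCTYPE root [\n"
          + "  <!ELEMENT root (item*)>\n"
          + "  <!ELEMENT item (#PCDATA)>\n"
          + "]>\n";

    // Prepends the internal subset and parses with validation enabled so the
    // parser knows which whitespace sits in element-only content.
    static Document parseIgnoringWhitespace(String xmlWithoutProlog) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(DOCTYPE + xmlWithoutProlog)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";
        Document doc = parseIgnoringWhitespace(xml);
        // Counts the child nodes of root after parsing.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

If the input already carries an XML declaration, the DOCTYPE has to go after it rather than at the very start, which this sketch does not handle.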
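The post-processing option could look roughly like the sketch below. It walks the tree after parsing and removes every text node that contains only whitespace. The class name `WhitespaceStripper`, the method `stripWhitespace`, and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {

    // Recursively removes text nodes that are empty after trimming.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Fetch the sibling first, because removal detaches the child.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item> b </item>\n</root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc.getDocumentElement());
        // Counts the child nodes of root after stripping.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Note that this also drops whitespace-only text between inline elements in mixed content, so it is only appropriate when such whitespace carries no meaning in the document.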

+9

This is a (really) late answer, but here is how I solved it. I wrote my own implementation of the NodeList interface. It simply skips text nodes that contain only whitespace. Code follows:

    // Needs java.util.ArrayList, java.util.Iterator, java.util.List,
    // org.w3c.dom.Node and org.w3c.dom.NodeList
    private static class NdLst implements NodeList, Iterable<Node> {

        private final List<Node> nodes;

        // Copies every node that is not whitespace-only text
        public NdLst(NodeList list) {
            nodes = new ArrayList<Node>();
            for (int i = 0; i < list.getLength(); i++) {
                if (!isWhitespaceNode(list.item(i))) {
                    nodes.add(list.item(i));
                }
            }
        }

        @Override
        public Node item(int index) {
            // NodeList.item returns null for an out-of-range index
            return (index >= 0 && index < nodes.size()) ? nodes.get(index) : null;
        }

        @Override
        public int getLength() {
            return nodes.size();
        }

        private static boolean isWhitespaceNode(Node n) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                String val = n.getNodeValue();
                return val.trim().length() == 0;
            } else {
                return false;
            }
        }

        @Override
        public Iterator<Node> iterator() {
            return nodes.iterator();
        }
    }

You then wrap every NodeList in this class, and it effectively ignores all whitespace nodes. (Which I define as text nodes whose trimmed text has length 0.)

It also has the added benefit that it can be used in a for-each loop.

+5

I did it like this:

    // schema is a javax.xml.validation.Schema created elsewhere
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    dbFactory.setIgnoringElementContentWhitespace(true);
    dbFactory.setSchema(schema);
    dbFactory.setNamespaceAware(true);

    // element is presumably taken from a document parsed with this factory
    NodeList nodeList = element.getElementsByTagNameNS("*", "associate");
+2

Try the following:

    private static Document prepareXML(String param)
            throws ParserConfigurationException, SAXException, IOException {
        // Drop whitespace-only runs between tags before the parser sees them
        param = param.replaceAll(">\\s+<", "><").trim();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputSource in = new InputSource(new StringReader(param));
        return builder.parse(in);
    }
0
