How to get MATLAB to read the correct number of XML nodes

I am reading a simple XML file using MATLAB's built-in xmlread function.

    <root>
      <ref>
        <requestor>John Doe</requestor>
        <project>X</project>
      </ref>
    </root>

But when I call getChildNodes() on the ref element, it tells me that it has 5 children.

It works great IF I put all the XML on ONE line. Then MATLAB tells me that the ref element has 2 children.

It doesn't seem to like the whitespace between elements.

Even if I run Canonicalize in the oXygen XML editor, I still get the same results, because Canonicalize still leaves the whitespace in place.

MATLAB uses Java and Xerces for its XML handling.

Question:

What can I do to keep the XML file in a readable format (not all on one line) and still have MATLAB parse it correctly?

Code Update:

    filename = 'example01.xml';
    docNode = xmlread(filename);
    rootNode = docNode.getDocumentElement;
    entries = rootNode.getChildNodes;
    nEnt = entries.getLength
2 answers

Behind the scenes, the XML parser creates #text nodes for all whitespace between element nodes. Wherever there is a newline or indentation, it creates a #text node whose data holds the newline and the indent spaces that follow it. So, in the XML example you gave, parsing the child nodes of the ref element returns 5 nodes:

  • Node 1: #text with a newline and indentation
  • Node 2: a "requestor" node, which in turn has a #text child with "John Doe" in its data
  • Node 3: #text with a newline and indentation
  • Node 4: a "project" node, which in turn has a #text child with "X" in its data
  • Node 5: #text with a newline and indentation
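The five-node breakdown above can be reproduced with a short script. This is only a sketch: it assumes the question's file was laid out one element per line, writes that layout to a temporary file (the `tempname` file and the variable names are illustrative, not from the question), and lists the node names of the children of ref.

```matlab
% Question's example, pretty-printed (assumed layout: one element per line).
xmlText = sprintf(['<root>\n' ...
                   '  <ref>\n' ...
                   '    <requestor>John Doe</requestor>\n' ...
                   '    <project>X</project>\n' ...
                   '  </ref>\n' ...
                   '</root>\n']);

% Write it to a temporary file so xmlread can be called on a file name.
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, '%s', xmlText);
fclose(fid);

% Parse and collect the node names of the children of <ref>.
docNode = xmlread(tmpFile);
refNode = docNode.getElementsByTagName('ref').item(0);
kids    = refNode.getChildNodes;
nKids   = kids.getLength;
names   = cell(1, nKids);
for k = 1:nKids
    names{k} = char(kids.item(k-1).getNodeName);   % Java indexing is 0-based
end
disp(names)        % whitespace-only nodes show up as '#text'
delete(tmpFile);
```

The '#text' entries on either side of requestor and project are the whitespace-only nodes that inflate the count.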

This function removes all of these unwanted #text nodes for you. Please note: if you intentionally have an XML element that contains nothing but whitespace, this function will delete that whitespace too, but for 99.99% of XML use cases it should work fine.

    function removeIndentNodes( childNodes )

    numNodes = childNodes.getLength;
    remList = [];
    for i = numNodes:-1:1
       theChild = childNodes.item(i-1);
       if (theChild.hasChildNodes)
          removeIndentNodes(theChild.getChildNodes);
       else
          if ( theChild.getNodeType == theChild.TEXT_NODE && ...
               ~isempty(char(theChild.getData()))         && ...
               all(isspace(char(theChild.getData()))))
             remList(end+1) = i-1; % java indexing
          end
       end
    end
    for i = 1:length(remList)
       childNodes.removeChild(childNodes.item(remList(i)));
    end

    end

Call it like this:

    tree = xmlread( xmlfile );
    removeIndentNodes( tree.getChildNodes );

I felt @cholland's answer was good, but I didn't like the extra work on the XML tree. So here is a solution that strips the whitespace from a copy of the XML file, since that whitespace is the root cause of the unwanted nodes.

    fid = fopen('tmpCopy.xml','wt');
    str = regexprep(fileread(filename),'[\n\r]+',' ');
    str = regexprep(str,'>[\s]*<','><');
    fprintf(fid,'%s', str);
    fclose(fid);
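To check the effect of the two substitutions, here is a minimal sketch that applies them to the question's example held in a string, writes the result to a temporary copy, and parses that copy. The pretty-printed layout, the `tempname` file, and the variable names are assumptions made for illustration; only the two `regexprep` calls come from the answer.

```matlab
% Question's example, pretty-printed (assumed layout: one element per line).
xmlText = sprintf(['<root>\n' ...
                   '  <ref>\n' ...
                   '    <requestor>John Doe</requestor>\n' ...
                   '    <project>X</project>\n' ...
                   '  </ref>\n' ...
                   '</root>\n']);

% The same two substitutions as in the answer.
str = regexprep(xmlText, '[\n\r]+', ' ');   % line breaks -> single space
str = regexprep(str, '>[\s]*<', '><');      % drop whitespace between tags

% Write the stripped copy to a temporary file (illustrative name) and parse it.
tmpCopy = [tempname '.xml'];
fid = fopen(tmpCopy, 'wt');
fprintf(fid, '%s', str);
fclose(fid);

docNode = xmlread(tmpCopy);
refNode = docNode.getElementsByTagName('ref').item(0);
nKids   = refNode.getChildNodes.getLength;
fprintf('ref has %d child nodes\n', nKids);
delete(tmpCopy);
```

The readable original stays untouched and only the stripped copy is parsed. Note that the first substitution also turns line breaks inside element text into spaces, which matters if any element holds multi-line content.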