Parsing XML strings in MATLAB

I need to parse an XML string in MATLAB (note: without any file I/O, so I don't want to write the string to a file and then read it back). I receive the strings over an HTTP connection, and the parsing should be very fast. I am mostly concerned with reading the values of certain tags anywhere in the string.

The web is full of death threats about parsing XML with regular expressions, so I would rather not go that route. I know that MATLAB has built-in Java integration, but I'm not very good at Java. Is there a quick way to pull certain values out of the XML?

For example, I want to get the volume value from the string below and store it in a variable.

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <root>
     <volume>256</volume>
     <length>0</length>
     <time>0</time>
     <state>stop</state>
     ....
3 answers

For what it's worth, below is Java code, callable directly from MATLAB, that does the required task without writing an intermediate file:

    % An XML-formatted string
    strXml = [...
        '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>' char(10) ...
        '<root>' char(10) ...
        ' <volume>256</volume>' char(10) ...
        ' <length>0</length>' char(10) ...
        ' <time>0</time>' char(10) ...
        ' <state>stop</state>' char(10) ...
        '</root>' ];

    % "Simple" Java code to create a document from said string
    xmlDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder.parse(java.io.StringBufferInputStream(strXml));

    % "Intuitive" methods to explore the xmlDocument
    nodeList = xmlDocument.getElementsByTagName('volume');
    numberOfNodes = nodeList.getLength();
    firstNode = nodeList.item(0);
    firstNodeContent = firstNode.getTextContent;
    disp(firstNodeContent);  % Displays 256
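One caveat about the snippet above: java.io.StringBufferInputStream is deprecated in the JDK because it does not convert characters to bytes properly. A minimal sketch of the same DOM lookup in plain Java, using org.xml.sax.InputSource wrapped around a java.io.StringReader instead, could look like the following. The class name XmlStringDom and the method firstText are made up for illustration; only the JDK calls are real.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and method are illustrative only.
final class XmlStringDom {

    /** Returns the text of the first element named tagName, or null if there is none. */
    static String firstText(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource + StringReader hands characters straight to the parser,
            // so there is no byte-encoding step and no temporary file.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {
            // Wrap parser and I/O errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?>"
                + "<root><volume>256</volume><length>0</length>"
                + "<time>0</time><state>stop</state></root>";
        System.out.println(firstText(xml, "volume"));
    }
}
```

From MATLAB, the equivalent change should be to pass org.xml.sax.InputSource(java.io.StringReader(strXml)) to parse in place of the StringBufferInputStream.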

Alternatively, if your application allows it, consider passing the URL directly to your XML parser. The Java code below is untested, but this route probably also opens up the use of MATLAB's built-in xslt function.

    xmlDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder.parse('URL_AS_A_STRING_HERE');

The documentation is here. Start with the javax.xml.parsers package.


MATLAB has a whole family of functions for working with XML, including xmlread and xmlwrite. These should be very helpful for your problem.


I am not familiar with the MATLAB APIs at all, but I would like to point out that the DOM approach described by Pursuit will take more time and memory if you only want certain values out of the XML stream that you get back from the HTTP connection.

While StAX gives you a fast way to parse in Java, its API can be cumbersome, especially if you are not familiar with Java. You can use SJXP, which is an extremely thin abstraction on top of StAX parsing in Java (disclaimer: I'm the author). It lets you define the paths to the elements you want; you then give the parser a stream (your HTTP stream in this case), and it pulls out all the values for you.

As an example, suppose you want the values of /root/state and /root/volume from the example XML shown. The actual Java would look something like this:

    // Create /root/state rule
    IRule stateRule = new DefaultRule(Type.CHARACTER, "/root/state") {
        @Override
        public void handleParsedCharacters(XMLParser parser, String text, Object userObject) {
            System.out.println("State is: " + text);
        }
    };

    // Create /root/volume rule
    IRule volRule = new DefaultRule(Type.CHARACTER, "/root/volume") {
        @Override
        public void handleParsedCharacters(XMLParser parser, String text, Object userObject) {
            System.out.println("Volume is: " + text);
        }
    };

    // Create the parser with the given rules
    XMLParser parser = new XMLParser(stateRule, volRule);

You can do all this initialization at program startup, and then when you process the stream from your HTTP connection, you will do something like:

    parser.parse(httpConnection.getInputStream());

or similar. The handler code you defined in your rules is then called as the parser runs through the character stream from the HTTP connection.

As I said, I am not familiar with MATLAB and don't know the proper way to "MATLAB-ify" this code, but judging from the first example you can more or less use the Java API directly. In that case this solution will be faster and use significantly less memory for parsing, if that matters, than the DOM approach.
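For readers who would rather not add a library, the same path-based idea can be sketched against the JDK's own StAX API (javax.xml.stream). This is only an illustration under some assumptions, not how SJXP is implemented: the class name StaxPathExtractor and its extract method are hypothetical, and the sketch assumes every wanted element contains text only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; not part of SJXP or MATLAB.
final class StaxPathExtractor {

    /** Streams through the XML once and returns the text found at each wanted path. */
    static Map<String, String> extract(Reader xml, Set<String> wantedPaths) {
        Map<String, String> found = new HashMap<>();
        Deque<String> path = new ArrayDeque<>(); // names of the currently open elements
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                    String current = "/" + String.join("/", path);
                    if (wantedPaths.contains(current)) {
                        // getElementText() reads through the matching end tag, so the
                        // element is popped here. It throws if the element has child
                        // elements, hence the text-only assumption.
                        found.put(current, reader.getElementText());
                        path.removeLast();
                        if (found.size() == wantedPaths.size()) {
                            break; // everything requested was found; skip the rest
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.removeLast();
                }
            }
            reader.close();
        } catch (Exception e) {
            // Wrap parser errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse XML stream", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?>"
                + "<root><volume>256</volume><length>0</length>"
                + "<time>0</time><state>stop</state></root>";
        Set<String> wanted = new HashSet<>(Arrays.asList("/root/volume", "/root/state"));
        Map<String, String> values = extract(new StringReader(xml), wanted);
        System.out.println("Volume is: " + values.get("/root/volume"));
        System.out.println("State is: " + values.get("/root/state"));
    }
}
```

Unlike the DOM approach, nothing is kept in memory except the stack of open element names and the values that were asked for, and parsing stops as soon as all of them have been seen. For an HTTP response, an InputStreamReader around the connection's input stream could be passed in place of the StringReader.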
