Is XMLReader a SAX parser, a DOM parser, or neither?

I am testing various methods for reading XML configuration files in PHP. The files may be large and are read very often; no writing is ever required. I have two working implementations, one using SimpleXML (which, as far as I know, is a DOM parser) and one using XMLReader.

I know that a DOM parser must load the whole tree and therefore uses more memory; my tests reflect this. I also know that a SAX parser is event-based and uses less memory because it reads each node from the stream without looking at what comes next.

XMLReader also reads from the stream, using a cursor that provides information about the node it is currently on. So it definitely sounds like XMLReader ( http://us2.php.net/xmlreader ) is not a DOM parser. My question is: is it a SAX parser or something else? XMLReader seems to behave the way a SAX parser does, but it does not fire the events itself (in other words, could you build a SAX parser on top of XMLReader?).

If it is something else, does that classification have a name?

+3
4 answers

XMLReader calls itself a pull parser.

The XMLReader extension is an XML Pull parser. The reader acts like a cursor going forward along the flow of a document and stopping at every node in the path.

The manual goes on to say that it uses libxml.

This Java Pull Parsing page may be of interest. If XMLReader shares the goals of that project, then the answer to your question falls squarely into the "neither" category.

+5

A SAX parser is a parser that implements the SAX API. That is, a parser is a SAX parser if and only if you can drive it with code written against the SAX API. The same goes for a DOM parser: the classification refers only to the API the parser supports, not to how that API is implemented. A SAX parser may therefore be built on top of a DOM parser, so you cannot assume that it uses less memory or has any other particular characteristic.
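To illustrate that the classification is about the API rather than the implementation, here is a minimal sketch of an event-callback interface layered on top of PHP's XMLReader. The function name `parseWithCallbacks` and its callback signatures are invented for this example; this is not the SAX API itself, only something SAX-like.

```php
<?php
// Hypothetical helper, not part of PHP or of the SAX API: walks the document
// with XMLReader (pull) and pushes events to caller-supplied callbacks.
function parseWithCallbacks(string $xml, callable $onStart, callable $onEnd, callable $onText): void
{
    $reader = new XMLReader();
    $reader->XML($xml);

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Capture these before moving the cursor onto the attributes.
                $name = $reader->name;
                $isEmpty = $reader->isEmptyElement;
                $attrs = [];
                if ($reader->hasAttributes) {
                    while ($reader->moveToNextAttribute()) {
                        $attrs[$reader->name] = $reader->value;
                    }
                    $reader->moveToElement();
                }
                $onStart($name, $attrs);
                // XMLReader reports no END_ELEMENT node for <empty/>, so emit one here.
                if ($isEmpty) {
                    $onEnd($name);
                }
                break;
            case XMLReader::END_ELEMENT:
                $onEnd($reader->name);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                $onText($reader->value);
                break;
        }
    }
    $reader->close();
}

// Usage: record the events in the order they are delivered.
$events = [];
parseWithCallbacks(
    '<config><option name="debug">true</option><flag/></config>',
    function (string $name, array $attrs) use (&$events) { $events[] = "start:$name"; },
    function (string $name) use (&$events) { $events[] = "end:$name"; },
    function (string $text) use (&$events) { $events[] = "text:$text"; }
);
```

The caller only ever sees callbacks, so from the outside this looks event-driven even though the loop inside is a pull loop.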

To get to the real question, however: XMLReader seems to be the best choice. Because it is a pull parser, you request only the data you need, so there should be less overhead.
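As a rough sketch of "requesting only the data you need", the function below looks up a single setting and stops reading as soon as it is found. The `<option name="...">` layout and the function name `findOption` are assumptions made for the example, not something taken from the question.

```php
<?php
// Hypothetical helper: returns the text of the first <option> element whose
// "name" attribute matches $wanted, or null if there is none.
function findOption(string $xml, string $wanted): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file on disk, $reader->open($path) would be used instead

    $found = null;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->name === 'option'
            && $reader->getAttribute('name') === $wanted
        ) {
            $found = $reader->readString();
            break; // the rest of the document is never read
        }
    }
    $reader->close();

    return $found;
}

$xml = '<config><option name="debug">true</option><option name="cache_ttl">300</option></config>';
$ttl = findOption($xml, 'cache_ttl');
```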

+4

XMLReader is the interface that a SAX2 parser must implement. So you can say that you have a SAX parser when you access it through XMLReader; in short, XMLReader is a SAX parser.

See the XMLReader javadoc:

XMLReader is the interface that the SAX2 driver for the XML parser must implement. This interface allows the application to set and query functions and properties in the parser, register event handlers for processing documents, and initiate document analysis.

I think this information is relevant because:

  • It is used on the official SAX website.
  • Even though the javadoc is for Java, SAX originated in Java.
+1

In short, it is neither.

SAX parsers are event-based, stream-oriented parsers. You register callback functions to handle events such as startElement and endElement, then call parse() to process the entire XML document, one node at a time. To my knowledge, PHP does not have a well-maintained SAX parser. However, it does have XML Parser, which is very similar and uses Expat.
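A minimal sketch of the callback style with PHP's XML Parser extension (the `xml_*` functions) might look like the following. The sample document and the `$events` array are made up for illustration.

```php
<?php
$xml = '<config><option name="debug">true</option><option name="cache_ttl">300</option></config>';

$events = [];

$parser = xml_parser_create();
// Element names are upper-cased by default; turn that off.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Register the callbacks first; the parser calls them as it meets each node.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events) {
        $events[] = "end:$name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$events) {
        $events[] = "text:$data";
    }
);

// A single call drives the whole document through the handlers.
if (xml_parse($parser, $xml, true) !== 1) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}
```

Note that the application never asks for the next node here: control stays inside `xml_parse()` until the document has been consumed.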

DOM parsers require you to load the entire XML document into memory, but in return they give you an object-oriented tree of XML nodes. Examples of DOM parsers in PHP include SimpleXML and DOM.
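For comparison, a small SimpleXML sketch is shown below; the sample document and variable names are assumptions for the example. The whole document is parsed up front, after which any node can be reached directly.

```php
<?php
$xml = '<config><option name="debug">true</option><option name="cache_ttl">300</option></config>';

// The entire document is loaded into a tree before any of it can be used.
$config = simplexml_load_string($xml);
if ($config === false) {
    throw new RuntimeException('Could not parse the XML');
}

// Walk the tree with ordinary property and array access.
$options = [];
foreach ($config->option as $option) {
    $options[(string) $option['name']] = (string) $option;
}

// Random access works too, because the tree is already in memory.
$second = (string) $config->option[1];
```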

PHP's XMLReader is neither one nor the other. It is a stream-oriented pull parser that requires you to write a big loop and call the read() function to move the cursor forward, processing one node at a time.
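The loop described above can be sketched as follows. The sample document is invented, and the example merely records each element name indented by its depth to show the cursor moving forward through the document.

```php
<?php
$xml = '<config><db><host>localhost</host></db><debug/></config>';

$reader = new XMLReader();
$reader->XML($xml); // for a file on disk, $reader->open($path) would be used instead

$seen = [];
// Each read() call advances the cursor to the next node in document order.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        // Only the current node is inspected; nothing earlier is kept around
        // unless the application stores it itself.
        $seen[] = str_repeat('  ', $reader->depth) . $reader->name;
    }
}
$reader->close();
```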

The great advantage of XML Parser and XMLReader over SimpleXML and DOM is that stream-oriented parsers are memory-efficient: they hold only the current node in memory. SimpleXML and DOM, on the other hand, are easier to use, but they must load the entire XML document into memory, which is a problem for very large XML documents.

+1
