This is primarily aimed at those who are new to XML parsing and are not sure which parser to use.
There are two main ways to parse XML. You can load it into memory and then find what you need (DOM, SimpleXML), or you can stream it, reading it and running code based on what you read (XMLReader, SAX).
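To make the two approaches concrete, here is a minimal PHP sketch that extracts the same values both ways. The element names (`Items`, `Item`, `Title`) and both helper functions are hypothetical, chosen only for illustration.

```php
<?php
// Tree model: SimpleXML parses the whole document into memory first,
// then we query the in-memory tree.
function titlesWithSimpleXml(string $xml): array
{
    $root = new SimpleXMLElement($xml);
    $titles = [];
    foreach ($root->Item as $item) {
        $titles[] = (string) $item->Title;
    }
    return $titles;
}

// Stream model: XMLReader moves a cursor through the document and we act
// on each node as it is read; the full tree is never built in memory.
function titlesWithXmlReader(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $titles = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'Title') {
            $titles[] = $reader->readString(); // text content of this element
        }
    }
    $reader->close();
    return $titles;
}

$xml = '<Items><Item><Title>First</Title></Item><Item><Title>Second</Title></Item></Items>';
print_r(titlesWithSimpleXml($xml));
print_r(titlesWithXmlReader($xml));
```

Both return the same titles here; the difference is in when the work happens and how much of the document is held in memory at once.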
According to Microsoft, SAX is a push parser: it sends every piece of information to your application, and your application processes it. XMLReader is a pull parser: it lets you skip pieces of data and retrieve only what you need. According to Microsoft, this can simplify and speed up your application, and I would guess that the .NET and PHP implementations are similar. Your choice will depend on your needs: if you are pulling only a few tags out of a larger document and can use $xml->next('Element') to skip significant portions of it, you may find that XMLReader is faster than SAX.
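The push/pull difference shows up if you count how many start tags each parser actually hands to your code. This is a sketch with hypothetical element names (`Items`, `Item`, `Name`, `Details`); it uses PHP's event-based xml extension (`xml_parser_create()` and `xml_set_element_handler()`) as the SAX-style parser, and `XMLReader::next()` to jump over a subtree.

```php
<?php
// Push model: PHP's xml extension (a SAX-style parser) calls the start
// handler for every start tag in the document, wanted or not.
function countSaxStartTags(string $xml): int
{
    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$count): void {
            $count++;
        },
        function ($parser, string $name): void {
        }
    );
    xml_parse($parser, $xml, true);
    return $count;
}

// Pull model: XMLReader only surfaces the nodes we step onto. Calling
// next() on an element moves past its whole subtree without reporting
// the descendants to PHP.
function countPulledStartTags(string $xml, string $skip): int
{
    $count = 0;
    $reader = new XMLReader();
    $reader->XML($xml);
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $count++;
            if ($reader->localName === $skip) {
                $more = $reader->next(); // skip this element's subtree
                continue;
            }
        }
        $more = $reader->read();
    }
    $reader->close();
    return $count;
}

$xml = '<Items>'
    . '<Item><Name>a</Name><Details><X/><Y/><Z/></Details></Item>'
    . '<Item><Name>b</Name><Details><X/><Y/><Z/></Details></Item>'
    . '</Items>';
echo countSaxStartTags($xml), ' vs ', countPulledStartTags($xml, 'Details'), PHP_EOL;
```

For this sample the SAX-style handler fires for all 13 start tags, while the XMLReader loop sees only 7. Note that libxml still has to scan past the skipped bytes; next() saves the work of surfacing those nodes to your code, not the work of reading them.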
When repeatedly parsing "small" XML files (under 30 KB, about 700 lines), you might not expect a large time difference between the parsing methods. I was surprised to find that there was one. I compared a small feed processed with SimpleXML and with XMLReader, and I hope these numbers help someone else see how significant the difference can be. For a real-world comparison, this is the parsing of the responses to two Amazon MWS product information requests.
Each parse time is the time required to take the 2 XML strings and return about 120 variables containing values from each string. Each loop iteration receives different data, but every test was run on the same data in the same order.
SimpleXML loads the whole document into memory. I used microtime() to measure the time needed to complete the parse (extracting the relevant values), as well as the time spent creating the element (the call to new SimpleXMLElement($xml)). I rounded both to 4 decimal places.
| Run | Parse time (s) | Time spent making the elements (s) |
|---|---|---|
| 1 | 0.5866 | 0.5232 |
| 2 | 0.3045 | 0.2974 |
| 3 | 0.1037 | 0.0980 |
| 4 | 0.0151 | 0.0097 |
| 5 | 0.0282 | 0.0231 |
| 6 | 0.0622 | 0.0091 |
| 7 | 0.7756 | 0.7190 |
| 8 | 0.2439 | 0.2410 |
| 9 | 0.0806 | 0.0765 |
| 10 | 0.0696 | 0.0637 |
| 11 | 0.0218 | 0.0081 |
| 12 | 0.0542 | 0.0507 |
| Total | 2.3500 | 2.1195 |
| Average | 0.1958 | 0.1766 |

Over 90% of the total time is spent loading the elements into the DOM. Only 0.2305 seconds is spent locating the elements and returning them.
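A measurement like the one described above can be sketched as follows, assuming the values are extracted by a caller-supplied closure. timeSimpleXml() and its result keys are hypothetical names, not the original benchmark code; as in the table, the parse time includes the time spent creating the element.

```php
<?php
// Time the two phases separately: building the tree, then extracting values.
// Hypothetical helper; $extract receives the loaded SimpleXMLElement.
function timeSimpleXml(string $xml, callable $extract): array
{
    $start = microtime(true);
    $root = new SimpleXMLElement($xml);   // load the whole document into memory
    $built = microtime(true);
    $values = $extract($root);            // locate and copy out the wanted values
    $done = microtime(true);

    return [
        'values' => $values,
        'makeElement' => round($built - $start, 4), // element creation only
        'parse' => round($done - $start, 4),        // total, including element creation
    ];
}

$result = timeSimpleXml(
    '<Items><Item><Title>First</Title></Item></Items>',
    fn (SimpleXMLElement $root): array => [(string) $root->Item->Title]
);
printf("Parse Time: %.4f seconds, making the element: %.4f seconds\n",
    $result['parse'], $result['makeElement']);
```

For finer-grained timing, hrtime(true) (PHP 7.3+) is an alternative to microtime(true).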
With the stream-based XMLReader, I was able to skip a significant portion of one of the XML feeds, because the data I needed was near the top of each element. Your mileage may vary.
| Run | Parse time (s) |
|---|---|
| 1 | 0.1059 |
| 2 | 0.0169 |
| 3 | 0.0214 |
| 4 | 0.0665 |
| 5 | 0.0255 |
| 6 | 0.0241 |
| 7 | 0.0234 |
| 8 | 0.0225 |
| 9 | 0.0183 |
| 10 | 0.0202 |
| 11 | 0.0245 |
| 12 | 0.0205 |
| Total | 0.3897 |
| Average | 0.0325 |
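The kind of skipping described above can be sketched like this, assuming each record is an element (here a hypothetical `Product`) whose wanted child (a hypothetical `ASIN`) sits near its start. These names are placeholders, not the actual MWS schema, and firstFieldPerRecord() is a hypothetical helper.

```php
<?php
// Collect the first <$field> value inside each <$record>, walking the
// record's children with next() so their subtrees are never descended into.
function firstFieldPerRecord(string $xml, string $record, string $field): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $values = [];

    // Advance to the first <$record> start tag.
    $onRecord = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $record) {
            $onRecord = true;
            break;
        }
    }

    while ($onRecord) {
        $depth = $reader->depth;
        $wanted = null;
        // Step into the record (unless it is <record/>), then visit children.
        if (!$reader->isEmptyElement && $reader->read()) {
            while ($reader->depth > $depth) {
                if ($wanted === null
                    && $reader->nodeType === XMLReader::ELEMENT
                    && $reader->localName === $field) {
                    $wanted = $reader->readString();
                }
                // next() moves to the following sibling (or the record's end
                // tag) without surfacing the current child's descendants.
                if (!$reader->next()) {
                    break;
                }
            }
        }
        if ($wanted !== null) {
            $values[] = $wanted;
        }
        // Now on the record's end tag (or an empty record): jump to the
        // next <$record> sibling, if there is one.
        $onRecord = $reader->next($record);
    }

    $reader->close();
    return $values;
}

$xml = '<Products>'
    . '<Product><ASIN>A1</ASIN><Details><Size>L</Size><Color>red</Color></Details></Product>'
    . '<Product><Details><Size>M</Size></Details><ASIN>A2</ASIN></Product>'
    . '<Product/>'
    . '<Product><ASIN>A3</ASIN></Product>'
    . '</Products>';
print_r(firstFieldPerRecord($xml, 'Product', 'ASIN'));
```

The design choice here is to call next() on every child rather than read(), so large siblings such as `Details` are passed over as a unit. libxml still scans those bytes, but PHP never sees their inner nodes.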
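What is striking is that although locating the elements with SimpleXML is slightly faster once the document has been loaded (0.2305 seconds in total versus 0.3897 seconds), overall it is actually more than 6 times faster to use XMLReader (0.3897 seconds versus 2.3500 seconds).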
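The summary figures follow directly from the reported totals. A tiny sketch of that arithmetic, where compareTotals() is a hypothetical helper:

```php
<?php
// Derive the summary figures from the reported totals (all in seconds).
function compareTotals(float $simpleTotal, float $simpleMakeElements, float $readerTotal): array
{
    return [
        // SimpleXML time spent after the tree already exists
        'locating' => round($simpleTotal - $simpleMakeElements, 4),
        // percentage of SimpleXML time spent creating the element
        'loadShare' => round($simpleMakeElements / $simpleTotal * 100, 1),
        // how many times faster XMLReader was overall
        'speedup' => round($simpleTotal / $readerTotal, 2),
    ];
}

print_r(compareTotals(2.3500, 2.1195, 0.3897));
```

With the reported totals this gives 0.2305 seconds of locating time, a 90.2% share for loading, and a speedup of about 6.03.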
You can find some information about using XMLReader in the question "How to use XMLReader in PHP?".