Speed up XML parsing with PHP

Hi, I have an XML file containing about 12,000 entries. My code works fine, but it takes some time to parse the XML file and return the content. Is there a way to speed this up?

My code is:

<?php
$dom = new DOMDocument();
$dom->load('comics.xml');
foreach ($dom->getElementsByTagName('record') as $entry) {
    $title = $entry->getElementsByTagName('title')->item(0)->textContent;
    echo $title;
}
?>

XML file (just one sample record; there is no point in posting all of them):

<?xml version='1.0' encoding='utf-8'?>
<calibredb>
  <record>
    <id>1</id>
    <uuid>991639a0-7cf6-4a34-a863-4aab8ac2921d</uuid>
    <publisher>Marvel Comics</publisher>
    <size>6109716</size>
    <title sort="Iron Man v1 101">Iron Man v1 101</title>
    <authors sort="Unknown">
      <author>Unknown</author>
    </authors>
    <timestamp>2012-04-15T18:49:22-07:00</timestamp>
    <pubdate>2012-04-15T18:49:22-07:00</pubdate>
    <cover>M:/Comics/Unknown/Iron Man v1 101 (1)/cover.jpg</cover>
    <formats>
      <format>M:/Comics/Unknown/Iron Man v1 101 (1)/Iron Man v1 101 - Unknown.zip</format>
    </formats>
  </record>
</calibredb>
3 answers

The answer depends on the data. Two possible solutions are to move the data into a relational database such as MySQL, or to normalize it into CSV, which is easier to parse, takes up less space, and can be read line by line.
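
To illustrate the CSV option, here is a minimal sketch of reading such a file one row at a time. It assumes the records have already been exported to a hypothetical comics.csv with the title in the first column; the function name readCsvTitles and that column layout are my assumptions, not something from the question.

```php
<?php
// Sketch only: the file name and column layout are assumptions.
// Yields one title per CSV row without loading the whole file into memory.
function readCsvTitles(string $path, int $titleColumn = 0): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // fgetcsv() reads and parses a single line per call.
        while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
            // Blank lines come back as [null]; isset() skips them.
            if (isset($row[$titleColumn])) {
                yield $row[$titleColumn];
            }
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical usage, guarded so the script also runs without the file.
if (is_file('comics.csv')) {
    foreach (readCsvTitles('comics.csv') as $title) {
        echo $title, PHP_EOL;
    }
}
```

Whether this ends up faster than parsing the XML depends on how often the export has to be regenerated.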

An approach

The DOM is suitable only for small data sets, because the entire XML structure is parsed and held in memory.

For a large XML file like yours, use a SAX-style approach instead: the file is read sequentially, one node at a time, rather than all at once.

There are several examples on Google: https://www.google.lv/search?q=php+SAX+XML
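
As a rough sketch of the streaming idea in PHP: the code below uses XMLReader, which is a pull parser rather than a callback-based SAX parser (PHP's callback API is the xml_parser_create() family), but it reads the file sequentially in the same way. The function name streamTitles is made up for this example, and it assumes that title elements appear only inside record elements, as in the sample above.

```php
<?php
// Sketch only: assumes <title> occurs only inside <record>, as in the sample.
// Yields each title while the file is read node by node.
function streamTitles(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // read() advances to the next node; no full tree is built.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
                // readString() returns the text content of the current element.
                yield $reader->readString();
            }
        }
    } finally {
        $reader->close();
    }
}

// Usage with the file from the question, guarded so the script runs without it.
if (is_file('comics.xml')) {
    foreach (streamTitles('comics.xml') as $title) {
        echo $title, PHP_EOL;
    }
}
```

Memory use stays roughly constant regardless of file size; how much time it saves on 12,000 records is something you would have to measure.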


I am not familiar with the PHP implementation, but in C++ with Xerces I have seen huge performance improvements from the following approach in a scenario like yours.

Instead of requesting all the elements by name and waiting for the entire NodeList to be returned, I found it much faster to get the first child node under the root node and then get its NextSibling node. Treating each sibling as the new current node, you keep calling NextSibling until there are none left.

Hopefully this gives a performance boost in PHP similar to the one it gave in C++.
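
For reference, the same traversal can be written against PHP's DOM extension roughly as follows. This is an untested-for-speed sketch, not a measured result: titlesBySiblingWalk is a name I made up, and it assumes record elements are direct children of the root, as in the sample file. Note that the whole document is still loaded into memory.

```php
<?php
// Sketch only: assumes <record> elements are direct children of the root.
// Walks firstChild/nextSibling instead of calling getElementsByTagName().
function titlesBySiblingWalk(DOMDocument $dom): array
{
    $titles = [];
    for ($node = $dom->documentElement->firstChild; $node !== null; $node = $node->nextSibling) {
        // Skip whitespace text nodes and anything that is not a <record>.
        if ($node->nodeType !== XML_ELEMENT_NODE || $node->nodeName !== 'record') {
            continue;
        }
        for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
            if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'title') {
                $titles[] = $child->textContent;
                break; // only the first <title> per record, like item(0)
            }
        }
    }
    return $titles;
}

// Usage with the file from the question, guarded so the script runs without it.
if (is_file('comics.xml')) {
    $dom = new DOMDocument();
    $dom->load('comics.xml');
    foreach (titlesBySiblingWalk($dom) as $title) {
        echo $title, PHP_EOL;
    }
}
```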

