How to improve PHP XML load time?

Giving up my lurker status to finally ask a question ...

I need to know how I can improve the performance of a PHP script that extracts data from XML files.

Some background:

  • I have already identified the CPU as the bottleneck, but I want to optimize the script itself before throwing more processing power at it. The most CPU-intensive part of the script is the XML loading.

  • The reason I use XML to store the object data is that the data must be accessible through a Flash browser interface, and we want that access to be fast for users. The project is still in its early stages, so if best practice would be to drop XML altogether, that would be a good answer too.

  • Plenty of data: the script currently builds a graph over about 100,000 objects. Each object is usually small, but ALL of them must be considered on every run, with only a few rare exceptions. The data set will only grow over time.

  • Frequent runs: ideally we would run the script ~50k times per hour; realistically we have agreed on ~1k runs per hour. Combined with the data size, this makes performance optimization absolutely imperative.

  • One optimization step has already been taken: several runs are performed on the same loaded data rather than reloading it for each run, but it still takes too long. Runs should usually work with "fresh" data that includes changes made by users.

+4
3 answers

Just to clarify: is the data loaded from the XML files processed as-is, or is it modified before being sent to the Flash application?

It sounds like you would be better off storing your data in a database and generating XML as needed, rather than reading the data from XML first. If generating the XML files turns out to be slow, you can cache each file as it is created to avoid redundantly regenerating the same file.
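A minimal sketch of that idea, assuming a relational store: the table name `objects`, its columns, the cache path, and the TTL are illustrative choices, not something from the question.

```php
<?php
// Sketch: serve a cached XML snapshot, rebuilding it from the database
// only when the cache is stale. Schema (objects: id, name) and TTL are
// assumed for illustration.

function getObjectsXml(PDO $db, string $cacheFile, int $ttl = 3600): string
{
    // Reuse the cached file while it is fresh enough.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Rebuild the XML from the database.
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('objects');

    foreach ($db->query('SELECT id, name FROM objects') as $row) {
        $xml->startElement('object');
        $xml->writeAttribute('id', (string)$row['id']);
        $xml->text($row['name']); // XMLWriter escapes text for us
        $xml->endElement();
    }

    $xml->endElement();
    $xml->endDocument();
    $out = $xml->outputMemory();

    // LOCK_EX so concurrent writers do not interleave partial output.
    file_put_contents($cacheFile, $out, LOCK_EX);
    return $out;
}
```

Frequent runs then pay only a `filemtime()` check and a file read instead of a full rebuild.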

+3

If the XML stays relatively static, you could cache it as a PHP array, something like this:

<xml><foo>bar</foo></xml> 

cached in the file as

 <?php return array('foo' => 'bar'); 

PHP then only needs to include the cached file, which is much faster than parsing the XML again.
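A small sketch of that caching trick, assuming a flat document like the `<xml><foo>bar</foo></xml>` example above; the function name is mine.

```php
<?php
// Sketch: parse the XML once, dump the result with var_export(), and let
// later runs load it with a cheap include/require instead of re-parsing.
// Handles flat documents of the <xml><foo>bar</foo></xml> kind.

function cacheXmlAsPhp(string $xmlString, string $cacheFile): array
{
    $data = [];
    foreach (new SimpleXMLElement($xmlString) as $node) {
        $data[$node->getName()] = (string)$node;
    }
    file_put_contents(
        $cacheFile,
        '<?php return ' . var_export($data, true) . ';',
        LOCK_EX
    );
    return $data;
}

// Later runs skip XML parsing entirely:
// $data = require $cacheFile;   // e.g. array('foo' => 'bar')
```

With an opcode cache enabled, the `require` can even be served from memory without touching the parser at all.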

+1

~1k runs per hour, at 3600 seconds per hour, is a run roughly every 3.6 seconds (and ~50k/hour would be ~14 runs per second) ...

That raises a lot of questions. Some of them:

  • Does your PHP script need to read/process all records in the data source on every single run? If not, which subset does it need (approximate size, selection criteria, ...)?
  • The same question for the Flash application, plus: who sends it the data? The PHP script? A direct request for a complete static XML file?
  • What operations are performed on the data source?
  • Do you need some kind of concurrency mechanism?
  • ...

And just because you want to deliver XML data to Flash clients does not necessarily mean you need to store the data as XML on the server. If, for example, clients only ever need a tiny subset of the available records, it might be much faster to store the data in something better suited to speed and searching, and generate the XML output for that subset on the fly, perhaps with some caching, depending on what data the clients request and how much/how often the data changes.
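To illustrate the on-the-fly subset idea, a hedged sketch: the pagination parameters, the `objects` table, and the per-slice cache layout are all assumptions for the example, not part of the question.

```php
<?php
// Sketch: the Flash client asks for a slice of the records; we query just
// that slice and emit XML for it, caching each slice in its own file.
// Table/column names and the 5-minute TTL are illustrative assumptions.

function getSubsetXml(PDO $db, int $limit, int $offset, string $cacheDir): string
{
    $key = $cacheDir . '/subset_' . $limit . '_' . $offset . '.xml';
    if (is_file($key) && (time() - filemtime($key)) < 300) {
        return file_get_contents($key);
    }

    $stmt = $db->prepare(
        'SELECT id, name FROM objects ORDER BY id LIMIT :lim OFFSET :off'
    );
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->bindValue(':off', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $out = '<?xml version="1.0"?><objects>';
    foreach ($stmt as $row) {
        $out .= '<object id="' . (int)$row['id'] . '">'
              . htmlspecialchars($row['name'], ENT_XML1)
              . '</object>';
    }
    $out .= '</objects>';

    file_put_contents($key, $out, LOCK_EX);
    return $out;
}
```

Only the requested slice is ever serialized, so the cost per request stays proportional to the slice size rather than to the full 100k-object data set.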

edit: Suppose you really do need the truly complete dataset and are running a continuous simulation. Then you could consider a long-running process that keeps the entire "world model" in memory and works on that model for every run (world tick). That way you at least don't have to load the data on every tick. But such a process is usually written in something other than PHP.

0
