CakePHP: recommended way to iterate over a huge table and create a sitemap?

I am trying to create an XML sitemap with CakePHP from a table that currently has more than 50,000 entries, where each entry corresponds to one URI in the sitemap. The problem is that CakePHP runs out of memory while generating it, for two reasons:

  • find('all') builds one huge associative array for the entire set of 50,000 URIs.
  • Since I do not want to output markup from the controller itself, I pass the associative array containing the URIs, priorities, change frequencies, etc. to the view with $this->set(), and that array is again huge, with 50,000 entries.

Is it possible to do this at all while still following MVC and CakePHP conventions?

+6
performance php cakephp sitemap
6 answers

Are you sure you run out of memory on 50,000 records? Even if a row is 1 KB (rather large), that is only about 50 MB of data; my old P1 had enough RAM to handle that. Raise memory_limit in php.ini above its default, and consider raising max_execution_time as well.
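If you prefer not to change php.ini globally, the limits can also be raised at runtime for this one script; a minimal sketch (the values are illustrative):

    // Raise the limits for this request only, instead of editing php.ini.
    ini_set('memory_limit', '256M'); // the default is commonly 128M
    set_time_limit(300);             // allow up to five minutes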

On the other hand, if the dataset really is too large and processing it is too resource-intensive, you should not serve this page dynamically; that is ideal DDoS bait. (At the very least, I would cache it aggressively.) You can instead schedule a cron job to regenerate the page every X hours with a server-side script that is free of the MVC constraint of handing all the data to the view at once; it can work through the rows sequentially, as sketched below.
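A minimal sketch of that cron-driven approach, assuming CakePHP 2.x shell conventions; the Page model, its fields, the chunk size, and the output path are all illustrative:

    <?php
    // app/Console/Command/SitemapShell.php
    class SitemapShell extends AppShell {

        public $uses = array('Page'); // hypothetical model

        public function main() {
            $fh = fopen(WWW_ROOT . 'sitemap.xml', 'w');
            fwrite($fh, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            fwrite($fh, "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");

            // Fetch the table in fixed-size chunks and stream each row
            // straight to disk, so no 50,000-element array is ever built.
            $page = 1;
            do {
                $rows = $this->Page->find('all', array(
                    'recursive' => -1,
                    'fields'    => array('Page.url', 'Page.modified'),
                    'limit'     => 1000,
                    'page'      => $page++,
                ));
                foreach ($rows as $row) {
                    fwrite($fh, '<url><loc>' . h($row['Page']['url']) . '</loc><lastmod>'
                        . date('Y-m-d', strtotime($row['Page']['modified']))
                        . "</lastmod></url>\n");
                }
            } while (count($rows) === 1000);

            fwrite($fh, "</urlset>\n");
            fclose($fh);
        }
    }

Run it from cron (for example 0 */6 * * * cd /path/to/app && Console/cake sitemap) and let the web server serve the resulting static sitemap.xml directly.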

+2

I know this question is old, but for really huge queries I think there is still no good solution.

To iterate through a huge result set, you can use the DboSource methods directly.

First, get the DBO:

    $dbo = $this->Model->getDataSource();

Build the statement (note that buildStatement() takes the model as its second argument):

    $sql = $dbo->buildStatement($options, $this->Model);

Then execute the statement and iterate over the results:

    if ($dbo->execute($sql)) {
        while ($dbo->hasResult() && ($row = $dbo->fetchResult())) {
            // $row is an array with the same structure as a find('first') result
        }
    }
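For context, a hedged sketch of the kind of $options array buildStatement() expects; it is a low-level query description rather than a regular find() options array, and the model, table, and field names here are illustrative:

    $dbo = $this->Page->getDataSource();
    $options = array(
        'table'      => $dbo->fullTableName($this->Page),
        'alias'      => 'Page',
        'fields'     => array('Page.id', 'Page.url'),
        'conditions' => null,
        'joins'      => array(),
        'order'      => 'Page.id ASC',
        'limit'      => null,
        'offset'     => null,
        'group'      => null,
    );
    $sql = $dbo->buildStatement($options, $this->Page);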
+4

I had a similar issue this week and came across the Containable behavior. It lets you cut out any relationship-related queries you do not need (if you have associations at all).

A better solution is to page through the record set programmatically with LIMIT and OFFSET, handling a small chunk at a time. That saves you from holding all 50,000 records in memory at once; see the sketch below.
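A minimal sketch of that pattern, assuming a controller action and a hypothetical Page model; only one 1,000-row chunk is in memory at any time:

    $limit = 1000;
    $offset = 0;
    do {
        $chunk = $this->Page->find('all', array(
            'recursive' => -1,
            'fields'    => array('Page.url'),
            'limit'     => $limit,
            'offset'    => $offset,
        ));
        foreach ($chunk as $row) {
            // emit one <url> element for $row['Page']['url'] here
        }
        $offset += $limit;
    } while (count($chunk) === $limit);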

+3

find('all') is too greedy; you have to be more specific if you do not want to run out of memory.

As stated above, use the Containable behavior. If you want only the results from your own table (no related models) and only a few fields, a more explicit query like this should do better:

    $results = $this->YourModel->find('all', array(
        'contain' => false,
        'fields' => array('YourModel.name', 'YourModel.url'),
    ));

You should also consider adding an HTML caching mechanism (CakePHP has one built in, or you can use the one suggested by Matt Curry).

Of course, the cached version will not exactly match your live list. If you want more control, you can always store the result in the Cake cache (using Cache::write) and use your model's afterSave/afterDelete callbacks to refresh the cached value and recreate the cached XML file there.
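A rough sketch of that idea; the cache key and the rebuild helper are hypothetical, and the $options argument of afterSave() only exists from CakePHP 2.5 on (drop it on older versions):

    // In the model (e.g. app/Model/Page.php): invalidate the cached
    // sitemap whenever a record changes.
    public function afterSave($created, $options = array()) {
        parent::afterSave($created, $options);
        Cache::delete('sitemap_xml');
    }

    public function afterDelete() {
        parent::afterDelete();
        Cache::delete('sitemap_xml');
    }

    // In the controller: serve from the cache, rebuild only on a miss.
    // buildSitemapXml() is a hypothetical helper that assembles the XML.
    $xml = Cache::read('sitemap_xml');
    if ($xml === false) {
        $xml = $this->buildSitemapXml();
        Cache::write('sitemap_xml', $xml);
    }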

+2

Have you tried unbindModel (if you have associations)?
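For example (the association name is hypothetical; by default the unbind only lasts for the next find):

    $this->Page->unbindModel(array('hasMany' => array('Comment')));
    $rows = $this->Page->find('all', array('fields' => array('Page.url')));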

Whenever I have to run huge queries in CakePHP, I just use the plain MySQL functions such as mysql_query, mysql_fetch_assoc, etc. Much faster, and no more running out of memory.
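For completeness, a sketch of that approach (the table and column names are illustrative). Note the mysql_* extension has since been removed in PHP 7; an unbuffered query keeps the result set on the MySQL side instead of copying it all into PHP's memory:

    // Assumes an already-open mysql_* connection.
    $result = mysql_unbuffered_query('SELECT url, modified FROM pages');
    while ($row = mysql_fetch_assoc($result)) {
        printf("<url><loc>%s</loc><lastmod>%s</lastmod></url>\n",
            htmlspecialchars($row['url']),
            date('Y-m-d', strtotime($row['modified'])));
    }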

+1

Use https://github.com/jamiemill/cakephp_find_batch or implement this logic yourself.

+1
