Best way to manage a long-running PHP script?

I have a PHP script that takes a long time to complete (5-30 minutes). In case it matters, the script uses curl to scrape data from another server, which is why it takes so long: it has to wait for each page to load before processing it and moving on to the next one.

I want to be able to kick off the script and leave it alone until it finishes, at which point it sets a flag in a database table.

What I need to know is how to end the HTTP request before the script finishes running. Also, is a PHP script the best way to do this?

+68
php curl apache
Feb 06 '10 at 9:16
16 answers

Certainly it can be done with PHP, but you should NOT do this as a simple background task - the new process has to be detached from the process group in which it is started.

Since people keep giving the same wrong answer to this FAQ, I've written a more complete answer here:

http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html

From the comments:

The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'); but the reasons why are a bit too long to include here.
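
To illustrate the idea, here is a rough, untested sketch of mine (the paths, the longThing.php name, and the jobs table are assumptions, not something taken from the linked post): a launcher page hands the work off to at(1) and returns immediately, and the worker sets a database flag when it is done.

<?php
// launcher.php - the page the browser hits; it returns right away.
$jobId = uniqid('job_', true);

// Record the job before handing it off (hypothetical "jobs" table).
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$db->prepare('INSERT INTO jobs (id, status) VALUES (?, ?)')
   ->execute(array($jobId, 'running'));

// at(1) runs the command outside Apache's process group, so the HTTP
// request can finish while the worker keeps going (requires atd).
shell_exec('echo /usr/bin/php -q /path/to/longThing.php '
           . escapeshellarg($jobId) . ' | at now');

echo 'Job ' . htmlspecialchars($jobId) . ' started.';

And the worker script that at runs:

<?php
// longThing.php - run by at from the CLI, not by Apache.
$jobId = $argv[1];

// ... the slow curl scraping happens here ...

// Set the "done" flag the question asks about.
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
   ->execute(array($jobId));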

+101
Feb 06 '10 at 23:35

A quick and dirty way would be to use the ignore_user_abort function in PHP. It basically says: no matter what the user does, run this script until it is finished. This is somewhat dangerous if it is a public-facing site, because you could end up with 20+ copies of the script running at the same time if it is kicked off 20 times.

A "clean" way (at least IMHO) is to set a flag (in the db, for example) when you want to initiate the process, and then run a cronjob every hour (or so) that checks whether that flag is set. If it is set, the long-running script starts; if it is NOT set, nothing happens.
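
A minimal sketch of that flag-plus-cron pattern (the jobs table, its status values, and the connection details are invented here for illustration):

<?php
// check_jobs.php - run from cron, e.g.:  0 * * * * /usr/bin/php /path/to/check_jobs.php
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Is the flag set?
$job = $db->query("SELECT id FROM jobs WHERE status = 'pending' LIMIT 1")->fetch();
if (!$job) {
    exit; // Not set: do nothing until the next cron run.
}

// Mark it as running so the next cron run does not start a second copy.
$db->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute(array($job['id']));

// ... the long scraping work goes here ...

$db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")->execute(array($job['id']));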

+11
Feb 06 '10 at 9:26 a.m.

You could use exec or system to start a background job, and then do the work in that.

Also, there are better approaches to scraping the web than the one you are using. You could use a threaded approach (multiple threads doing one page at a time), or one using an event loop (one thread doing multiple pages at a time). My personal approach using Perl would be AnyEvent::HTTP.
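
Within PHP itself, the closest equivalent of the event-loop idea is curl_multi, which drives several transfers in a single process. A rough, untested sketch with placeholder URLs:

<?php
// Fetch a batch of pages concurrently instead of one after another.
$urls = array(
    'http://example.com/page1',
    'http://example.com/page2',
    'http://example.com/page3',
);

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers until they have finished.
do {
    curl_multi_exec($mh, $running);
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // some builds return -1 here; avoid a busy loop
    }
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... process $html for $url here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);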

ETA: symcbean explained how to properly detach the background process here.

+8
Feb 06 '10

No, PHP is not the best solution.

I'm not sure about Ruby or Perl, but with Python you could rewrite your page scraper to be multi-threaded, and it would probably run at least 20 times faster. Writing multi-threaded apps can be somewhat of a challenge, but the very first Python app I wrote was a multi-threaded page scraper. And you can simply call the Python script from within your PHP page using one of the shell execution functions.

+5
Feb 06 '10

Yes, you can do it in PHP. But in addition to PHP, it would be wise to use a queue manager. Here's the strategy:

  • Divide your larger task into smaller tasks. In your case, each task can load one page.

  • Queue each small task.

  • Have worker processes pick the small tasks off the queue and run them.

Using this strategy has the following advantages (a bare-bones sketch of such a queue follows after this list):

  • For long-running tasks it can recover if a fatal problem occurs in the middle of the run - there is no need to start again from the very beginning.

  • If your tasks do not have to run sequentially, you can run several workers at the same time.
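
A bare-bones illustration of such a queue, using a plain database table in place of a real queue manager (all table and column names are invented):

<?php
// worker.php - run one or more copies from the CLI; each loops over queued page-fetch tasks.
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

while (true) {
    // Grab one queued task (one page to fetch).
    $task = $db->query("SELECT id, url FROM tasks WHERE status = 'queued' LIMIT 1")->fetch();
    if (!$task) {
        break; // Queue is empty; this worker is done.
    }

    // Claim it atomically so a second worker cannot take the same task.
    $claim = $db->prepare("UPDATE tasks SET status = 'working' WHERE id = ? AND status = 'queued'");
    $claim->execute(array($task['id']));
    if ($claim->rowCount() === 0) {
        continue; // Another worker got there first; try the next task.
    }

    $html = file_get_contents($task['url']); // or curl, as in the original script
    // ... process $html, then record the result ...

    $db->prepare("UPDATE tasks SET status = 'done' WHERE id = ?")->execute(array($task['id']));
}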

You have many options for the queue manager.

+4
May 23 '17 at 5:06 a.m.

PHP may or may not be the best tool, but you know how to use it, and the rest of your application is written using it. These two qualities, combined with the fact that PHP is "good enough", make a pretty convincing argument in favor of using it instead of Perl, Ruby or Python.

If your goal is to learn another language, pick one and use it. Any of the languages you mentioned will do the job without any problems. I like Perl, but your preference may differ.

Symcbean has some good advice on how to manage background processes at the link he posted.

In short, write a CLI PHP script to handle the long-running parts. Make sure it reports its status somehow. Make a PHP page to handle the status updates, either using AJAX or traditional methods. Your kickoff script will start the process running in its own session and return confirmation that the process is running.
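
A rough sketch of that kickoff page (untested; worker.php, the use of setsid, and the JSON confirmation are my own assumptions about one way to wire it up):

<?php
// start.php - kicks off the CLI worker and returns straight away.
$jobId = uniqid('job_', true);

// setsid (util-linux) puts the worker in its own session; output is discarded
// so exec() does not sit and wait for the command to finish.
$cmd = 'setsid nohup /usr/bin/php /path/to/worker.php ' . escapeshellarg($jobId)
     . ' > /dev/null 2>&1 &';
exec($cmd);

// Confirm to the caller that the process is running; it can now poll a
// status page (e.g. status.php?id=...) for updates.
header('Content-Type: application/json');
echo json_encode(array('job' => $jobId, 'status' => 'started'));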

Good luck.

+3
Feb 08 '10 at 6:28

I agree with the answers that say this should be run as a background process. But it's also important that you report status back, so that the user knows the work is in progress.

When you receive the PHP request to start the process, you could store a representation of the task, with a unique identifier, in the database. Then start the screen-scraping process, passing it that unique identifier. Report back to the iPhone app that the task has been started and that it should check a specified URL, containing the new task ID, to get the latest status. The iPhone app can now poll (or even "long poll") that URL. In the meantime, the background process updates the database representation of the task as it works, with a completion percentage, the current step, or whatever other status indicators you like. And when it has finished, it sets a completed flag.
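
A minimal sketch of both halves of that idea (the tasks table, its columns, and status.php are invented names): the background process records its progress, and a small status page returns it as JSON for the app to poll.

<?php
// Inside the background scraper: record progress after each page is handled.
function report_progress(PDO $db, $taskId, $done, $total) {
    $db->prepare('UPDATE tasks SET percent = ?, current_step = ? WHERE id = ?')
       ->execute(array((int) round(100 * $done / $total), "page $done of $total", $taskId));
}

And the URL the app polls:

<?php
// status.php?id=... - returns the latest status as JSON.
$db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $db->prepare('SELECT percent, current_step, completed FROM tasks WHERE id = ?');
$stmt->execute(array($_GET['id']));
header('Content-Type: application/json');
echo json_encode($stmt->fetch(PDO::FETCH_ASSOC));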

+1
Feb 06 '10 at 19:58

You can send it as an XHR request (Ajax). Clients usually do not have a timeout for XHR, unlike regular HTTP requests.

+1
Feb 06 '10 at 23:51

I realize this is quite an old question, but I'd like to give it a shot. This script tries to address both the initial kick-off call (so it finishes quickly) and chopping the heavy load into smaller chunks. I have not tested this solution.

<?php
/**
 * crawler.php located at http://mysite.com/crawler.php
 */

// Make sure this script will keep on running after we close the connection
// with it.
ignore_user_abort(TRUE);

function get_remote_sources_to_crawl() {
  // Do a database or a log file query here.
  $query_result = array(
    1 => 'http://exemple.com',
    2 => 'http://exemple1.com',
    3 => 'http://exemple2.com',
    4 => 'http://exemple3.com',
    // ... and so on.
  );
  // Returns the first one on the list, together with its id.
  foreach ($query_result as $id => $url) {
    return array($id, $url);
  }
  return FALSE;
}

function update_remote_sources_to_crawl($id) {
  // Update my database or log file list so the $id record won't show up
  // on my next call to get_remote_sources_to_crawl().
}

$crawling_source = get_remote_sources_to_crawl();

if ($crawling_source) {
  list($id, $url) = $crawling_source;

  // Run your scraping code on $url here.

  if ($your_scraping_has_finished) {
    // Update your database or log file.
    update_remote_sources_to_crawl($id);

    $ctx = stream_context_create(array(
      'http' => array(
        // I am not quite sure, but I reckon the timeout set here actually
        // starts rolling after the connection to the remote server is made,
        // limiting only how long the download of the remote content may take.
        // Since we are only interested in triggering this script again,
        // 5 seconds should be plenty of time.
        'timeout' => 5,
      )
    ));

    // Open a new connection to this script and close it after 5 seconds in.
    file_get_contents('http://' . $_SERVER['HTTP_HOST'] . '/crawler.php', FALSE, $ctx);

    print 'The cronjob kick off has been initiated.';
  }
}
else {
  print 'Yay! The whole thing is done.';
}
+1
Jun 27 '13 at 1:24

I would like to offer a solution that is slightly different from symcbean's, mainly because I had an additional requirement that the long-running process needed to run as another user, and not as the apache / www-data user.

First solution, using cron to poll a background-task table:

  • The PHP web page inserts a row into the background-task table with status "SUBMITTED"
  • cron runs every 3 minutes as the other user, executing a PHP CLI script that checks the background-task table for "SUBMITTED" rows
  • the PHP CLI script updates the row's status column to "PROCESSING" and begins processing; after completion it is updated to "COMPLETED"

Second solution, using the Linux inotify facility:

  • The PHP web page writes the user-supplied parameters to a control file and also generates a task identifier (rough PHP sketches of this step and of the progress polling follow after this list)
  • a shell script (running as the non-apache user) launched via inotifywait waits for the control file to be written
  • once the control file is written, a close_write event is raised and the shell script continues
  • the shell script then executes the PHP CLI to run the long process
  • the PHP CLI writes its output to a log file named after the task identifier, or alternatively updates progress in a status table
  • the PHP web page can poll the log file (based on the task id) to show the progress of the long process, or it can query the status table
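
On the PHP side, the first and last steps of that flow might look roughly like this (the file locations and the JSON layout of the control file are assumptions on my part):

<?php
// submit.php - writes the control file that the inotifywait script is watching.
$taskId = uniqid('task_', true);
$control = array('task_id' => $taskId, 'params' => $_POST);
file_put_contents('/var/spool/myapp/control.json', json_encode($control));
echo $taskId;

And the polling side:

<?php
// progress.php?task_id=... - shows the worker's log file for that task.
$log = '/var/log/myapp/' . basename($_GET['task_id']) . '.log';
echo is_file($log) ? nl2br(htmlspecialchars(file_get_contents($log))) : 'Not started yet.';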

You can find more details in my post: http://inventorsparadox.blogspot.co.id/2016/01/long-running-process-in-linux-using-php.html

+1
Jan 31 '16 at 12:32

I have done similar things with Perl, using a double fork() and detaching from the parent process. All of the HTTP fetching work should be done in the forked process.

0
Feb 06 '10 at 19:41

Use a proxy to delegate the request.

0
Oct. 29 '10 at 22:17

What I ALWAYS use is one of these variants (because different flavors of Linux have different rules for handling output, and some programs produce output differently):

Option I: @exec('./myscript.php 1>/dev/null 2>/dev/null &');

Option II: @exec('php -f myscript.php 1>/dev/null 2>/dev/null &');

Option III: @exec('nohup myscript.php 1>/dev/null 2>/dev/null &');

You may or may not need "nohup". For example, when I was automating FFMPEG video conversion, the program's output was somehow not 100% handled just by redirecting output streams 1 and 2, so I used nohup AND redirected the output.

0
Sep 07 '11

If you have a long script, then divide the work into pages using an input parameter for each task (each page then acts like a thread). For example, if the page loops over 1 lac (100,000) product_keywords in one long process, then instead of the loop write the logic for a single keyword and pass that keyword in from cornjobpage.php (in the following example).

And for the background worker, I think you should try this technique. It lets you call as many pages as you like: all pages will run at once, independently, without waiting for each page's response, i.e. asynchronously.

cornjobpage.php // mainpage

<?php

post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
// Call as many pages as you like; all pages will run at once, independently,
// without waiting for each page's response (asynchronous).

/*
 * Executes a PHP page asynchronously so the current page does not have to
 * wait for it to finish running.
 */
function post_async($url, $params)
{
    $post_string = $params;

    $parts = parse_url($url);

    $fp = fsockopen($parts['host'],
        isset($parts['port']) ? $parts['port'] : 80,
        $errno, $errstr, 30);

    $out  = "GET " . $parts['path'] . "?$post_string" . " HTTP/1.1\r\n"; // you can use POST instead of GET if you like
    $out .= "Host: " . $parts['host'] . "\r\n";
    $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $out .= "Content-Length: " . strlen($post_string) . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    fclose($fp);
}
?>

testpage.php

<?php
echo $_REQUEST["Keywordname"]; // case 1 output: testValue
?>

PS: if you want to send URL parameters in a loop, follow this answer: https://stackoverflow.com/a/416829/

0
Dec 19 '16 at 15:32

Not the best approach, as many have stated here, but this may help:

ignore_user_abort(1); // run script in background even if user closes browser
set_time_limit(1800); // run it for 30 minutes

// Long running script here
0
Jan 29 '19 at 19:27

I cannot comment, but I noticed that Francisco's solution will always return only the first URL, since the return inside the foreach exits the function immediately.

0
Apr 13 '19 at 17:23


