How can I enforce a maximum number of forked children?

Question

EDIT: I tagged this C in the hope of getting more responses. This is more a question of theory than of a specific language implementation. So if you are a C coder, please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.

I am trying to speed up a CLI script by running tasks in parallel rather than sequentially. The tasks are completely independent of each other, so it does not matter in which order they start or finish.

Here's the original script (note that all of these examples have been stripped down for clarity):

    <?php
    $items = range(0, 100);

    function do_stuff_with($item) {
        echo "$item\n";
    }

    foreach ($items as $item) {
        do_stuff_with($item);
    }

I managed to process $items in parallel with pcntl_fork(), as shown below:

    <?php
    ini_set('max_execution_time', 0);
    ini_set('max_input_time', 0);
    set_time_limit(0);

    $items = range(0, 100);

    function do_stuff_with($item) {
        echo "$item\n";
    }

    $pids = array();
    foreach ($items as $item) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("couldn't fork()");
        } elseif ($pid > 0) {
            // parent
            $pids[] = $pid;
        } else {
            // child
            do_stuff_with($item);
            exit(0);
        }
    }

    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }

Now I want to extend this so that there is a maximum of, say, 10 children active at any one time. What is the best way to handle this? I tried a few things but didn't have much luck.

+3

c php fork process-management

skix Dec 03 '08 at 4:54

4 answers

The best thing I can think of is to add all the tasks to a queue, start the maximum number of threads you want, and then have each thread request a task from the queue, execute it, and request the next one. Don't forget to have the threads terminate when there are no more tasks.
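A minimal sketch of this queue-and-workers idea in C, under a few assumptions. It uses forked worker processes instead of threads, to stay close to the question's fork() setup, and a pipe serves as the queue. The names run_worker_pool() and do_stuff_with() are illustrative only. It also assumes that each int-sized read from the pipe returns one whole item, which holds in practice because every write is int-sized and far below PIPE_BUF.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the question's do_stuff_with(). */
static void do_stuff_with(int item)
{
    printf("%d\n", item);
}

/*
 * Starts n_workers processes that pull item numbers 0..n_items-1 from a
 * shared pipe. Returns the number of workers that finished cleanly,
 * or -1 if the pool could not be started at all.
 */
static int run_worker_pool(int n_items, int n_workers)
{
    int queue[2], status, clean = 0;

    assert(n_workers > 0);
    if (pipe(queue) < 0)
        return -1;
    fflush(stdout);                 /* keep buffered output out of the children */

    for (int w = 0; w < n_workers; w++) {
        pid_t pid = fork();
        if (pid < 0) {              /* carry on with the workers we already have */
            n_workers = w;
            break;
        }
        if (pid == 0) {             /* worker: request tasks until the queue is closed */
            int item;
            ssize_t got;
            close(queue[1]);        /* otherwise this worker would never see EOF */
            while ((got = read(queue[0], &item, sizeof item)) == (ssize_t)sizeof item)
                do_stuff_with(item);
            fflush(stdout);
            _exit(got == 0 ? 0 : 1);
        }
    }

    close(queue[0]);
    if (n_workers == 0) {
        close(queue[1]);
        return -1;
    }

    /* Parent: enqueue every task, then close the pipe so idle workers stop. */
    for (int item = 0; item < n_items; item++)
        if (write(queue[1], &item, sizeof item) != (ssize_t)sizeof item)
            break;
    close(queue[1]);

    for (int w = 0; w < n_workers; w++)
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    return clean;
}
```

Calling run_worker_pool(101, 10) would roughly correspond to the question's 101 items with 10 workers. Because only n_workers processes are ever forked, the cost of fork() is paid once per worker rather than once per item.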

+2

tomjen Dec 03 '08 at 5:51

Forking is an expensive operation. From the looks of it, what you really want is multithreading, not multiprocessing. The difference is that threads are much lighter than processes, because threads share a virtual address space, whereas processes have separate virtual address spaces.

I am not a PHP developer, but a quick Google search suggests that PHP does not support multithreading natively, although there are libraries that do the job.

In any case, once you figure out how to create threads, you should work out how many threads to spawn. To do that, you need to know what the bottleneck of your application is. Is it CPU, memory, or I/O? You indicated in your comments that you are network-bound, and networking is a form of I/O.

If you were CPU-bound, you would only get as much parallelism as you have CPU cores; with any more threads than that, you just waste time on context switches. Assuming you can figure out how many threads to spawn in total, you should divide your work into that many units and have each thread process one unit independently.
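As a rough illustration of the CPU-bound case, the sketch below picks one worker per online core and splits the items into contiguous units of nearly equal size. The helper names cpu_bound_workers() and chunk_bounds() are made up for this example, and _SC_NPROCESSORS_ONLN is a widely available extension (Linux, macOS, the BSDs) rather than something every POSIX system must provide.

```c
#include <assert.h>
#include <unistd.h>

/* Suggested worker count for CPU-bound work: one per online core, at least 1. */
static long cpu_bound_workers(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return cores > 0 ? cores : 1;
}

/*
 * Computes the half-open range [*start, *end) of item indices that the given
 * worker should handle when n_items are split across n_workers. The first
 * (n_items % n_workers) workers each receive one extra item.
 */
static void chunk_bounds(int n_items, int n_workers, int worker,
                         int *start, int *end)
{
    assert(n_workers > 0 && worker >= 0 && worker < n_workers);
    int base = n_items / n_workers;
    int extra = n_items % n_workers;
    *start = worker * base + (worker < extra ? worker : extra);
    *end = *start + base + (worker < extra ? 1 : 0);
}
```

With the question's 101 items and 4 workers, worker 0 would get indices 0 to 25 and worker 3 would get 76 to 100.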

If you were memory-bound, multithreading would not help.

Since you are I/O-bound, figuring out how many threads to spawn is a bit trickier. If all work items take about the same time to process, with very low variance, you can estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.

One option is to use a thread pool: you create a whole bunch of threads up front, and then for each item to process, you check whether there is a free thread in the pool. If there is, you have that thread do the work and move on to the next item. Otherwise, you wait until a thread becomes available. Choosing the size of the thread pool is important. Too large, and you waste time on unnecessary context switches. Too small, and you wait for threads too often.

Another option is to abandon multithreading/multiprocessing and do asynchronous I/O instead. Since you mentioned that you are working on a single-core processor, this is likely to be the fastest option. You can use functions like socket_select() to check whether a socket has data. If it does, you read the data; otherwise, you move on to another socket. This requires much more bookkeeping, but you avoid waiting for data to arrive on one socket while data is already available on another.
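The same pattern can be sketched in C with select(), the system call that PHP's socket_select() is modeled on. This is only an illustration under stated assumptions: drain_all() is a hypothetical helper, it merely counts bytes where real code would process them, and every descriptor must be below FD_SETSIZE.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reads from all nfds descriptors until each one reaches EOF, always
 * servicing whichever descriptors currently have data. Finished descriptors
 * are closed and marked as -1 in fds. Returns the total number of bytes
 * read, or -1 on error.
 */
static long drain_all(int *fds, int nfds)
{
    char buf[4096];
    long total = 0;
    int open_count = nfds;

    while (open_count > 0) {
        fd_set readable;
        int maxfd = -1;

        /* Bookkeeping: rebuild the set of descriptors still being watched. */
        FD_ZERO(&readable);
        for (int i = 0; i < nfds; i++) {
            if (fds[i] < 0)
                continue;                   /* already finished */
            assert(fds[i] < FD_SETSIZE);
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* Block until at least one descriptor has data or has hit EOF. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        for (int i = 0; i < nfds; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &readable))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {                   /* EOF: stop watching this one */
                close(fds[i]);
                fds[i] = -1;
                open_count--;
            } else {
                total += n;                 /* real code would process buf here */
            }
        }
    }
    return total;
}
```

A single process handles every connection here, so there is nothing to fork and no child count to limit.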

If you want to avoid threads and asynchronous I/O and stick with multiprocessing, that can still be worthwhile if the per-item processing is expensive enough. You could then divide the work like this:

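    $my_process_index = 0;
    $is_child = false;
    $pids = array();

    // Fork off $max_procs - 1 children; counting the parent, $max_procs processes share the work
    for ($i = 0; $i < $max_procs - 1; $i++) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("couldn't fork()");
        } elseif ($pid > 0) {
            // parent
            $my_process_index++;
            $pids[] = $pid;
        } else {
            // child
            $is_child = true;
            break;
        }
    }

    // $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
    // (child k holds index k; the parent ends up with $max_procs - 1)

    // Each process will now process 1/$max_procs of the items
    for ($i = $my_process_index; $i < count($items); $i += $max_procs) {
        do_stuff_with($items[$i]);
    }

    // Children are finished at this point; only the parent carries on and reaps them
    if ($is_child) {
        exit(0);
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }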

+1

Adam rosenfield Dec 03 '08 at 5:57

man 2 setrlimit

This limit applies per user, which may be what you want anyway.
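A small sketch of what that could look like in C, assuming a system that provides RLIMIT_NPROC (Linux and the BSDs do; it is not part of base POSIX). The helper name set_process_limit() is hypothetical. Once the limit is reached, fork() fails with EAGAIN, so the caller still has to handle that case and retry later.

```c
#include <sys/resource.h>

/*
 * Sets the soft RLIMIT_NPROC for the calling process (inherited by its
 * children) to max_procs, clamped to the hard limit.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int set_process_limit(rlim_t max_procs)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NPROC, &lim) != 0)
        return -1;
    if (max_procs > lim.rlim_max)
        max_procs = lim.rlim_max;   /* the soft limit may not exceed the hard limit */
    lim.rlim_cur = max_procs;
    return setrlimit(RLIMIT_NPROC, &lim);
}
```

Note that the limit counts every process owned by the user, not just the children of this script, so a value of 10 is usually too low when the same account runs anything else. As far as I know, Linux does not enforce it for privileged processes either.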

0

Dustin Dec 03 '08 at 5:36

qrdl · Accepted Answer · 2008-12-03T08:09:00+0000

There is no syscall to get a list of child pids, but ps can do this for you.

The --ppid switch lists all children of a given parent, so you just need to count the number of lines output by ps.

Alternatively, you can maintain your own counter, which you increment on fork() and decrement on SIGCHLD, assuming ppid stays unchanged for the forked processes.
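A minimal sketch of the counter approach in C. One deliberate simplification: instead of decrementing in a SIGCHLD handler, it blocks in waitpid() whenever the counter reaches the limit, which avoids signal-handling subtleties while enforcing the same cap. The names run_limited() and do_stuff_with() are illustrative, and error handling is kept to a bare minimum.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the question's do_stuff_with(). */
static void do_stuff_with(int item)
{
    printf("%d\n", item);
}

/*
 * Runs do_stuff_with() for items 0..n_items-1, one child per item, with at
 * most max_children alive at once. Returns the number of children that
 * exited with status 0, or -1 if fork() or waitpid() failed. If peak is not
 * NULL, it receives the highest number of simultaneously running children.
 */
static int run_limited(int n_items, int max_children, int *peak)
{
    int running = 0, succeeded = 0, highest = 0, status;

    assert(max_children > 0);
    for (int item = 0; item < n_items; item++) {
        /* At the limit: block until any child exits, then reuse its slot. */
        if (running == max_children) {
            if (waitpid(-1, &status, 0) < 0)
                return -1;
            running--;
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                succeeded++;
        }

        fflush(stdout);             /* keep buffered output out of the child */
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {             /* child: do one item and exit */
            do_stuff_with(item);
            fflush(stdout);
            _exit(0);
        }
        running++;                  /* parent: one more child in flight */
        if (running > highest)
            highest = running;
    }

    /* Reap the children that are still running. */
    while (running > 0) {
        if (waitpid(-1, &status, 0) < 0)
            return -1;
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    if (peak != NULL)
        *peak = highest;
    return succeeded;
}
```

For the question's case this would be called as run_limited(101, 10, NULL). The same loop translates almost line for line to PHP with pcntl_fork() and pcntl_waitpid(-1, $status).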
