How to overcome server load problems while running multiple cron jobs at the same time?

I have a website that displays data from a game server. The game has different "domains" (which are actually only separate servers) on which users play.

I currently have 14 cron jobs running at staggered times, each repeating every 6 hours. All 14 scripts they run are almost identical, and each one takes about 75 minutes (an hour and 15 minutes) to complete.

I thought about using a single cron script that loops through each server, but that would simply make one script run for 18 hours or so. My current VPS plan allows only 1 vCPU, so I'm trying to get all the work done while staying within the server's load limits.

Since the site needs updated data every 6 hours, this is not feasible.

I started looking into message queues and the idea of passing the work to a background process that would do the job. I tried resque and php-resque, but my background worker died as soon as it was started. So I moved on to ZeroMQ, which seems closer to what I need.

I installed ZMQ through Composer, and the installation went fine. In my dispatcher script (which will be called by cron every 6 hours) I have:

    $dataContext  = new ZMQContext();
    $dataDispatch = new ZMQSocket($dataContext, ZMQ::SOCKET_PUSH);
    $dataDispatch->bind("tcp://*:50557");
    $dataDispatch->send(0);

    foreach ($filesToUse as $filePath) {
        $dataDispatch->send($filePath);
        sleep(1);
    }

    $filesToUse = array();
    $blockDirs  = array_filter(glob('mapBlocks/*'), 'is_dir');
    foreach ($blockDirs as $k => $blockDir) {
        $files        = glob($rootPath.$blockDir.'/*.json');
        $key          = array_rand($files);
        $filesToUse[] = $files[$key];
    }

    $mapContext  = new ZMQContext();
    $mapDispatch = new ZMQSocket($mapContext, ZMQ::SOCKET_PUSH);
    $mapDispatch->bind("tcp://*:50558");
    $mapDispatch->send(0);

    foreach ($filesToUse as $blockPath) {
        $mapDispatch->send($blockPath);
        sleep(1);
    }

$filesToUse is an array of user-submitted files containing information that will be used when querying a game server. As you can see, I loop over the array and send each file to a ZeroMQ listener script, which contains:

    $startTime = time();

    $context  = new ZMQContext();
    $receiver = new ZMQSocket($context, ZMQ::SOCKET_PULL);
    $receiver->connect("tcp://*:50557");

    $sender = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
    $sender->connect("tcp://*:50559");

    while (true) {
        $file = $receiver->recv();

        // ------------------------------------------------------------
        // all work happens here:
        // the ~75 min DATA PROCESSING SECTION for each .recv()-ed
        // WORK-UNIT
        // ------------------------------------------------------------

        $endTime   = time();
        $totalTime = $endTime - $startTime;
        $sender->send('Processing of domain '.listener::$domain.' completed on '.date('Mj-y', $endTime).' in '.$totalTime.' seconds.');
    }

Then in the final listener file:

    $context  = new ZMQContext();
    $receiver = new ZMQSocket($context, ZMQ::SOCKET_PULL);
    $receiver->bind("tcp://*:50559");

    while (true) {
        $log = fopen($rootPath.'logs/sink_'.date('F-jS-Y_h-i-A').'.txt', 'a');
        fwrite($log, $receiver->recv());
        fclose($log);
    }

When the dispatcher script is started by cron, I never get the confirmation text in my log file.

Q1) Is this the most efficient way to do what I'm trying to do?
Q2) Am I using and implementing ZeroMQ correctly here?

And, as it would seem, using cron to launch 14 scripts at the same time makes the load far exceed my allocation. I know I could simply schedule the jobs at different times during the day, but if at all possible I would like to keep all the updates on the same schedule.


UPDATE:

Since then, I have upgraded my VPS to 2 processor cores, so the server-load aspect of the question is no longer that important.

The code above has also been updated to reflect my current setup.

After updating the code, I now receive an email from cron with the error:

Fatal error: Uncaught exception 'ZMQSocketException' with message 'Failed to bind the ZMQ: Address already in use'

+5
2 answers

Running your scripts through cron or through ZeroMQ makes absolutely no difference to how much CPU you need. The only difference between the two is that a cron task runs your script at fixed intervals, while a message queue runs your script in response to some event, such as a user action.

At the end of the day, you need more available compute threads to run your scripts. But before you go that route, take a look at the scripts themselves: maybe there is a more efficient way to write them so that they do not consume so many resources. And have you looked at actual CPU utilization? Most web hosts provide built-in metrics you can check from the console; you may not be using as many resources as you think.

The fact that one script looping through all the servers takes far longer than the 14 scripts run in parallel suggests that your scripts are single-threaded. One instance of your script does not use all the available resources, so you only get a speed increase by running multiple instances side by side.
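One way to exploit that observation, sketched here as a dry run with a hypothetical update_domain.php and hypothetical domain names: launch the single-threaded workers in parallel from one cron entry, capped at the number of available cores, instead of 14 separate crontab lines.

```shell
# dry-run sketch: xargs -P runs at most 2 workers concurrently
# (matching 2 vCPUs), one per game domain; remove the leading
# 'echo' to actually invoke the (hypothetical) worker script
printf '%s\n' alpha beta gamma delta |
  xargs -P 2 -I{} echo php update_domain.php {}
```

With the `echo` removed, the pipeline keeps exactly two workers busy until the domain list is exhausted, so the box is never loaded with all 14 jobs at once.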

+9

No, this approach does not seem to be the most efficient use of ZeroMQ's powers.

The good news is that you can refactor the solution to bring it closer to best practice.

MOTIVATION

ZeroMQ is a very powerful and very smart tool for designing, building and operating distributed, event-driven systems. Many resources have been published on best practices for ZeroMQ system design.

Being lightweight does not make it a silver bullet or a zero-overhead perpetuum mobile.

ZeroMQ still consumes additional resources from the target ecosystem, and on hosts with minimal resources (a hypervisor-imposed 1-vCPU restriction on a VPS, as in your case, being one vivid example) you may find that the concurrency benefits do not cover the cost of the additional (1+) I/O threads that ZeroMQ spawns for each Context() instance.

Exception handling, or rather exception prevention and deadlock prevention, is the alpha and omega of a production-grade, continuously running distributed processing system. Your experience so far has been bitter, but you will learn a lot about sound software practice from working with ZeroMQ. One such lesson is resource management and graceful termination: each process is responsible for releasing all the resources it has allocated, so a port occupied by a .bind() must be systematically unbound and released in a clean way.

(Plus, you will soon realize that releasing a port is not instantaneous, due to operating-system overheads outside your code's control, so do not rely on a just-released port immediately being ready for re-use on the next run; you can find many posts here about ports blocked in exactly this way.)
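As a sketch of that lesson (assuming the pecl-zmq binding; the endpoint is illustrative, not taken from the question), a script that binds a port can register a shutdown handler so the port is released even when the script dies mid-run, which is exactly the situation that later produces "Failed to bind the ZMQ: Address already in use" on the next cron start:

```php
<?php
// illustrative sketch: explicitly release a bound port on shutdown,
// so the next cron invocation can bind it again
$endpoint = "tcp://127.0.0.1:50557";          // hypothetical endpoint
$context  = new ZMQContext();
$socket   = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
$socket->setSockOpt(ZMQ::SOCKOPT_LINGER, 0);  // never block on close
$socket->bind($endpoint);

register_shutdown_function(function () use ($socket, $endpoint) {
    try {
        $socket->unbind($endpoint);           // free the port explicitly
    } catch (ZMQSocketException $e) {
        // socket already closed / never bound: nothing left to release
    }
});
```

Setting ZMQ::SOCKOPT_LINGER to 0 also prevents the process from hanging on exit while undelivered messages sit in the socket's queue.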


Resource Envelope Facts [FIRST]:

While quantitative data on your processing performance and resource-use envelopes is not yet available, the images below help illustrate why such knowledge matters.

vCPU workload envelope after markets opened for the next 24/5 week on Sunday 22:00 GMT+0000:

[image: vCPU workload envelope]

The same vCPU workload and other resource-use envelopes, with about 55% more CPU power available:

[image: CPU and resource-use envelopes]


A cron-queue and relative-priority hack [NEXT]:

Without detailed knowledge of whether the 75-minute work unit is CPU-bound or I/O-bound, a system-level configuration change can reduce the relative priority of the cron jobs, so that system performance stays focused on your peak-hour primary tasks. It is possible to create a separate queue with an adapted nice priority. A good trick for this was introduced by @Pederabo:

cron normally runs its jobs at nice 2, but this is controlled by the queuedefs file. Queues a, b and c are for at, batch and cron respectively.
- you should be able to add a line for a new queue, say Z, which defines a special queue with its nice value set to zero;
- you should be able to run the script from the console with at -q Z ...;
- if that works well, put the at command in the crontab.
The at command itself runs at cron's default priority, but it only takes a few seconds; the job it submits then runs with whatever you configured in the queuedefs file for queue Z.
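Where queuedefs is unavailable or its syntax differs (it varies by cron implementation), a simpler variant of the same idea is to lower the updater's priority directly on the crontab line with nice; the updater path below is hypothetical:

```shell
# crontab sketch (hypothetical script path):
#   0 */6 * * *  nice -n 10 php /var/www/updater.php
#
# sanity check: a child started under 'nice -n 10' reports its
# raised niceness (current niceness + 10)
nice -n 10 nice
```

A niceness of 10 lets the kernel favor interactive peak-hour traffic whenever the updater and the web server compete for the single vCPU.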


Avoid unnecessary overhead [ALWAYS]:

There is always a reason not to waste CPU clocks, especially on minimalist systems. Using the tcp:// transport class on the same localhost may be acceptable PoC practice during the prototyping phase, but never for a 24/7 production phase. Try to avoid every service you never actually use: why climb up to L3 and consume even more operating-system resources (ZeroMQ is not zero-copy at this boundary, so double allocations appear here) when you only deliver within the same localhost? The ipc:// and inproc:// transport classes are much better suited to this modus operandi (see also below for going truly distributed).
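A sketch of that change, with hypothetical endpoint names (pecl-zmq binding assumed): the transport class is just part of the endpoint string, so local and distributed deployments differ by a single line.

```php
<?php
// same code path, different transport class (endpoints are illustrative)
$runDistributed = false;
$endpoint = $runDistributed
    ? "tcp://0.0.0.0:50557"            // cross-host: full TCP/IP stack
    : "ipc:///tmp/workUnits.pipe";     // same host: no L3 stack overhead

$context  = new ZMQContext();
$dispatch = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
$dispatch->bind($endpoint);
```

Note that ipc:// endpoints are filesystem paths, so the usual file-permission rules decide which local processes may connect.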


The main problem (processing design using ZeroMQ tools)

Based on this high-level description of your intent, there seems to be a way to avoid the cron mechanism altogether and let the whole pipeline of work distribution and collection become a continuously running, distributed ZeroMQ processing system, alongside which you can build a standalone CLI (a keyboard terminal for ad-hoc interaction with the continuously running system) to:

  • remove the dependency on operating-system functions and their limitations;
  • reduce the overall overhead of repeatedly spawning processes at the system level;
  • share one central Context() instance (thus paying the minimal cost of just one additional I/O thread), since the processing does not appear to be latency-sensitive messaging.
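The last bullet can be sketched as follows (pecl-zmq binding assumed; endpoints are illustrative). Unlike the question's dispatcher, which creates a separate Context() per socket, both PUSH sockets here share one Context and therefore one extra I/O thread:

```php
<?php
// one Context (one additional I/O thread) serving both dispatch channels
$context      = new ZMQContext(1);   // 1 I/O thread suffices at this rate
$dataDispatch = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
$mapDispatch  = new ZMQSocket($context, ZMQ::SOCKET_PUSH);

$dataDispatch->bind("ipc:///tmp/data.pipe");
$mapDispatch->bind("ipc:///tmp/map.pipe");
```

On a 1-2 vCPU VPS, every I/O thread saved is a thread that can be spent on the actual 75-minute processing work.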

Your ZeroMQ ecosystem can also give you a scalable, or even adaptively scaling, design, because distributed processing need not be confined to your VPS localhost, should your VPS hyper-thread restrictions prevent the processing from keeping up with your 24/7 stream of work units.

Merely changing the respective transport class from ipc:// to tcp:// lets you distribute the tasks (work units) literally around the globe to any processing node you can "plug in" to increase your computing power, all without a single SLOC of source-code change.

[image: work units distributed across processing nodes]

It is worth reviewing your design strategy once more, isn't it?

+4
