I wrote a Perl script that compiles displays so that they can be viewed by users. There are thousands of these files (DSET files) to compile, and the process takes a long time (4-5 hours). The displays are compiled by an external executable, and I have no information about how that executable works internally.
To speed up the process, we decided to run several instances of this executable in parallel, hoping to dramatically improve performance.
With 16 threads, performance improves significantly and the run now takes about 1 hour instead of 4-5, but a problem remains: as the script progresses, the execution time of the external executable increases.
I checked about 1000 DSET files and tracked the runtime of the external compiler as the Perl script progressed. Below is a graph of runtime versus time.

As you can see, at the start of the run it takes the Perl script about 4 seconds to launch the executable, compile a DSET, and close the executable. Once the script has processed about 500 DSETs, the time taken to compile each subsequent DSET starts to increase. By the time the script is nearing the end, some DSET files take as long as 12 seconds to compile!
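For reference, the per-DSET timing is just a wall-clock measurement taken around the external call, along these lines (the executable name and arguments below are placeholders, not the real command line):

    use Time::HiRes qw(gettimeofday tv_interval);

    my $dset_file = 'example.dset';            # placeholder input file
    my $t0 = [gettimeofday];
    system('compile_display.exe', $dset_file); # placeholder command line
    my $elapsed = tv_interval($t0);            # seconds, sub-second resolution
    print "$dset_file: $elapsed s\n";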
The following is an example of the function that each thread executes:
    # Build the displays
    sub fgbuilder {
        my ($tmp_ddldir, $outdir, $eset_files, $image_files) = @_;
Each time through the loop, it spawns a new instance of the display-building executable, waits for it to complete, and then closes that instance (which should free any memory it used and avoid problems like the one I am seeing).
There are 16 of these threads running in parallel. Each one pulls a new DSET from the queue, compiles it, and moves the compiled display to the output directory; it then pulls another DSET from the queue and repeats the process until the queue is exhausted.
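Roughly, each worker behaves like the sketch below. This is a minimal sketch, not my actual fgbuilder sub: the executable name, the .dsp output extension, and the directory names are placeholders.

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use File::Copy qw(move);

    my @dset_files = glob('*.dset');       # placeholder input list
    my $outdir     = 'compiled_displays';  # placeholder output directory

    sub worker {
        my ($queue, $out) = @_;
        # Pull DSETs until the queue is empty; each iteration spawns a fresh
        # instance of the external compiler and waits for it to exit.
        while (defined(my $dset = $queue->dequeue_nb())) {
            system('compile_display.exe', $dset);      # spawn, wait, exit
            (my $display = $dset) =~ s/\.dset$/.dsp/;  # guessed output name
            move($display, $out) or warn "move failed: $!";
        }
    }

    my $queue   = Thread::Queue->new(@dset_files);
    my @workers = map { threads->create(\&worker, $queue, $outdir) } 1 .. 16;
    $_->join for @workers;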
I have been scratching my head all day trying to figure out why it slows down. During the run, RAM usage is stable and not increasing, and CPU usage is nowhere near maxed out. Any help or insight into what is happening here is appreciated.
EDIT
I wrote a test script to try to verify the theory that the problem is caused by disk I/O caching. In this script, I kept the same main body of the old script and replaced the call to the executable with my own task.
Here is what I replaced the executable call with:
    # Convert the file to hex (multiple times so it takes longer! :D)
    my @hex_lines = ();
    open my $ascii_fh, '<', $tmp_dset_file;
    while (my $line = <$ascii_fh>) {
        my $hex_line = unpack 'H*', $line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        $hex_line = unpack 'H*', $hex_line;
        push @hex_lines, $hex_line;
    }
    close $ascii_fh;
Instead of calling the executable and compiling a DSET, I open each DSET as a text file, do some simple processing, and write several files to disk (several each time, because the real executable writes more than one file to disk for each DSET it processes). I then tracked the processing time and plotted the results.
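The dummy output step is just a few writes per DSET, something like the sketch below. It reuses @hex_lines from the snippet above; the output directory, base name, and number of files are placeholders.

    # Write several dummy output files per DSET to mimic the real compiler,
    # which produces more than one output file per DSET.
    my $dset_name = 'example_dset';            # placeholder base name
    for my $i (1 .. 3) {
        my $path = "$outdir/${dset_name}_$i.hex";
        open my $out_fh, '>', $path or die "Cannot write $path: $!";
        print {$out_fh} "$_\n" for @hex_lines;
        close $out_fh;
    }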
Here are my results:

I do believe that part of the problem with my other script is disk I/O, but as you can see here, with the disk I/O load that I deliberately created, the increase in processing time is not gradual. It jumps sharply, and after that the results become quite unpredictable.
In my previous script you can see some of that unpredictability, and it also writes a lot of files, so I have no doubt the problem is caused at least in part by disk I/O. But that still does not explain why the increase in processing time is gradual and apparently happens at a constant rate.
I believe there is some other factor at play that we are not taking into account.