I have a program that does a limited form of multithreading. It is written in Delphi and uses libmysql.dll (C API) to access the MySQL server. The program should process a long list of records, taking ~ 0.1 s per record. Think of it as a big loop. All access to the database is carried out by worker threads, which either pre-select the following records or record the results, so the main thread should not wait.
At the beginning of this cycle, we first wait for the prefetch stream, get the results, and then the prefile stream executes the query for the next record. The idea is that the prefetch thread sends the request immediately and waits for the results while the main thread completes the loop.
This often works. But note that nothing guarantees that the prefetch thread starts immediately. I found that often the request was not sent until the main thread was looped and waited for the prefetch.
I sorted the fix by calling sleep (0) immediately after starting the prefetch stream. Thus, the main thread hands over the remainder of its temporary fragment, hoping that the prefetch stream will now be started by sending a request. Then this thread will sleep while waiting, which will allow the main thread to start again.
Of course, the OS has more threads, but it really works to some extent.
I really want the main thread to send the request, and then the worker thread is waiting for the results. Using libmysql.dll I call
result := mysql_query(p.SqlCon,pChar(p.query));
in the workflow. Instead, I would like the main thread to call something like
mysql_threadedquery(p.SqlCon,pChar(p.query),thread);
which will cancel the task as soon as the data goes out.
Does anyone know something like this?
This is really a scheduling issue, so I can try to use a prefetch thread with a higher priority and then lower its priority after sending the request. But then again, I don't have a mysql call that separates sending the request from receiving the results.
Maybe there, and I just don't know about it. Enlighten me, please.
Question added:
Does anyone think this problem will be solved by starting the prefetch thread with a higher priority than the main thread? The idea is that prefetching immediately crowds out the main thread and sends a request. Then he will sleep, waiting for a server response. Meanwhile, the main thread will work.
Added: Details of the current implementation
This program performs calculations on the data contained in the MySQL database. There are 33M items that are added every second. The program runs continuously, processes new elements and sometimes re-analyzes old elements. He gets a list of elements for analysis from the table, so at the beginning of the passage (current element) he knows the next identifier of the element that he will need.
Since each element is independent, this is an ideal goal for multiprocessing. The easiest way to do this is to run multiple instances of the program on multiple machines. The program is optimized by profiling, rewriting and redesigning algorithms. However, one instance uses 100% of the processor core when it is not hungry. I run 4-8 copies on two quad workstations. But in this scenario, they should spend time on the MySQL server. (Server / DB schema optimization is another topic.)
I implemented multithreading in this process to avoid blocking SQL calls. That's why I called this "limited multithreading." A workflow has one task: send a command and wait for the results. (OK, two tasks.)
It turns out there are 6 lock tasks associated with 6 tables. Two of these read data and the remaining 4 entries. They are similar enough to be determined using the general structure of the problem. A pointer to this task is passed to the thread manager, which assigns the thread to do the work. The main thread can check the status of the task through the Task structure.
This makes the mainstream code very simple. When he needs to execute Task1, he expects that Task1 is not busy, puts the SQL command in Task1 and passes it. When Task1 is no longer busy, it contains the results (if any).
The 4 tasks that record the results are trivial. The main thread has task record entries when it moves to the next item. When this is done with this element, make sure that the previous record ended before starting another.
2 reading streams are less trivial. Nothing would have happened by passing the read to the stream, and then waiting for the results. Instead, these tasks precede the data for the next item. Therefore, the main thread arriving at these locking tasks checks whether prefetching is being performed; Waits, if necessary, for the prefetch to complete, then takes data from the Task. Finally, he reissues the task using the NEXT Item ID.
The idea is for the prefetch task to immediately issue a query and wait for the MySQL server. Then the main thread can process the current element, and by the time it is launched in the next element, the data that it needs is in the prefetch task.
So, threads, thread pool, synchronization, data structures, etc. done. And it all works. I was left with a schedule problem.
The problem with the schedule is this: any speed boost processes the current item while the server retrieves the next item. Before processing the current item, we issue a prefetch task, but how can we guarantee that it will start? The OS planner does not know that it is important that the prefetch task immediately issues a request, and then it will do nothing but wait.
The OS scheduler tries to be “fair” and allows each task to run an assigned time slice. In the worst case: the main thread gets its slice and produces a preliminary selection, then finishes the current element and must wait for the next element. Waiting releases the rest of the time, so the scheduler starts the prefetch stream, which issues a request and then waits. Now both threads are waiting. When the server signals that the request has been completed, the prefetch stream restarts and requests that the results (data set) will then sleep. When the server provides the results, the profile thread wakes up, marks the completion of the task, and ends. Finally, the main thread reboots and takes data from the finished task.
To avoid this worst-case planning, I need to somehow ensure that the prefetch request is released before the main thread continues with the current item. So far I have been thinking about three ways to do this:
Immediately after setting the prefetch task, the main thread calls Sleep (0). This should leave the rest of the time. Then I hope that the scheduler launches a prefetch thread that issues a request and then waits. Then the scheduler should restart the main thread (hopefully). No matter how bad it sounds, it works better than nothing.
I could release a prefetch thread with a higher priority than the main thread. This should force the scheduler to run it immediately, even if it should preempt the main thread. It can also have undesirable consequences. It seems unnatural if the background thread gets a higher priority.
I could execute the request asynchronously. That is, by separately sending a request from receiving the results. This way, I could associate the main thread with preload using mysql_send_query (without blocking) and continue working with the current element. Then, when he needed the next element, he would call mysql_read_query, which would lock until the data was available.
Note that solution 3 does not even use a workflow. This seems like the best answer, but requires rewriting some low-level code. I am currently looking for examples of such asynchronous access to a client server.
I also like any experienced opinions on these approaches. Am I missing something, or am I doing something wrong? Please note that this is all working code. I do not ask how to do it, but how to do it better / faster.