I think the key is to build a multi-threaded application, following the usual practices for that, with two kinds of worker threads: one kind that feeds the GPU, and one kind that does the work on the CPU. So you need a thread pool and a work queue.
http://en.wikipedia.org/wiki/Thread_pool_pattern
The queue can be very simple: a single shared integer holding the index of the current line in the log file. When a thread is ready for more work, it locks this index, reads a certain number of lines from the log file starting at that line, increments the index by the number of lines it read, and then unlocks.
When a worker thread finishes processing one chunk of the log file, it returns its results to the main thread and fetches another chunk (or exits if no lines are left), as sketched below.
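Here is a minimal sketch of that queue and worker loop in C++17. The names (`LogChunkQueue`, `kChunkSize`, the commented-out `process_lines`/`report_results`) are illustrative placeholders, not part of any existing API:

```cpp
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

constexpr std::size_t kChunkSize = 1024;  // lines handed out per request; tune as needed

struct LogChunkQueue {
    std::mutex mtx;
    std::ifstream file;
    std::size_t next_line = 0;            // index of the next unread line

    explicit LogChunkQueue(const std::string& path) : file(path) {}

    // Lock, read up to kChunkSize lines, advance the index, unlock.
    // Returns an empty vector when the file is exhausted.
    std::vector<std::string> next_chunk() {
        std::lock_guard<std::mutex> lock(mtx);
        std::vector<std::string> chunk;
        std::string line;
        while (chunk.size() < kChunkSize && std::getline(file, line)) {
            chunk.push_back(std::move(line));
        }
        next_line += chunk.size();
        return chunk;
    }
};

// A CPU worker: repeatedly fetch a chunk, process it, hand results back.
void cpu_worker(LogChunkQueue& queue) {
    for (;;) {
        auto chunk = queue.next_chunk();
        if (chunk.empty()) break;          // no more lines: exit the thread
        // auto results = process_lines(chunk);   // your per-line processing
        // report_results(results);               // hand results to the main thread
    }
}
```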
The application launches a combination of GPU and CPU worker threads so that all available GPUs and all available CPU cores are kept busy.
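For example, the launch code might look something like this. `gpu_worker` is a hypothetical function that feeds chunks to one GPU, and `num_gpus` would come from whatever GPU API you are using (e.g. `cudaGetDeviceCount` with CUDA):

```cpp
#include <thread>
#include <vector>

// Hypothetical GPU worker: in a real program this would batch chunks and
// dispatch them to kernels on the given device, then collect the results.
void gpu_worker(LogChunkQueue& queue, int device_id) {
    (void)device_id;
    for (;;) {
        auto chunk = queue.next_chunk();
        if (chunk.empty()) break;
        // ... upload chunk to the GPU, launch kernel, gather results ...
    }
}

void launch_workers(LogChunkQueue& queue, int num_gpus) {
    unsigned hw_threads = std::thread::hardware_concurrency();
    unsigned cpu_workers = (hw_threads > static_cast<unsigned>(num_gpus))
                               ? hw_threads - num_gpus
                               : 1;

    std::vector<std::thread> threads;
    for (int d = 0; d < num_gpus; ++d)
        threads.emplace_back(gpu_worker, std::ref(queue), d);
    for (unsigned i = 0; i < cpu_workers; ++i)
        threads.emplace_back(cpu_worker, std::ref(queue));

    for (auto& t : threads) t.join();
}
```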
One problem you may run into is that if the CPU is fully loaded, GPU performance can suffer, because small delays creep in when submitting new work to the GPUs or collecting their results. You may need to experiment with the number of threads and their affinity; for example, you might reserve one CPU core for each GPU by setting thread affinity.
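On Linux, one way to pin a thread to a specific core is `pthread_setaffinity_np` on the thread's native handle (the exact API differs per OS; on Windows you would use `SetThreadAffinityMask` instead). This is only a sketch of the mechanism:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin the given std::thread to a single CPU core (Linux-specific).
void pin_to_core(std::thread& t, int core_id) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core_id, &cpuset);
    pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &cpuset);
}
```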