I'm writing code that does I/O from disk, calls a library I wrote that does some calculations and GPU work, and then does more disk I/O to write the results back to the file.

I would like to make this multi-threaded, because the files are quite large. I want to be able to read a chunk of a file, send it to the GPU library, and write that chunk back to the file. The disk I/O involved is quite large (files on the order of 10 GB, for example), and the calculations are fairly fast on the GPU.
My question is really a design question. Should I use separate threads to preload the data that goes to the GPU library, have only the main thread actually make calls to the GPU library, and then hand the resulting data off to other threads to be written back to disk? Or should each thread instead handle the full pipeline on its own: grab a chunk of data, run it on the GPU, write it to disk, and then move on to the next chunk?
I use CUDA for my GPU library. Is CUDA smart enough not to try to run two kernels on the GPU at once? I suspect I will also need to manually make sure that two threads aren't trying to put more data onto the GPU than it has memory for.
Any good resources on using multithreading and CUDA in combination are welcome.
Derek