Using pthreads with CUDA - Design Considerations

I am writing code that reads data from disk, calls a library I wrote that does some calculations and GPU work, and then writes the results back to the file.

I would like to make this multi-threaded, because the files are quite large. I want to be able to read part of a file, send it to the GPU library, and write part back to the file. The disk I/O operations involved are quite large (for example, 10 GB), and the calculations are fairly fast on the GPU.

My question is really a design question. Should I use separate threads only to preload the data going to the GPU library, with the main thread making all the actual calls into the GPU library and then handing the results off to other threads that write them back to disk? Or should each thread run a complete pipeline of its own: grab a piece of data, execute on the GPU, write to disk, and then move on to the next piece of data?

I use CUDA for my GPU library. Is CUDA smart enough not to try to run two kernels on the GPU at once? I assume I will need to manage this manually, to make sure two threads don't try to put more data on the GPU than there is space for?

Any good resources on using multithreading and CUDA in combination are welcome.

2 answers

Threads will not help with disk I/O. People tend to attack blocking problems by creating tons of threads; in fact, that only makes things worse. What you need to do is use asynchronous I/O, so that writes (and reads) never block. You can use a general-purpose solution such as libevent or Asio, or work with the lower-level APIs available on your platform. On Linux, AIO seems best suited for files, but I haven't tried it yet. Hope this helps.


I came across this situation with large files in my research work.

As far as I remember, there is no big gain in multithreading the disk I/O, since disk transfer is very slow compared to GPU I/O.

The strategy I used was to read from disk synchronously, and to load the data to the GPU and execute on it asynchronously.

Something like:

 read from disk
 loop:
     async_load_to_gpu
     async_execute
     push_event
     read from disk
     check event complete, or read more data from disk
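The pattern above maps naturally onto CUDA streams with double buffering: while one chunk's copy and kernel are in flight on its stream, the host reads the next chunk into the other buffer. A hedged sketch (the scale kernel and the 4 MB chunk size are placeholders, not the answerer's actual code; host buffers are pinned with cudaMallocHost so the copies can be truly asynchronous):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHUNK (1 << 20)   // floats per chunk (placeholder size)

__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;   // stand-in for the real calculation
}

int main() {
    float *h[2], *d[2];
    cudaStream_t s[2];
    for (int b = 0; b < 2; b++) {
        cudaMallocHost(&h[b], CHUNK * sizeof(float));  // pinned host memory
        cudaMalloc(&d[b], CHUNK * sizeof(float));
        cudaStreamCreate(&s[b]);
    }

    for (int chunk = 0; chunk < 8; chunk++) {
        int b = chunk % 2;            // ping-pong between the two buffers
        cudaStreamSynchronize(s[b]);  // wait until this buffer's last batch drained
        /* ...synchronously read the next chunk of the file into h[b] here... */
        cudaMemcpyAsync(d[b], h[b], CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[b]);
        scale<<<(CHUNK + 255) / 256, 256, 0, s[b]>>>(d[b], CHUNK);
        cudaMemcpyAsync(h[b], d[b], CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[b]);
        /* ...once s[b] drains on the next iteration, write h[b] back to disk... */
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; b++) {
        cudaFreeHost(h[b]);
        cudaFree(d[b]);
        cudaStreamDestroy(s[b]);
    }
    return 0;
}
```

cudaEventRecord/cudaEventQuery can replace cudaStreamSynchronize if you want to poll ("check event complete") instead of block, which matches the pseudocode above more closely.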
