How to sync with Julia CUDArt?

Question

I'm just starting to use Julia's CUDArt package to manage GPU computing. I am wondering how to ensure that when I pull data back from the GPU (e.g. using to_host()), I don't do so before all the necessary computations on it have finished.

From some experimentation, it seems that to_host(CudaArray) will block while that particular CudaArray is being updated. So perhaps just relying on this is enough to be safe? But that seems a bit chancy.

I am currently using the launch() function to launch my kernels, as shown in the documentation.

The CUDArt documentation gives an example using Julia's @sync macro, which looks like it could be great. But as far as @sync is concerned, my "work" is done and I am ready to move on as soon as the kernel has been launched with launch(), not once it has completed. As far as I understand how launch() operates, there is no way to change this behavior (e.g. to make it wait for the output of the function it launches).

How can I do this kind of synchronization?

+8

parallel-processing julia-lang julia-gpu

Michael Ohlrogge Jun 19 '16 at 1:12

2 answers

OK, so there isn't a ton of documentation for the CUDArt package, but I looked at the source code, and I think how to do this is straightforward. In particular, there is a function device_synchronize() that blocks until all work on the currently active device has finished. Thus, the following seems to work:

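    using CUDArt

    # Load the compiled PTX module and look up the kernel by name
    md = CuModule("/path/to/module.ptx", false)
    MyFunc = CuFunction(md, "MyFunc")

    GridDim = 2*2496
    BlockDim = 64

    # launch() returns as soon as the kernel has been queued
    launch(MyFunc, GridDim, BlockDim, (arg1, arg2, ...))

    # Block until all work on the active device has finished, then copy back
    device_synchronize()
    res = to_host(arg2)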

I'd still love to hear from anyone with more expertise, though, if there is anything more to be aware of here.

+10

Michael Ohlrogge Jun 19 '16 at 14:23

Chris Rackauckas · Accepted Answer · 2016-07-11T00:35:27+0000

I think a more canonical way is to create a stream for each device:

streams = [(device(dev); Stream()) for dev in devlist]

and then, inside the @async block, after you tell it to do the computations, call wait(stream) to make that task wait until the stream has finished its computations. See the Streams example in the README.
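Putting the pieces of this answer together, a rough sketch of the per-device stream pattern might look like the following. It is modeled on the README's Streams example and is not tested here: the module path and the kernel name "MyFunc" are carried over from the other answer, and the buffer size, the single-argument kernel signature, and the `results` dictionary are assumptions made purely for illustration.

```julia
using CUDArt

# Select every available device; the do-block receives the device list.
devices(dev -> true) do devlist
    # One stream per device. device(dev) makes dev the active device
    # before its Stream() is created.
    streams = [(device(dev); Stream()) for dev in devlist]
    results = Dict{Int,Any}()   # hypothetical container for the host copies

    @sync begin
        for (idev, dev) in enumerate(devlist)
            @async begin
                device(dev)
                stream = streams[idev]

                # Assumed: a PTX module exposing a kernel "MyFunc" that takes
                # a single Float32 buffer (hypothetical signature).
                md = CuModule("/path/to/module.ptx", false)
                MyFunc = CuFunction(md, "MyFunc")
                d_buf = CudaArray(rand(Float32, 64 * 128))

                # Queue the kernel on this device's stream; launch() returns immediately.
                launch(MyFunc, 128, 64, (d_buf,); stream=stream)

                # Suspend this task until the stream has finished its queued work,
                # letting the tasks for the other devices run in the meantime.
                wait(stream)

                # Another task may have switched the active device while this
                # one was waiting, so select it again before cleaning up.
                device(dev)
                results[idev] = to_host(d_buf)
                unload(md)
            end
        end
    end
    results
end
```

Compared with device_synchronize(), which blocks the whole Julia process, wait(stream) only suspends the task that calls it, so @sync returns once every device's stream has drained.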
