cudaFree() is synchronous. If you really want it to be asynchronous, you can create your own CPU thread, give it a work queue, and submit cudaFree requests to that queue from the main thread.
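A minimal sketch of that worker-thread idea, in plain C++ so it builds without the CUDA toolkit. The class name `AsyncFreeQueue` and the method `free_later` are hypothetical names made up for this illustration; the free function is passed in as a callable, and with CUDA you would presumably pass something like `[](void* p) { cudaFree(p); }`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: runs a blocking "free" call on a background thread
// so the caller does not wait for it.
class AsyncFreeQueue {
public:
    // free_fn is whatever releases the pointer (assumption: for CUDA,
    // a small wrapper around cudaFree).
    explicit AsyncFreeQueue(std::function<void(void*)> free_fn)
        : free_fn_(std::move(free_fn)) {
        assert(free_fn_ && "a free function is required");
        worker_ = std::thread([this] { run(); });
    }

    // Drains any pending requests, then joins the worker thread.
    ~AsyncFreeQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    AsyncFreeQueue(const AsyncFreeQueue&) = delete;
    AsyncFreeQueue& operator=(const AsyncFreeQueue&) = delete;

    // Called from the main thread; returns without waiting for the free.
    void free_later(void* ptr) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(ptr);
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // only reachable once stopping_ is set
            void* ptr = pending_.front();
            pending_.pop();
            lock.unlock();  // do not hold the lock during the blocking call
            free_fn_(ptr);
            lock.lock();
        }
    }

    std::function<void(void*)> free_fn_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<void*> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

The main thread would call `queue.free_later(dev_ptr)` and carry on; the free itself still blocks, just on the worker thread. Note that the current CUDA device is per host thread, so on a multi-GPU system the worker would probably need to call cudaSetDevice() before freeing.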
However, asynchronous frees seem like an odd request. Perhaps you could explain why you want it to be asynchronous. Do you want the release to happen immediately after the CUDA event fires?
Mr Fooz