I get this error when I use the Slurm workload manager (http://slurm.schedmd.com/). When I run Python scripts that use TensorFlow, they sometimes fail with an error (attached below). It seems that TensorFlow cannot find a CUDA library, but the scripts I am running do not require GPUs, so I am confused about why this would be a problem at all. Why does having CUDA installed cause a problem if I don't need it?
The only useful information I received from the slurm-job_id file was the following:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:102] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /cm/shared/openmind/cuda/7.5/lib64:/cm/shared/openmind/cuda/7.5/lib
I tensorflow/stream_executor/cuda/cuda_dnn.cc:2092] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: node047
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: node047
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.63 Sat Nov 7 21:25:42 PST 2015
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.63.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
I always thought that TensorFlow did not require a GPU. Therefore, I assume that the last message indicates that the absence of a GPU is causing the error (correct me if I am wrong).
I don't understand why I need the CUDA library. I'm not trying to run my jobs on a GPU, so why would I need a CUDA library if my jobs are CPU jobs?
I tried logging in to the node directly and starting TensorFlow there, but I did not get an explicit error:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:102] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /cm/shared/openmind/cuda/7.5/lib64:/cm/shared/openmind/cuda/7.5/lib
I tensorflow/stream_executor/cuda/cuda_dnn.cc:2092] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
although I was expecting an error:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:102] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /cm/shared/openmind/cuda/7.5/lib64:/cm/shared/openmind/cuda/7.5/lib
I tensorflow/stream_executor/cuda/cuda_dnn.cc:2092] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: node047
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: node047
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.63 Sat Nov 7 21:25:42 PST 2015
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.63.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
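Since the output differs between the Slurm job and the interactive login, it may help to compare the environment in both cases. Below is a minimal sketch of how that could be done. It only lists files on LD_LIBRARY_PATH whose names start with the library names from the log, and it prints CUDA_VISIBLE_DEVICES on the assumption that the cluster's Slurm setup may set it. The helper names `find_library` and `report` are hypothetical, and a library reported as "not found on LD_LIBRARY_PATH" may still be loadable from a system directory.

```python
import os

# Library names taken from the log output in this question.
LIBRARIES = ["libcublas.so", "libcudnn.so", "libcufft.so", "libcuda.so", "libcurand.so"]


def find_library(name, search_path):
    """Return files whose names start with `name` in a colon-separated path."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries and directories that do not exist on this node.
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(name):
                hits.append(os.path.join(directory, entry))
    return hits


def report(environ=os.environ):
    """Build a list of lines describing the CUDA-related environment."""
    search_path = environ.get("LD_LIBRARY_PATH", "")
    lines = [
        "LD_LIBRARY_PATH=" + search_path,
        # Assumption: this variable may or may not be set by the scheduler.
        "CUDA_VISIBLE_DEVICES=" + environ.get("CUDA_VISIBLE_DEVICES", "<unset>"),
    ]
    for name in LIBRARIES:
        hits = find_library(name, search_path)
        status = ", ".join(hits) if hits else "not found on LD_LIBRARY_PATH"
        lines.append(name + ": " + status)
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the same script once inside a Slurm job and once after logging in to the node directly would show whether the two environments differ.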
I also opened an official GitHub issue in the TensorFlow repository:
https://github.com/tensorflow/tensorflow/issues/3632