cuda error cudastreamcreate

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue Jun 14 2011 - 00:45:37 CDT

Hello:
With a gaming machine
Gigabyte GA 890FXAUD5
Six-core AMD PhenomII 1075T
2x GTX 470
NAMD_CVS-2011-06-04_Linux-x86_64-CUDA.tar.gz
Debian GNU-Linux amd64 wheezy

I could run plain MD:

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
Pe 5 sharing CUDA device 1 first 1 next 1
Pe 2 sharing CUDA device 0 first 0 next 4
Did not find +devices i,j,k,... argument, using all
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 3 sharing CUDA device 1 first 1 next 5
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 4 sharing CUDA device 0 first 0 next 0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Info: 1.64104 MB of memory in use based on CmiMemoryUsage
Info: Configuration file is min-02.conf
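A minimal sketch, assuming Python 3, of how the Pe-to-GPU binding in a startup log like the one above could be summarized. The function name `pes_per_device` and the regular expression are illustrative assumptions based only on the line format shown here; they are not part of NAMD.

```python
import re
from collections import defaultdict

# Pattern for lines such as:
#   Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX 470' ...
# (format assumed from the log above, not taken from NAMD documentation)
BINDING = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on (\S+): '([^']+)'"
)

def pes_per_device(log_text):
    """Group Pe numbers by (device index, device name)."""
    # Collapse all whitespace so names wrapped by the mailer
    # ('GeForce GTX' / '470') become a single string again.
    flat = " ".join(log_text.split())
    devices = defaultdict(list)
    for pe, dev, _host, name in BINDING.findall(flat):
        devices[(int(dev), name)].append(int(pe))
    return {key: sorted(pes) for key, pes in devices.items()}

SAMPLE = """\
Pe 5 sharing CUDA device 1 first 1 next 1
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
"""

if __name__ == "__main__":
    for (dev, name), pes in sorted(pes_per_device(SAMPLE).items()):
        print(f"device {dev} ({name}): Pe {pes}")
```

In a healthy run on this machine, each of the two GTX 470 cards should appear with three Pes attached.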

Yesterday NAMD failed with "cuda error cudastreamcreate", which was resolved
by visiting, in turn,

----/var/lib/dkms/nvidia/270.41.19/2.6.38-2-amd64/x86_64/module/nvidia.ko

and

----/lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko

and (perhaps; I am unsure whether this last step was actually carried out):

---reboot

after which the machine worked nicely for several different tasks all day
and night long.

Today the same error, "cuda error cudastreamcreate", is back, and the
procedure above, including a reboot, fails to get NAMD running. The log
file says:

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
Pe 5 sharing CUDA device 0 first 0 next 0
Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 5 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
0): no CUDA-capable device is available

Did not find +devices i,j,k,... argument, using all
Pe 0 sharing CUDA device 0 first 0 next 1
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
Pe 3 sharing CUDA device 0 first 0 next 4
Pe 3 physical rank 3 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
Pe 1 sharing CUDA device 0 first 0 next 2
Pe 1 physical rank 1 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device
0): no CUDA-capable device is available

FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device
0): no CUDA-capable device is available

FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device
0): no CUDA-capable device is available

Pe 2 sharing CUDA device 0 first 0 next 3
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device
0): no CUDA-capable device is available

Pe 4 sharing CUDA device 0 first 0 next 5
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 4 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device
0): no CUDA-capable device is available

[0] Stack Traceback:
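A minimal sketch, assuming Python 3, of how a failing log like the one above could be condensed into two facts: which Pes bound to the 'Device Emulation (CPU)' placeholder instead of a real GPU, and which CUDA call each Pe reported as fatal. The name `summarize_failure` and both patterns are illustrative assumptions derived from the lines shown here.

```python
import re

# Line formats assumed from the log above.
EMULATED = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device \d+ on \S+ "
    r"'Device Emulation \(CPU\)'"
)
FATAL = re.compile(r"FATAL ERROR: CUDA error (\w+) on Pe (\d+)")

def summarize_failure(log_text):
    """Return (Pes bound to the emulation device, {Pe: failing CUDA call})."""
    # Collapse whitespace to undo the mailer's line wrapping.
    flat = " ".join(log_text.split())
    emulated = sorted({int(pe) for pe in EMULATED.findall(flat)})
    # The error appears twice per Pe (once again after 'Reason:');
    # a dict keyed by Pe keeps a single entry for each.
    failed = {int(pe): call for call, pe in FATAL.findall(flat)}
    return emulated, failed

SAMPLE = """\
Pe 5 sharing CUDA device 0 first 0 next 0
Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
CUDA-capable device is available
------------- Processor 5 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
0): no CUDA-capable device is available
"""

if __name__ == "__main__":
    emulated, failed = summarize_failure(SAMPLE)
    print("Pes on emulation device:", emulated)
    print("Failing call per Pe:", failed)
```

If every Pe shows up in the first list, the CUDA runtime presumably saw no real GPU at all, which points at the driver or device files rather than at NAMD.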

--------------------------------
nvidia-smi -r (or nvidia-smi -a)
NVIDIA: could not open the device file /dev/nvidia1 (no such file)
Failed to initialize NVML: unknown error.

If "nvidia-smi" is for Tesla only, how can I check the GTX 470 cards?
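Since the message above complains about a missing /dev/nvidia1, one check that does not depend on nvidia-smi is to look for the driver's device files directly. A minimal sketch, assuming Python 3 and the usual naming of one /dev/nvidiaN node per GPU plus /dev/nvidiactl; the function names and the `dev_dir` parameter are hypothetical and exist only to keep the example testable.

```python
import glob
import os
import re

def nvidia_device_nodes(dev_dir="/dev"):
    """Return (control node present?, sorted GPU node numbers) under dev_dir."""
    numbers = []
    for path in glob.glob(os.path.join(dev_dir, "nvidia*")):
        # Keep only nvidia0, nvidia1, ...; skip nvidiactl and anything else.
        match = re.fullmatch(r"nvidia(\d+)", os.path.basename(path))
        if match:
            numbers.append(int(match.group(1)))
    has_ctl = os.path.exists(os.path.join(dev_dir, "nvidiactl"))
    return has_ctl, sorted(numbers)

def missing_nodes(expected_gpus, dev_dir="/dev"):
    """List the device files that are absent for the expected GPU count."""
    has_ctl, numbers = nvidia_device_nodes(dev_dir)
    missing = [f"nvidia{i}" for i in range(expected_gpus) if i not in numbers]
    if not has_ctl:
        missing.append("nvidiactl")
    return missing

if __name__ == "__main__":
    # Two GTX 470 cards are installed in the machine described above.
    print("missing device files:", missing_nodes(2))
```

An empty list only shows that the files exist; it does not prove that the kernel module is loaded or that the cards are usable.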

Thanks for advice

francesco pietra
