no CUDA-capable device is detected

From: Damián Montaldo (dmontaldo_at_dc.uba.ar)
Date: Thu Jul 19 2012 - 14:12:06 CDT


Hi, I'm trying to use NAMD (NAMD_2.9_Linux-x86_64-multicore-CUDA) with CUDA.

This is the version of NAMD and the error I get:
cudarulez@n2:~/inputs$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/NAMD_2.9_Linux-x86_64-multicore-CUDA

cudarulez@n2:~/inputs$ /opt/NAMD_2.9_Linux-x86_64-multicore-CUDA/namd2 +p4 +idlepoll ubq_wb_eq.conf
Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (4-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD 2.9 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Mon Apr 30 14:02:11 CDT 2012 by jim on naiad.ks.uiuc.edu
Info: 1 NAMD 2.9 Linux-x86_64-multicore-CUDA 4 n2 cudarulez
Info: Running on 4 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.019984 s
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (n2): no CUDA-capable device is detected
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (n2): no CUDA-capable device is detected
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (n2): no CUDA-capable device is detected

Program finished.
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (n2): no CUDA-capable device is detected
Segmentation fault
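
The export above appends the NAMD directory to the end of LD_LIBRARY_PATH, so any directory already on the path is searched first. Whether that matters here is not established; the following is only a minimal sketch for seeing which CUDA runtime library the dynamic linker resolves for namd2. The helper names (resolved_libs, namd_cudart) are hypothetical, and it assumes the standard ldd tool is available and that the library of interest has "libcudart" in its name.

```python
import subprocess


def resolved_libs(ldd_output, needle="libcudart"):
    """Map each library whose name contains `needle` to what ldd resolved it to."""
    found = {}
    for line in ldd_output.splitlines():
        # Typical ldd line: "<name> => <path> (<address>)" or "<name> => not found"
        name, sep, rest = line.strip().partition(" => ")
        if sep and needle in name:
            found[name] = rest.split(" (")[0]
    return found


def namd_cudart(binary="/opt/NAMD_2.9_Linux-x86_64-multicore-CUDA/namd2"):
    """Run ldd on the namd2 binary (path taken from the transcript above)."""
    out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    return resolved_libs(out)
```

Calling namd_cudart() in the same shell, after the same export, would show whether the resolved path points into the NAMD directory or at a system-wide copy.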

It's an up-to-date Debian GNU/Linux wheezy system (so that all the CUDA
packages come from the official Debian repositories).
I have installed the official Debian drivers for NVIDIA, and the CUDA toolkit
from Debian too.

I found a related question about the installed driver in the mailing list
archive:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2011-2012/1259.html

But if I run nvidia-detect (from a Debian package), everything seems to work:
$ nvidia-detect
Detected NVIDIA GPUs:
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218 [GeForce 210] [10de:0a65] (rev a2)
Your card is supported by the default drivers.

I also tried deviceQuery from the NVIDIA SDK (and some of the other examples),
and it works too:
n2:~# /opt/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
[deviceQuery] starting...

/opt/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "Tesla C1060"
  CUDA Driver Version / Runtime Version 4.2 / 4.2
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP: 240 CUDA Cores
  GPU Clock rate: 1296 MHz (1.30 GHz)
  Memory Clock rate: 800 Mhz
  Memory Bus Width: 512-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per multiprocessor: 1024
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Alignment requirement for Surfaces: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce 210"
  CUDA Driver Version / Runtime Version 4.2 / 4.2
  CUDA Capability Major/Minor version number: 1.2
  Total amount of global memory: 511 MBytes (536150016 bytes)
  ( 2) Multiprocessors x ( 8) CUDA Cores/MP: 16 CUDA Cores
  GPU Clock rate: 1400 MHz (1.40 GHz)
  Memory Clock rate: 400 Mhz
  Memory Bus Width: 64-bit
  Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per multiprocessor: 1024
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Concurrent copy and execution: Yes with 1 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: No
  Alignment requirement for Surfaces: Yes
  Device has ECC support enabled: No
  Device is using TCC driver mode: No
  Device supports Unified Addressing (UVA): No
  Device PCI Bus ID / PCI location ID: 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.2, NumDevs = 2, Device = Tesla C1060, Device = GeForce 210
[deviceQuery] test results...
PASSED

> exiting in 3 seconds:

3...2...1...done!
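
One difference between the two transcripts is that deviceQuery ran at a root prompt (n2:~#) while namd2 ran as the user cudarulez. Whether that is related to the error is not established here. As a minimal sketch for checking it, assuming the NVIDIA driver exposes its usual /dev/nvidia* device nodes (an assumption, not something confirmed in this message), the following can be run as the same user that runs namd2. The function names node_access and report are hypothetical.

```python
import glob
import os


def node_access(pattern="/dev/nvidia*"):
    """Return {path: bool}: can the current user read and write each matching node?"""
    return {p: os.access(p, os.R_OK | os.W_OK) for p in sorted(glob.glob(pattern))}


def report(pattern="/dev/nvidia*"):
    """Summarize the check as one line of text."""
    access = node_access(pattern)
    if not access:
        # Nothing matched: the nodes may not have been created for this session.
        return "no device nodes match " + pattern
    blocked = [p for p, ok in access.items() if not ok]
    if blocked:
        return "no read/write access to: " + ", ".join(blocked)
    return "read/write access to all of: " + ", ".join(access)
```

Running print(report()) once as cudarulez and once as root, and comparing the two lines, would show whether the device nodes exist and whether both accounts can open them.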

I also tried reinstalling with the official drivers and toolkit from
NVIDIA, and I'm stuck with the same error.

I found a report of an issue(?) with users not listed in passwd. I'm using NIS,
so I created a local user, and it fails too:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2011-2012/2815.html

I don't know how to continue: I have searched the list archive for related
topics and can't find anything more.

Any help would be very much appreciated.
Thanks for your time!
Damián.

