Re: Linux-x86_64-CUDA version 2.8 on CentOS-5 x86_64 non local user issue?

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Mar 23 2012 - 02:09:28 CDT

Tru,

Unfortunately, you still haven't run a real CUDA binary, just a tool that
talks to the driver to get some data. Please run a program that does real
computation on the GPU, one that has to allocate the GPU and use it, so
we can clearly tell whether this is a NAMD problem or a general one.
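
As a rough illustration of such a test, here is a minimal sketch that loads
the CUDA driver library through Python's ctypes, creates a context on
device 0, allocates device memory, and copies a buffer to the GPU and back.
It assumes the driver library is installed under its usual name and exports
the `_v2` driver API entry points; the function name `gpu_roundtrip` is made
up for this example. It does not launch a kernel, so a compiled CUDA SDK
sample that runs kernels remains the more thorough check.

```python
import ctypes
import ctypes.util


def gpu_roundtrip(nbytes=1 << 20):
    """Allocate GPU memory, copy a buffer there and back, compare the result.

    Returns (ok, message) and does not raise if the driver or GPU is missing.
    """
    # Assumption: the driver library is installed under its usual name.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        cuda = ctypes.CDLL(name)
    except OSError as exc:
        return False, f"cannot load CUDA driver library: {exc}"

    def check(code, what):
        if code != 0:  # CUDA_SUCCESS is 0
            raise RuntimeError(f"{what} failed with CUresult {code}")

    ctx = ctypes.c_void_p()   # CUcontext handle
    dptr = ctypes.c_uint64()  # CUdeviceptr, 64-bit on x86_64
    size = ctypes.c_size_t(nbytes)
    try:
        check(cuda.cuInit(0), "cuInit")
        count = ctypes.c_int()
        check(cuda.cuDeviceGetCount(ctypes.byref(count)), "cuDeviceGetCount")
        if count.value == 0:
            return False, "no CUDA devices visible to this user"
        dev = ctypes.c_int()
        check(cuda.cuDeviceGet(ctypes.byref(dev), 0), "cuDeviceGet")
        check(cuda.cuCtxCreate_v2(ctypes.byref(ctx), 0, dev), "cuCtxCreate")
        check(cuda.cuMemAlloc_v2(ctypes.byref(dptr), size), "cuMemAlloc")
        src = ctypes.create_string_buffer(b"\xa5" * nbytes, nbytes)
        dst = ctypes.create_string_buffer(nbytes)
        check(cuda.cuMemcpyHtoD_v2(dptr, src, size), "cuMemcpyHtoD")
        check(cuda.cuMemcpyDtoH_v2(dst, dptr, size), "cuMemcpyDtoH")
        if src.raw != dst.raw:
            return False, "data read back from the GPU does not match"
        return True, f"copied {nbytes} bytes to device 0 and back"
    except (RuntimeError, AttributeError) as exc:
        return False, str(exc)
    finally:
        # Release whatever was acquired before the failure or return.
        if dptr.value:
            cuda.cuMemFree_v2(dptr)
        if ctx.value:
            cuda.cuCtxDestroy_v2(ctx)


if __name__ == "__main__":
    print(gpu_roundtrip())
```

Running it once as the LDAP-only user and once as a user listed in
/etc/passwd would show whether context creation and allocation already fail
outside of NAMD.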

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Tru Huynh
> Sent: Thursday, March 22, 2012 20:29
> To: Norman Geist
> Cc: Namd Mailing List
> Subject: Re: namd-l: Linux-x86_64-CUDA version 2.8 on CentOS-5 x86_64
> non local user issue?
>
> Hi,
>
> thanks for looking at that issue,
>
> On Wed, Mar 21, 2012 at 08:02:41AM +0100, Norman Geist wrote:
> > Tru,
> >
> > nvidia-smi is not a cuda program, it's just a driver utility. Please
> > check if you can run other cuda programs, maybe one example from the
> > cuda sdk,
> from an LDAP-only user account:
> /c5/shared/cuda/4.1.28/C/bin/linux/release/deviceQuery
> [deviceQuery] starting...
>
> /c5/shared/cuda/4.1.28/C/bin/linux/release/deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Found 2 CUDA Capable device(s)
>
> Device 0: "Tesla M2090"
>   CUDA Driver Version / Runtime Version 4.1 / 4.1
>   CUDA Capability Major/Minor version number: 2.0
>   Total amount of global memory: 5375 MBytes (5636554752 bytes)
>   (16) Multiprocessors x (32) CUDA Cores/MP: 512 CUDA Cores
>   GPU Clock Speed: 1.30 GHz
>   Memory Clock rate: 1848.00 Mhz
>   Memory Bus Width: 384-bit
>   L2 Cache Size: 786432 bytes
>   Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
>   Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
>   Total amount of constant memory: 65536 bytes
>   Total amount of shared memory per block: 49152 bytes
>   Total number of registers available per block: 32768
>   Warp size: 32
>   Maximum number of threads per block: 1024
>   Maximum sizes of each dimension of a block: 1024 x 1024 x 64
>   Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
>   Maximum memory pitch: 2147483647 bytes
>   Texture alignment: 512 bytes
>   Concurrent copy and execution: Yes with 2 copy engine(s)
>   Run time limit on kernels: No
>   Integrated GPU sharing Host Memory: No
>   Support host page-locked memory mapping: Yes
>   Concurrent kernel execution: Yes
>   Alignment requirement for Surfaces: Yes
>   Device has ECC support enabled: Yes
>   Device is using TCC driver mode: No
>   Device supports Unified Addressing (UVA): Yes
>   Device PCI Bus ID / PCI location ID: 2 / 0
>   Compute Mode:
>     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 1: "Tesla M2090"
>   CUDA Driver Version / Runtime Version 4.1 / 4.1
>   CUDA Capability Major/Minor version number: 2.0
>   Total amount of global memory: 5375 MBytes (5636554752 bytes)
>   (16) Multiprocessors x (32) CUDA Cores/MP: 512 CUDA Cores
>   GPU Clock Speed: 1.30 GHz
>   Memory Clock rate: 1848.00 Mhz
>   Memory Bus Width: 384-bit
>   L2 Cache Size: 786432 bytes
>   Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
>   Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
>   Total amount of constant memory: 65536 bytes
>   Total amount of shared memory per block: 49152 bytes
>   Total number of registers available per block: 32768
>   Warp size: 32
>   Maximum number of threads per block: 1024
>   Maximum sizes of each dimension of a block: 1024 x 1024 x 64
>   Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
>   Maximum memory pitch: 2147483647 bytes
>   Texture alignment: 512 bytes
>   Concurrent copy and execution: Yes with 2 copy engine(s)
>   Run time limit on kernels: No
>   Integrated GPU sharing Host Memory: No
>   Support host page-locked memory mapping: Yes
>   Concurrent kernel execution: Yes
>   Alignment requirement for Surfaces: Yes
>   Device has ECC support enabled: Yes
>   Device is using TCC driver mode: No
>   Device supports Unified Addressing (UVA): Yes
>   Device PCI Bus ID / PCI location ID: 3 / 0
>   Compute Mode:
>     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.1, CUDA
> Runtime Version = 4.1, NumDevs = 2, Device = Tesla M2090, Device =
> Tesla M2090
> [deviceQuery] test results...
> PASSED
>
> > exiting in 3 seconds: 3...2...1...done!
>
>
> > It looks like your user has no permission to list the available
> > devices. So check what is the difference between local users and
> > non-local (bashrc, LD_LIBRARY_PATH..., cuda-toolkit).
> it's the same $HOME, same user, the only difference is adding that user
> to /etc/passwd
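
To make that difference visible from inside the failing shell, a small
sketch like the following reports whether the current uid resolves at all
and whether the resulting name is also listed in the local passwd file. The
helper names `in_local_passwd` and `describe_user` are invented for this
example, and it only looks at name resolution; it says nothing about why
NAMD would care.

```python
import os
import pwd


def in_local_passwd(username, passwd_text):
    """Return True if username has an entry in the given passwd(5) text."""
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The login name is the first colon-separated field.
        if line.split(":", 1)[0] == username:
            return True
    return False


def describe_user(uid=None, passwd_path="/etc/passwd"):
    """Describe how a uid is resolved: not at all, locally, or elsewhere."""
    uid = os.getuid() if uid is None else uid
    try:
        # getpwuid goes through the system's name service configuration,
        # so it also sees users that exist only in LDAP.
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        return f"uid {uid} does not resolve to any user name"
    try:
        with open(passwd_path) as fh:
            local = in_local_passwd(name, fh.read())
    except OSError as exc:
        return f"uid {uid} resolves to {name!r}, but {passwd_path} is unreadable: {exc}"
    where = passwd_path if local else "a non-local source only (for example LDAP)"
    return f"uid {uid} resolves to {name!r}, found in {where}"


if __name__ == "__main__":
    print(describe_user())
```

Running it before and after the /etc/passwd entry is added should flip the
reported source while the name and uid stay the same.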
>
> I can reproducibly:
> 1) su - that user
> 2) fail to run namd but run deviceQuery (/dev/nvidia* are 666)
> 3) on another shell as root, just add that user to /etc/passwd and
> /etc/shadow
> 4) successfully run namd (on the same shell that failed on 2) by
> hitting <up><return>
> 5) remove the user from /etc/passwd and /etc/shadow
> 6) fail again to run namd as that user (same shell/window as 4) by
> hitting <up><return>
>
> > Check with "ldd namd2" if local and non-local users use the same
> > shared libraries.
> yes, nothing changed.
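
For a comparison that does not rely on reading two ldd listings by eye, the
output can be parsed and diffed. The sketch below assumes the usual
"name => path (address)" layout of ldd output; `parse_ldd`, `ldd_libs` and
`diff_libs` are names chosen for this example.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path.

    Entries without a path (such as the vdso) map to None, and unresolved
    libraries map to the string "not found".
    """
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no mapping
        name, _, target = line.partition("=>")
        target = target.strip()
        # Drop the trailing load address, e.g. "(0x00002b...)".
        if target.endswith(")") and "(" in target:
            target = target[: target.rindex("(")].strip()
        libs[name.strip()] = target or None
    return libs


def ldd_libs(binary):
    """Run ldd on a binary and return the parsed library mapping."""
    result = subprocess.run(["ldd", binary], capture_output=True,
                            text=True, check=True)
    return parse_ldd(result.stdout)


def diff_libs(a, b):
    """Return {name: (path_in_a, path_in_b)} for entries that differ."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}
```

Saving the result of `ldd_libs` for namd2 under each account and passing
both to `diff_libs` gives an empty dictionary when the two users really
resolve the same libraries.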
>
> Tru
> --
> Dr Tru Huynh | http://www.pasteur.fr/recherche/unites/Binfs/
> mailto:tru_at_pasteur.fr | tel/fax +33 1 45 68 87 37/19
> Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15
> France
