From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Mar 20 2012 - 01:36:34 CDT
Hi,
From what you wrote, this doesn't look like a NAMD problem but a permissions
problem. To use the CUDA devices (I think it's "/dev/nvidia" or similar),
your user needs permission to access these devices. So when you added
your user locally, it likely got a group that is allowed. Make sure your
user is in a group that is allowed to access the devices, or make the
devices accessible to everyone.
If the above is the problem, you shouldn't be able to run any CUDA binary
as that user, not only NAMD.
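
One quick way to test the permissions theory is to inspect the device nodes as the affected user. The following is a minimal Python sketch, not part of NAMD or the NVIDIA tools: the function names describe_device and check_devices are made up for illustration, and the /dev/nvidia* pattern is an assumption based on the usual device naming. It reports the mode, the owning group, and whether the current user may read and write each node.

```python
import glob
import grp
import os
import stat


def describe_device(path):
    """Return mode, owning group, and read/write access for the current user."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no name in the group database; fall back to the number.
        group = str(st.st_gid)
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),
        "group": group,
        "accessible": os.access(path, os.R_OK | os.W_OK),
    }


def check_devices(pattern="/dev/nvidia*"):
    """Describe every device node matching the (assumed) glob pattern."""
    return [describe_device(p) for p in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    devices = check_devices()
    if not devices:
        print("no device nodes matched the pattern")
    for info in devices:
        status = "ok" if info["accessible"] else "NO ACCESS"
        print(info["mode"], info["group"], info["path"], status)
```

Run it once as the failing user and once as a working user. If both report the devices as accessible, file permissions are probably not the cause.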
Let us know.
Norman Geist.
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Tru Huynh
> Sent: Tuesday, 20 March 2012 00:32
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: Linux-x86_64-CUDA version 2.8 on CentOS-5 x86_64 non
> local user issue?
> 
> Hello
> 
> I am facing an unexpected issue with the prebuilt executable of the
> Linux-x86_64-CUDA version 2.8
> (there is no issue with the multicore prebuilt version).
> 
> When a user (named nonluser) who is not listed in /etc/passwd tries to
> run the NAMD-2.8-Linux-x86_64-CUDA version, it fails with the following
> errors:
> ..
> Pe 0 physical rank 0 binding to CUDA device 0 on
> scrappy.bis.pasteur.fr: 'Device Emulation (CPU)'  Mem: 0MB  Rev:
> 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0
> (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
> ..
> 
> Just adding that user to /etc/passwd,/etc/shadow yields a user able to
> run NAMD-CUDA.
> ..
> Pe 0 physical rank 0 binding to CUDA device 0 on
> scrappy.bis.pasteur.fr: 'Tesla M2090'  Mem: 4095MB  Rev: 2.0
> Info: 1.62114 MB of memory in use based on CmiMemoryUsage
> ..
> 
> longer versions with more details:
> background:
> 
> We are using OpenLDAP to manage our user accounts on CentOS-5 x86_64.
> 
> $HOME and the applications are NFS hosted
> 
> /etc/passwd contains only the CentOS-provided system accounts and mine.
> All the other group members' accounts are listed only on the LDAP
> servers.
> 
> /etc/nsswitch.conf:
> ..
> passwd:     files ldap
> shadow:     files ldap
> group:      files ldap
> ..
> 
> $ ls -ld /dev/nvidia*
> crw-rw-rw- 1 root root 195,   0 Mar 18 15:56 /dev/nvidia0
> crw-rw-rw- 1 root root 195,   1 Mar 18 15:56 /dev/nvidia1
> crw-rw-rw- 1 root root 195,   2 Mar 18 15:56 /dev/nvidia2
> crw-rw-rw- 1 root root 195,   3 Mar 18 15:56 /dev/nvidia3
> crw-rw-rw- 1 root root 195,   4 Mar 18 15:56 /dev/nvidia4
> crw-rw-rw- 1 root root 195,   5 Mar 18 15:56 /dev/nvidia5
> crw-rw-rw- 1 root root 195,   6 Mar 18 15:56 /dev/nvidia6
> crw-rw-rw- 1 root root 195,   7 Mar 18 15:56 /dev/nvidia7
> crw-rw-rw- 1 root root 195,   8 Mar 18 15:56 /dev/nvidia8
> crw-rw-rw- 1 root root 195,   9 Mar 18 15:56 /dev/nvidia9
> crw-rw-rw- 1 root root 195, 255 Mar 18 15:56 /dev/nvidiactl
> 
> $ nvidia-smi
> Tue Mar 20 00:14:42 2012
> +------------------------------------------------------+
> | NVIDIA-SMI 2.290.10   Driver Version: 290.10         |
> |-------------------------------+----------------------+----------------------+
> | Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
> | Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
> |===============================+======================+======================|
> | 0.  Tesla M2090               | 0000:02:00.0  Off    |         0          0 |
> |  N/A    N/A  P12   30W / 225W |   0%    9MB / 5375MB |    0%     Default    |
> |-------------------------------+----------------------+----------------------|
> | 1.  Tesla M2090               | 0000:03:00.0  Off    |         0          0 |
> |  N/A    N/A  P12   31W / 225W |   0%    9MB / 5375MB |    0%     Default    |
> |-------------------------------+----------------------+----------------------|
> | Compute processes:                                               GPU Memory |
> |  GPU  PID     Process name                                       Usage      |
> |=============================================================================|
> |  No running compute processes found                                         |
> +-----------------------------------------------------------------------------+
> 
> 
> symptom:
> When a user (named nonluser) who is not listed in /etc/passwd tries to
> run the NAMD-2.8-Linux-x86_64-CUDA version, it fails with the following
> errors:
> 
> [nonluser ~]$ module purge
> [nonluser ~]$ module load NAMD/released-2.8/x86_64-CUDA
> [nonluser ~]$ export CHARMRUN=/c5/shared/NAMD/2.8/x86_64-CUDA/charmrun
> [nonluser ~]$ export NAMD=/c5/shared/NAMD/2.8/x86_64-CUDA/namd2
> [nonluser ~]$ ${CHARMRUN} ${NAMD} ++local +p1  +idlepoll ++nodelist
> nodelist +devices 0 prodLang2.inp
> Charmrun> started all node programs in 0.004 seconds.
> Warning> Randomization of stack pointer is turned on in kernel, thread
> migration may not work! Run 'echo 0 >
> /proc/sys/kernel/randomize_va_space' as root to disable it, or try run
> with '+isomalloc_sync'.
> Charm++> scheduler running in netpoll mode.
> Charm++> Running on 1 unique compute nodes (12-way SMP).
> Charm++> cpu topology info is gathered in 0.000 seconds.
> Info: NAMD 2.8 for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat May 28 11:30:15 CDT 2011 by jim on larissa.ks.uiuc.edu
> Info: 1 NAMD  2.8  Linux-x86_64-CUDA  1    scrappy.bis.pasteur.fr
> nonluser
> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00441313
> s
> Pe 0 physical rank 0 binding to CUDA device 0 on
> scrappy.bis.pasteur.fr: 'Device Emulation (CPU)'  Mem: 0MB  Rev:
> 9999.9999
> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0
> (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
> ------------- Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0
> (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
> 
> [0] Stack Traceback:
>   [0:0] CmiAbort+0x7b  [0xb138d9]
>   [0:1] _Z8NAMD_diePKc+0x62  [0x537722]
>   [0:2] _Z13cuda_errcheckPKc+0x149  [0x6f3391]
>   [0:3] _Z15cuda_initializev+0x5f3  [0x6f312d]
>   [0:4] _Z8all_initiPPc+0x45  [0x540af1]
>   [0:5] _Z11master_initiPPc+0x67  [0x5407ab]
>   [0:6] _ZN7BackEnd4initEiPPc+0xe8  [0x540724]
>   [0:7] main+0x2f  [0x53ba1f]
>   [0:8] __libc_start_main+0xf4  [0x3f8501d994]
>   [0:9] _ZNSt8ios_base4InitD1Ev+0x72  [0x53701a]
> Fatal error on PE 0> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0
> (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
> 
> Just adding an entry in /etc/passwd and /etc/shadow for that user allows
> him to run the code (nothing else changed):
> 
> [nonluser ~]$ ${CHARMRUN} ${NAMD} ++local +p1  +idlepoll ++nodelist
> nodelist +devices 0 prodLang2.inp
> Charmrun> started all node programs in 0.004 seconds.
> Warning> Randomization of stack pointer is turned on in kernel, thread
> migration may not work! Run 'echo 0 >
> /proc/sys/kernel/randomize_va_space' as root to disable it, or try run
> with '+isomalloc_sync'.
> Charm++> scheduler running in netpoll mode.
> Charm++> Running on 1 unique compute nodes (12-way SMP).
> Charm++> cpu topology info is gathered in 0.000 seconds.
> Info: NAMD 2.8 for Linux-x86_64-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat May 28 11:30:15 CDT 2011 by jim on larissa.ks.uiuc.edu
> Info: 1 NAMD  2.8  Linux-x86_64-CUDA  1    scrappy.bis.pasteur.fr
> nonluser
> Info: Running on 1 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00161791
> s
> Pe 0 physical rank 0 binding to CUDA device 0 on
> scrappy.bis.pasteur.fr: 'Tesla M2090'  Mem: 4095MB  Rev: 2.0
> Info: 1.62114 MB of memory in use based on CmiMemoryUsage
> Info: Configuration file is prodLang2.inp
> Info: Working in the current directory /work/probleme_cuda
> TCL: Suspending until startup complete.
> Info: SIMULATION PARAMETERS:
> Info: TIMESTEP               1
> Info: NUMBER OF STEPS        0
> Info: STEPS PER CYCLE        20
> Info: PERIODIC CELL BASIS 1  180 0 0
> Info: PERIODIC CELL BASIS 2  0 90 0
> Info: PERIODIC CELL BASIS 3  0 0 85
> Info: PERIODIC CELL CENTER   0 0 0
> Info: LOAD BALANCER  Centralized
> Info: LOAD BALANCING STRATEGY  New Load Balancers -- DEFAULT
> Info: LDB PERIOD             4000 steps
> Info: FIRST LDB TIMESTEP     100
> Info: LAST LDB TIMESTEP     -1
> Info: LDB BACKGROUND SCALING 1
> Info: HOM BACKGROUND SCALING 1
> Info: PME BACKGROUND SCALING 1
> Info: MIN ATOMS PER PATCH    40
> Info: VELOCITY FILE          1oke-oistep-lang1.vel
> Info: CENTER OF MASS MOVING INITIALLY? NO
> Info: DIELECTRIC             1
> Info: EXCLUDE                SCALED ONE-FOUR
> Info: 1-4 ELECTROSTATICS SCALED BY 1
> Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
> Info: DCD FILENAME           1oke-oistep-lang1.2.dcd
> Info: DCD FREQUENCY          10000
> Info: DCD FIRST STEP         10000
> Info: DCD FILE WILL CONTAIN UNIT CELL DATA
> Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
> Info: NO VELOCITY DCD OUTPUT
> Info: NO FORCE DCD OUTPUT
> Info: OUTPUT FILENAME        1oke-oistep-lang1.2
> Info: BINARY OUTPUT FILES WILL BE USED
> Info: RESTART FILENAME       1oke-oistep-lang1.2.restart
> Info: RESTART FREQUENCY      10000
> Info: BINARY RESTART FILES WILL BE USED
> Info: SWITCHING ACTIVE
> Info: SWITCHING ON           8
> Info: SWITCHING OFF          12
> Info: PAIRLIST DISTANCE      13.5
> Info: PAIRLIST SHRINK RATE   0.01
> Info: PAIRLIST GROW RATE     0.01
> Info: PAIRLIST TRIGGER       0.3
> Info: PAIRLISTS PER CYCLE    2
> Info: PAIRLISTS ENABLED
> Info: MARGIN                 0
> Info: HYDROGEN GROUP CUTOFF  2.5
> Info: PATCH DIMENSION        16
> Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
> Info: TIMING OUTPUT STEPS    100
> Info: LANGEVIN DYNAMICS ACTIVE
> Info: LANGEVIN TEMPERATURE   300
> Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
> Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
> Info: PARTICLE MESH EWALD (PME) ACTIVE
> Info: PME TOLERANCE               1e-06
> Info: PME EWALD COEFFICIENT       0.257952
> Info: PME INTERPOLATION ORDER     4
> Info: PME GRID DIMENSIONS         128 64 64
> Info: PME MAXIMUM GRID SPACING    1.5
> Info: Attempting to read FFTW data from FFTW_NAMD_2.8_Linux-x86_64-CUDA.txt
> Info: Optimizing 6 FFT steps.  1...
> <...>
> 
> Thanks
> 
> Tru
> --
> Dr Tru Huynh          | http://www.pasteur.fr/recherche/unites/Binfs/
> mailto:tru_at_pasteur.fr | tel/fax +33 1 45 68 87 37/19
> Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15
> France