Linux-x86_64-CUDA version 2.8 on CentOS-5 x86_64 non local user issue?

From: Tru Huynh (tru_at_pasteur.fr)
Date: Mon Mar 19 2012 - 18:31:40 CDT

Hello

I am facing an unexpected issue with the prebuilt Linux-x86_64-CUDA executable of NAMD 2.8
(there is no issue with the prebuilt multicore version).

When a user (named nonluser) who is not listed in /etc/passwd tries to run the
NAMD-2.8-Linux-x86_64-CUDA version, the run fails with the following errors:
..
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Device Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
..

Just adding that user to /etc/passwd and /etc/shadow makes the same user able to run NAMD-CUDA:
..
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Tesla M2090' Mem: 4095MB Rev: 2.0
Info: 1.62114 MB of memory in use based on CmiMemoryUsage
..

Longer version with more details:
Background:

We are using OpenLDAP to manage our user accounts on CentOS-5 x86_64.

$HOME and the applications are hosted on NFS.

/etc/passwd contains only the CentOS-provided system accounts and mine.
All the other group members' accounts are listed only on the LDAP servers.

/etc/nsswitch.conf:
..
passwd: files ldap
shadow: files ldap
group: files ldap
..
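
To tell apart an account that lives in /etc/passwd from one that is only reachable through the other NSS sources (ldap in the configuration above), something like the following Python sketch may help. It is only an illustration: the function names `local_passwd_names` and `account_source` are hypothetical, and it assumes the standard `pwd` module resolves names through the same nsswitch.conf order as the rest of the system.

```python
import pwd
import sys


def local_passwd_names(path="/etc/passwd"):
    """Return the set of login names defined in a passwd-format file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and +/- compat entries.
            if not line or line[0] in "#+-":
                continue
            names.add(line.split(":", 1)[0])
    return names


def account_source(name, path="/etc/passwd"):
    """Classify an account as 'files', 'nss-only', or 'unknown'."""
    if name in local_passwd_names(path):
        return "files"
    try:
        # pwd.getpwnam goes through NSS, so it also sees LDAP-only accounts.
        pwd.getpwnam(name)
    except KeyError:
        return "unknown"
    return "nss-only"


if __name__ == "__main__":
    # Usage (hypothetical script name): python account_source.py nonluser tru
    for user in sys.argv[1:]:
        print(user, account_source(user))
```

On the setup described here, one would expect nonluser to be reported as "nss-only" before the /etc/passwd entry is added and as "files" afterwards.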

$ ls -ld /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Mar 18 15:56 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Mar 18 15:56 /dev/nvidia1
crw-rw-rw- 1 root root 195, 2 Mar 18 15:56 /dev/nvidia2
crw-rw-rw- 1 root root 195, 3 Mar 18 15:56 /dev/nvidia3
crw-rw-rw- 1 root root 195, 4 Mar 18 15:56 /dev/nvidia4
crw-rw-rw- 1 root root 195, 5 Mar 18 15:56 /dev/nvidia5
crw-rw-rw- 1 root root 195, 6 Mar 18 15:56 /dev/nvidia6
crw-rw-rw- 1 root root 195, 7 Mar 18 15:56 /dev/nvidia7
crw-rw-rw- 1 root root 195, 8 Mar 18 15:56 /dev/nvidia8
crw-rw-rw- 1 root root 195, 9 Mar 18 15:56 /dev/nvidia9
crw-rw-rw- 1 root root 195, 255 Mar 18 15:56 /dev/nvidiactl
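
The listing above shows that every node is a world-readable and world-writable character device, so plain file permissions should not be what blocks the LDAP-only user. The same check can be scripted, for example to run it as the affected user. This is a minimal sketch; the helper names `is_world_rw_char_device` and `check_nvidia_nodes` are made up for the illustration.

```python
import glob
import os
import stat


def is_world_rw_char_device(mode):
    """True if mode is a character device that 'other' can read and write."""
    other_rw = stat.S_IROTH | stat.S_IWOTH
    return stat.S_ISCHR(mode) and (mode & other_rw) == other_rw


def check_nvidia_nodes(pattern="/dev/nvidia*"):
    """Map each device node matching the pattern to its check result."""
    return {
        path: is_world_rw_char_device(os.stat(path).st_mode)
        for path in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    for path, ok in check_nvidia_nodes().items():
        print(path, "ok" if ok else "restricted")
```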

$ nvidia-smi
Tue Mar 20 00:14:42 2012
+------------------------------------------------------+
| NVIDIA-SMI 2.290.10 Driver Version: 290.10 |
|-------------------------------+----------------------+----------------------+
| Nb. Name | Bus Id Disp. | Volatile ECC SB / DB |
| Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. |
|===============================+======================+======================|
| 0. Tesla M2090 | 0000:02:00.0 Off | 0 0 |
| N/A N/A P12 30W / 225W | 0% 9MB / 5375MB | 0% Default |
|-------------------------------+----------------------+----------------------|
| 1. Tesla M2090 | 0000:03:00.0 Off | 0 0 |
| N/A N/A P12 31W / 225W | 0% 9MB / 5375MB | 0% Default |
|-------------------------------+----------------------+----------------------|
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| No running compute processes found |
+-----------------------------------------------------------------------------+

Symptom:
When a user (named nonluser) who is not listed in /etc/passwd tries to run the
NAMD-2.8-Linux-x86_64-CUDA version, the run fails with the following errors:

[nonluser ~]$ module purge
[nonluser ~]$ module load NAMD/released-2.8/x86_64-CUDA
[nonluser ~]$ export CHARMRUN=/c5/shared/NAMD/2.8/x86_64-CUDA/charmrun
[nonluser ~]$ export NAMD=/c5/shared/NAMD/2.8/x86_64-CUDA/namd2
[nonluser ~]$ ${CHARMRUN} ${NAMD} ++local +p1 +idlepoll ++nodelist nodelist +devices 0 prodLang2.inp
Charmrun> started all node programs in 0.004 seconds.
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Info: NAMD 2.8 for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat May 28 11:30:15 CDT 2011 by jim on larissa.ks.uiuc.edu
Info: 1 NAMD 2.8 Linux-x86_64-CUDA 1 scrappy.bis.pasteur.fr nonluser
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00441313 s
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Device Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available

[0] Stack Traceback:
  [0:0] CmiAbort+0x7b [0xb138d9]
  [0:1] _Z8NAMD_diePKc+0x62 [0x537722]
  [0:2] _Z13cuda_errcheckPKc+0x149 [0x6f3391]
  [0:3] _Z15cuda_initializev+0x5f3 [0x6f312d]
  [0:4] _Z8all_initiPPc+0x45 [0x540af1]
  [0:5] _Z11master_initiPPc+0x67 [0x5407ab]
  [0:6] _ZN7BackEnd4initEiPPc+0xe8 [0x540724]
  [0:7] main+0x2f [0x53ba1f]
  [0:8] __libc_start_main+0xf4 [0x3f8501d994]
  [0:9] _ZNSt8ios_base4InitD1Ev+0x72 [0x53701a]
Fatal error on PE 0> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
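
The telling line in the log above is the "binding to CUDA device" one: in the failing run the device is reported as 'Device Emulation (CPU)' with 0MB and Rev 9999.9999 instead of the real card. A small Python sketch for spotting this in saved logs could look as follows. The regular expression is written only against the two log lines quoted in this message, and the name `parse_binding` is hypothetical; other NAMD versions may format the line differently.

```python
import re

# Pattern derived from the "binding to CUDA device" lines quoted in this post.
BINDING = re.compile(
    r"binding to CUDA device (?P<device>\d+) on (?P<host>\S+): "
    r"'(?P<name>[^']*)'\s+Mem: (?P<mem_mb>\d+)MB\s+Rev: (?P<rev>\S+)"
)


def parse_binding(line):
    """Return a dict describing the bound device, or None if no match."""
    match = BINDING.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["device"] = int(info["device"])
    info["mem_mb"] = int(info["mem_mb"])
    # The emulation placeholder means no real GPU was visible to the process.
    info["emulated"] = info["name"] == "Device Emulation (CPU)"
    return info


if __name__ == "__main__":
    import sys
    for log_line in sys.stdin:
        found = parse_binding(log_line)
        if found is not None:
            print(found)
```

Feeding it the output of a run on standard input prints one dictionary per binding line, with `emulated` set to True for the failing case shown above.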

Just adding an entry for that user in /etc/passwd and /etc/shadow allows him to run the code (nothing else changed):

[nonluser ~]$ ${CHARMRUN} ${NAMD} ++local +p1 +idlepoll ++nodelist nodelist +devices 0 prodLang2.inp
Charmrun> started all node programs in 0.004 seconds.
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Info: NAMD 2.8 for Linux-x86_64-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat May 28 11:30:15 CDT 2011 by jim on larissa.ks.uiuc.edu
Info: 1 NAMD 2.8 Linux-x86_64-CUDA 1 scrappy.bis.pasteur.fr nonluser
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00161791 s
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Tesla M2090' Mem: 4095MB Rev: 2.0
Info: 1.62114 MB of memory in use based on CmiMemoryUsage
Info: Configuration file is prodLang2.inp
Info: Working in the current directory /work/probleme_cuda
TCL: Suspending until startup complete.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 0
Info: STEPS PER CYCLE 20
Info: PERIODIC CELL BASIS 1 180 0 0
Info: PERIODIC CELL BASIS 2 0 90 0
Info: PERIODIC CELL BASIS 3 0 0 85
Info: PERIODIC CELL CENTER 0 0 0
Info: LOAD BALANCER Centralized
Info: LOAD BALANCING STRATEGY New Load Balancers -- DEFAULT
Info: LDB PERIOD 4000 steps
Info: FIRST LDB TIMESTEP 100
Info: LAST LDB TIMESTEP -1
Info: LDB BACKGROUND SCALING 1
Info: HOM BACKGROUND SCALING 1
Info: PME BACKGROUND SCALING 1
Info: MIN ATOMS PER PATCH 40
Info: VELOCITY FILE 1oke-oistep-lang1.vel
Info: CENTER OF MASS MOVING INITIALLY? NO
Info: DIELECTRIC 1
Info: EXCLUDE SCALED ONE-FOUR
Info: 1-4 ELECTROSTATICS SCALED BY 1
Info: MODIFIED 1-4 VDW PARAMETERS WILL BE USED
Info: DCD FILENAME 1oke-oistep-lang1.2.dcd
Info: DCD FREQUENCY 10000
Info: DCD FIRST STEP 10000
Info: DCD FILE WILL CONTAIN UNIT CELL DATA
Info: NO EXTENDED SYSTEM TRAJECTORY OUTPUT
Info: NO VELOCITY DCD OUTPUT
Info: NO FORCE DCD OUTPUT
Info: OUTPUT FILENAME 1oke-oistep-lang1.2
Info: BINARY OUTPUT FILES WILL BE USED
Info: RESTART FILENAME 1oke-oistep-lang1.2.restart
Info: RESTART FREQUENCY 10000
Info: BINARY RESTART FILES WILL BE USED
Info: SWITCHING ACTIVE
Info: SWITCHING ON 8
Info: SWITCHING OFF 12
Info: PAIRLIST DISTANCE 13.5
Info: PAIRLIST SHRINK RATE 0.01
Info: PAIRLIST GROW RATE 0.01
Info: PAIRLIST TRIGGER 0.3
Info: PAIRLISTS PER CYCLE 2
Info: PAIRLISTS ENABLED
Info: MARGIN 0
Info: HYDROGEN GROUP CUTOFF 2.5
Info: PATCH DIMENSION 16
Info: CROSSTERM ENERGY INCLUDED IN DIHEDRAL
Info: TIMING OUTPUT STEPS 100
Info: LANGEVIN DYNAMICS ACTIVE
Info: LANGEVIN TEMPERATURE 300
Info: LANGEVIN DAMPING COEFFICIENT IS 1 INVERSE PS
Info: LANGEVIN DYNAMICS NOT APPLIED TO HYDROGENS
Info: PARTICLE MESH EWALD (PME) ACTIVE
Info: PME TOLERANCE 1e-06
Info: PME EWALD COEFFICIENT 0.257952
Info: PME INTERPOLATION ORDER 4
Info: PME GRID DIMENSIONS 128 64 64
Info: PME MAXIMUM GRID SPACING 1.5
Info: Attempting to read FFTW data from FFTW_NAMD_2.8_Linux-x86_64-CUDA.txt
Info: Optimizing 6 FFT steps. 1...
<...>

Thanks

Tru

-- 
Dr Tru Huynh          | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:tru_at_pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France  

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:20 CST