Re: Linux-x86_64-CUDA version 2.8 on CentOS-5 x86_64 non local user issue?

From: Tru Huynh (tru_at_pasteur.fr)
Date: Wed Mar 28 2012 - 09:10:26 CDT

On Mon, Mar 26, 2012 at 07:43:17AM +0200, Norman Geist wrote:
> Ok so it seems to be related to namd. You could look if the problem persists
> with 2.9b just to avoid searching a problem that is already solved, if not,
> we need to look at the source.

It's fixed in 2.9b1:
/c5/shared/NAMD/2.9b1/x86_64-CUDA/namd2 +p2 +idlepoll prodLang2.inp
Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Info: NAMD 2.9b1 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Mon Mar 19 13:06:58 CDT 2012 by jim on naiad.ks.uiuc.edu
Info: 1 NAMD 2.9b1 Linux-x86_64-multicore-CUDA 2 scrappy.bis.pasteur.fr non_local_user
Info: Running on 2 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00651598 s
Pe 1 physical rank 1 binding to CUDA device 1 on scrappy.bis.pasteur.fr: 'Tesla M2090' Mem: 5375MB Rev: 2.0
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Tesla M2090' Mem: 5375MB Rev: 2.0
Info: 8.09375 MB of memory in use based on /proc/self/stat
Info: Configuration file is prodLang2.inp
Info: Working in the current directory /work/probleme_cuda
..

In the same console, with version 2.8 I get:
 ${CHARMRUN} ${NAMD} ++local +p2 +idlepoll ++nodelist nodelist +devices 0 prodLang2.inp
..
Info: Running on 2 processors, 2 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00626779 s
Pe 0 sharing CUDA device 0 first 0 next 1
Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: 'Device Emulation (CPU)' Mem: 0MB Rev: 9999.9999
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available

[0] Stack Traceback:
  [0:0] CmiAbort+0x7b [0xb138d9]
  [0:1] _Z8NAMD_diePKc+0x62 [0x537722]
  [0:2] _Z13cuda_errcheckPKc+0x149 [0x6f3391]
  [0:3] _Z15cuda_initializev+0x5f3 [0x6f312d]
  [0:4] _Z8all_initiPPc+0x45 [0x540af1]
  [0:5] _Z11master_initiPPc+0x67 [0x5407ab]
  [0:6] _ZN7BackEnd4initEiPPc+0xe8 [0x540724]
  [0:7] main+0x2f [0x53ba1f]
  [0:8] __libc_start_main+0xf4 [0x382001d994]
  [0:9] _ZNSt8ios_base4InitD1Ev+0x72 [0x53701a]
Fatal error on PE 0> FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (scrappy.bis.pasteur.fr device 0): no CUDA-capable device is available
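The quickest way to tell the two runs apart is the "binding to CUDA device" line. 2.9b1 binds each PE to a 'Tesla M2090' with 5375MB, while 2.8 reports 'Device Emulation (CPU)' with Mem: 0MB and Rev: 9999.9999 just before cudaStreamCreate fails. If you need to check many runs, a small log-scanning sketch like the one below can help. It is not part of NAMD. The regex is an assumption based only on the lines quoted in this message, and other NAMD versions may word these lines differently.

```python
import re

# Regex modeled on the "binding to CUDA device" lines quoted in this message;
# other NAMD versions may format these lines differently (assumption).
BIND_RE = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on ([^\s:]+): "
    r"'([^']*)' Mem: (\d+)MB Rev: ([\d.]+)"
)


def parse_bindings(log_text):
    """Return one dict per 'binding to CUDA device' line in log_text."""
    return [
        {
            "pe": int(pe),
            "device": int(dev),
            "host": host,
            "name": name,
            "mem_mb": int(mem),
            "rev": rev,
        }
        for pe, dev, host, name, mem, rev in BIND_RE.findall(log_text)
    ]


def emulated_pes(bindings):
    """PEs that were given the CPU emulation placeholder instead of a real GPU."""
    return [b["pe"] for b in bindings if b["name"] == "Device Emulation (CPU)"]


if __name__ == "__main__":
    # Lines copied from the 2.9b1 and 2.8 runs quoted above.
    log_29b1 = (
        "Pe 1 physical rank 1 binding to CUDA device 1 on scrappy.bis.pasteur.fr: "
        "'Tesla M2090' Mem: 5375MB Rev: 2.0\n"
        "Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: "
        "'Tesla M2090' Mem: 5375MB Rev: 2.0\n"
    )
    log_28 = (
        "Pe 0 physical rank 0 binding to CUDA device 0 on scrappy.bis.pasteur.fr: "
        "'Device Emulation (CPU)' Mem: 0MB Rev: 9999.9999\n"
    )
    for label, text in (("2.9b1", log_29b1), ("2.8", log_28)):
        found = parse_bindings(text)
        print(label, len(found), "binding(s), emulated PEs:", emulated_pes(found))
```

On the two excerpts above it flags no PEs for 2.9b1 and PE 0 for 2.8.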

Thanks

Tru

-- 
Dr Tru Huynh          | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:tru_at_pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France  
