NAMD on NVIDIA 302.17

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Wed Sep 26 2012 - 08:58:45 CDT

Hi:
After updating/upgrading Debian GNU/Linux amd64 (wheezy),
minimizations no longer run on the GTX-680 cards:

CUDA error in cudaGetDeviceCount on Pe 3, Pe 4, Pe 6: initialization error.

The two GTX cards are activated as usual with:
nvidia-smi -L
nvidia-smi -pm 1
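nvidia-smi queries the cards through NVML (the libnvidia-ml1 package in the list below) rather than through the CUDA runtime, so a clean `nvidia-smi -L` listing does not by itself show that cudaGetDeviceCount will succeed. A minimal sketch for confirming that both cards are listed, assuming each GPU appears on its own line starting with "GPU <index>:" (the exact wording after the colon varies between driver releases):

```python
import re
import subprocess

# Assumption: `nvidia-smi -L` prints one line per GPU, starting with "GPU <index>:".
GPU_LINE = re.compile(r"^GPU (\d+): (.+)$")


def parse_gpu_list(text):
    """Return a list of (index, description) tuples from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def list_gpus():
    """Run `nvidia-smi -L` and parse its output; raises CalledProcessError on failure."""
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)
```

On this machine `len(list_gpus())` would be expected to be 2 if the driver sees both GTX-680 cards.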

The X server driver and the NVIDIA driver are at the same version:

francesco_at_gig64:~$ dpkg -l |grep nvidia
ii glx-alternative-nvidia 0.2.2 amd64 allows the selection of NVIDIA as GLX provider
ii libgl1-nvidia-alternatives 302.17-3 amd64 transition libGL.so* diversions to glx-alternative-nvidia
ii libgl1-nvidia-glx:amd64 302.17-3 amd64 NVIDIA binary OpenGL libraries
ii libglx-nvidia-alternatives 302.17-3 amd64 transition libgl.so diversions to glx-alternative-nvidia
ii libnvidia-ml1:amd64 302.17-3 amd64 NVIDIA management library (NVML) runtime library
ii nvidia-alternative 302.17-3 amd64 allows the selection of NVIDIA as GLX provider
ii nvidia-glx 302.17-3 amd64 NVIDIA metapackage
ii nvidia-installer-cleanup 20120630+3 amd64 Cleanup after driver installation with the nvidia-installer
ii nvidia-kernel-common 20120630+3 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 302.17-3 amd64 NVIDIA binary kernel module DKMS source
ii nvidia-smi 302.17-3 amd64 NVIDIA System Management Interface
ii nvidia-support 20120630+3 amd64 NVIDIA binary graphics driver support files
ii nvidia-vdpau-driver:amd64 302.17-3 amd64 NVIDIA vdpau driver
ii nvidia-xconfig 302.17-2 amd64 X configuration tool for non-free NVIDIA drivers
ii xserver-xorg-video-nvidia 302.17-3 amd64 NVIDIA binary Xorg driver
francesco_at_gig64:~$
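The "same version" check above can be made mechanical. The sketch below parses `dpkg -l` lines and collects the upstream driver version of every package that uses the driver's numbering (e.g. "302.17-3"). It assumes, as the listing suggests, that helper packages such as glx-alternative-nvidia (0.2.2) and nvidia-support (20120630+3) follow their own numbering and should be skipped; a differing Debian revision (nvidia-xconfig is 302.17-2) is not treated as a mismatch.

```python
import re

# Driver-style versions such as "302.17-3": upstream "302.17", Debian revision "3".
# Assumption: helper packages ("0.2.2", "20120630+3") use their own numbering and are skipped.
DRIVER_VERSION = re.compile(r"^(\d{3}\.\d+)(?:-(\S+))?$")


def driver_upstream_versions(dpkg_text):
    """Map package name -> upstream driver version for installed ('ii') lines of `dpkg -l`."""
    versions = {}
    for line in dpkg_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        match = DRIVER_VERSION.match(version)
        if match:
            versions[name] = match.group(1)
    return versions


def is_consistent(dpkg_text):
    """True if every driver-versioned package shares a single upstream version."""
    return len(set(driver_upstream_versions(dpkg_text).values())) <= 1
```

Fed a few lines of the listing above (nvidia-glx 302.17-3, nvidia-xconfig 302.17-2, nvidia-support 20120630+3), `driver_upstream_versions` keeps only nvidia-glx and nvidia-xconfig, both at 302.17, and `is_consistent` returns True.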

root_at_gig64:/home/francesco# modinfo nvidia
filename: /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko
alias: char-major-195-*
version: 302.17
supported: external
license: NVIDIA
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: i2c-core
vermagic: 3.2.0-2-amd64 SMP mod_unload modversions
parm: NVreg_EnableVia4x:int
parm: NVreg_EnableALiAGP:int
parm: NVreg_ReqAGPRate:int
parm: NVreg_EnableAGPSBA:int
parm: NVreg_EnableAGPFW:int
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_RemapLimit:int
parm: NVreg_UpdateMemoryTypes:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UseVBios:int
parm: NVreg_RMEdgeIntrCheck:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_EnableMSI:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RmMsg:charp
parm: NVreg_NvAGP:int
root_at_gig64:/home/francesco#
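`modinfo nvidia` describes the module file on disk (here under /lib/modules/3.2.0-2-amd64/updates/dkms/), not necessarily the module currently loaded in the kernel; after an upgrade the two can differ until the module is reloaded or the machine rebooted. A minimal sketch for comparing the loaded module with the packaged version, assuming the loaded NVIDIA module reports itself in /proc/driver/nvidia/version with a line containing "Kernel Module  <version>":

```python
import re
from pathlib import Path

# Assumption: the loaded module exposes a line such as
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  302.17  <build date>".
PROC_VERSION = Path("/proc/driver/nvidia/version")
KERNEL_MODULE = re.compile(r"Kernel Module\s+(\S+)")


def loaded_module_version(text):
    """Extract the driver version from the contents of /proc/driver/nvidia/version."""
    match = KERNEL_MODULE.search(text)
    return match.group(1) if match else None


def check_loaded_module(expected, path=PROC_VERSION):
    """Compare the loaded module's version with the version expected from the packages."""
    if not path.exists():
        return "nvidia module not loaded (no %s)" % path
    found = loaded_module_version(path.read_text())
    if found == expected:
        return "loaded module matches %s" % expected
    return "mismatch: loaded %s, expected %s" % (found, expected)
```

Here `check_loaded_module("302.17")` would be the call to make, since 302.17 is what both dpkg and modinfo report.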

I also tried with recently used MD input files; same problem:
francesco_at_gig64:~/tmp$ charmrun namd2 heat-01.conf +p6 +idlepoll 2>&1 | tee heat-01.log
Running command: namd2 heat-01.conf +p6 +idlepoll

Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD CVS-2012-06-20 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Wed Jun 20 02:24:32 CDT 2012 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2012-06-20 Linux-x86_64-multicore-CUDA 6 gig64 francesco
Info: Running on 6 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00989199 s
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
initialization error
------------- Processor 5 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
initialization error

FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
initialization error
Program finished.
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
initialization error
francesco_at_gig64:~/tmp$
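Every PE fails in the same CUDA runtime call, cudaGetDeviceCount, so one way to separate a driver/runtime problem from a NAMD problem is to make that call directly, outside NAMD. The sketch below does this through Python's ctypes, using the CUDA runtime signatures `cudaError_t cudaGetDeviceCount(int *count)` and `const char *cudaGetErrorString(cudaError_t error)`. The library file name is an assumption: it should be whichever libcudart the namd2 binary actually loads (for example the copy shipped next to namd2).

```python
import ctypes


def probe_cuda(libname="libcudart.so"):
    """Call cudaGetDeviceCount via ctypes and return a (status, detail) tuple.

    libname is an assumption: point it at the libcudart used by namd2.
    """
    try:
        cudart = ctypes.CDLL(libname)
    except OSError as exc:
        return ("no-library", str(exc))
    # cudaError_t is an enum, passed and returned as a C int.
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p
    count = ctypes.c_int(0)
    err = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if err != 0:  # 0 is cudaSuccess
        return ("error", cudart.cudaGetErrorString(err).decode())
    return ("ok", count.value)
```

Run as the same user that launches namd2, a result of `("error", "initialization error")` would reproduce the failure without NAMD, while `("ok", 2)` would point back at the NAMD run itself.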

This is a shared-memory machine.
Does driver version 302.17 work for you?

Thanks
francesco pietra
