Re: namd on nvidia 302.17

From: Aron Broom (broomsday_at_gmail.com)
Date: Wed Sep 26 2012 - 09:05:07 CDT

I'm not certain, but I think the driver version needs to match the CUDA
toolkit version that NAMD was built with, and I believe the CUDA runtime
library bundled with NAMD is from toolkit 4.0 or thereabouts.
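
One way to test this: CUDA reports two numbers, cudaDriverGetVersion() (the newest CUDA version the installed driver supports) and cudaRuntimeGetVersion() (the version of the CUDA runtime library the program loads). Both are encoded as 1000*major + 10*minor, so 4000 means 4.0. Below is a minimal Python sketch that decodes and compares two such integers. It assumes you have already obtained them (for example from a tiny CUDA test program that calls those two functions), and the helper names are made up for illustration. It checks driver >= runtime rather than strict equality, since a newer driver is documented to run binaries built against an older CUDA runtime.

```python
def decode_cuda_version(v: int) -> tuple[int, int]:
    """Decode a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10


def driver_supports_runtime(driver_version: int, runtime_version: int) -> bool:
    """True if the driver's CUDA version is at least the runtime's CUDA version.

    A driver can run programs built against an older runtime, so equality is
    not required. A driver value of 0 (cudaDriverGetVersion's "no driver found")
    also fails the check.
    """
    # Compare decoded (major, minor) tuples so the intent is explicit.
    return decode_cuda_version(driver_version) >= decode_cuda_version(runtime_version)


if __name__ == "__main__":
    # Hypothetical example values, not measured on any real machine.
    driver, runtime = 4020, 4000
    print("driver supports CUDA %d.%d" % decode_cuda_version(driver))
    print("runtime is CUDA %d.%d" % decode_cuda_version(runtime))
    print("compatible:", driver_supports_runtime(driver, runtime))
```

With those made-up values it prints that the driver supports CUDA 4.2, the runtime is CUDA 4.0, and "compatible: True".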

~Aron

On Wed, Sep 26, 2012 at 9:58 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:

> Hi:
> After updating/upgrading Debian GNU/Linux amd64 (wheezy),
> minimizations no longer run on the GTX-680s:
>
> CUDA error in cudaGetDeviceCount on Pe 3, Pe 4, Pe 6: initialization error.
>
> The two GTX cards are activated as usual with
> nvidia-smi -L
> nvidia-smi -pm 1
>
> The X server driver and the nvidia kernel module are the same version:
>
> francesco_at_gig64:~$ dpkg -l |grep nvidia
> ii  glx-alternative-nvidia      0.2.2       amd64  allows the selection of NVIDIA as GLX provider
> ii  libgl1-nvidia-alternatives  302.17-3    amd64  transition libGL.so* diversions to glx-alternative-nvidia
> ii  libgl1-nvidia-glx:amd64     302.17-3    amd64  NVIDIA binary OpenGL libraries
> ii  libglx-nvidia-alternatives  302.17-3    amd64  transition libgl.so diversions to glx-alternative-nvidia
> ii  libnvidia-ml1:amd64         302.17-3    amd64  NVIDIA management library (NVML) runtime library
> ii  nvidia-alternative          302.17-3    amd64  allows the selection of NVIDIA as GLX provider
> ii  nvidia-glx                  302.17-3    amd64  NVIDIA metapackage
> ii  nvidia-installer-cleanup    20120630+3  amd64  Cleanup after driver installation with the nvidia-installer
> ii  nvidia-kernel-common        20120630+3  amd64  NVIDIA binary kernel module support files
> ii  nvidia-kernel-dkms          302.17-3    amd64  NVIDIA binary kernel module DKMS source
> ii  nvidia-smi                  302.17-3    amd64  NVIDIA System Management Interface
> ii  nvidia-support              20120630+3  amd64  NVIDIA binary graphics driver support files
> ii  nvidia-vdpau-driver:amd64   302.17-3    amd64  NVIDIA vdpau driver
> ii  nvidia-xconfig              302.17-2    amd64  X configuration tool for non-free NVIDIA drivers
> ii  xserver-xorg-video-nvidia   302.17-3    amd64  NVIDIA binary Xorg driver
> francesco_at_gig64:~$
>
>
> root_at_gig64:/home/francesco# modinfo nvidia
> filename: /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko
> alias: char-major-195-*
> version: 302.17
> supported: external
> license: NVIDIA
> alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
> alias: pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
> alias: pci:v000010DEd*sv*sd*bc03sc02i00*
> alias: pci:v000010DEd*sv*sd*bc03sc00i00*
> depends: i2c-core
> vermagic: 3.2.0-2-amd64 SMP mod_unload modversions
> parm: NVreg_EnableVia4x:int
> parm: NVreg_EnableALiAGP:int
> parm: NVreg_ReqAGPRate:int
> parm: NVreg_EnableAGPSBA:int
> parm: NVreg_EnableAGPFW:int
> parm: NVreg_Mobile:int
> parm: NVreg_ResmanDebugLevel:int
> parm: NVreg_RmLogonRC:int
> parm: NVreg_ModifyDeviceFiles:int
> parm: NVreg_DeviceFileUID:int
> parm: NVreg_DeviceFileGID:int
> parm: NVreg_DeviceFileMode:int
> parm: NVreg_RemapLimit:int
> parm: NVreg_UpdateMemoryTypes:int
> parm: NVreg_InitializeSystemMemoryAllocations:int
> parm: NVreg_UseVBios:int
> parm: NVreg_RMEdgeIntrCheck:int
> parm: NVreg_UsePageAttributeTable:int
> parm: NVreg_EnableMSI:int
> parm: NVreg_MapRegistersEarly:int
> parm: NVreg_RegisterForACPIEvents:int
> parm: NVreg_RegistryDwords:charp
> parm: NVreg_RmMsg:charp
> parm: NVreg_NvAGP:int
> root_at_gig64:/home/francesco#
>
> I also tried MD input files I had used recently, with the same problem:
> francesco_at_gig64:~/tmp$ charmrun namd2 heat-01.conf +p6 +idlepoll 2>&1 | tee heat-01.log
> Running command: namd2 heat-01.conf +p6 +idlepoll
>
> Charm++: standalone mode (not using charmrun)
> Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 unique compute nodes (12-way SMP).
> Charm++> cpu topology info is gathered in 0.001 seconds.
> Info: NAMD CVS-2012-06-20 for Linux-x86_64-multicore-CUDA
> Info:
> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> Info: for updates, documentation, and support information.
> Info:
> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> Info: in all publications reporting results obtained with NAMD.
> Info:
> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
> Info: Built Wed Jun 20 02:24:32 CDT 2012 by jim on lisboa.ks.uiuc.edu
> Info: 1 NAMD CVS-2012-06-20 Linux-x86_64-multicore-CUDA 6 gig64 francesco
> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00989199 s
> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64): initialization error
> ------------- Processor 5 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64): initialization error
>
> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64): initialization error
> Program finished.
> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64): initialization error
> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64): initialization error
> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64): initialization error
> francesco_at_gig64:~/tmp$
>
>
> This is a shared-memory machine.
> Does driver version 302.17 work for you?
>
> Thanks
> francesco pietra
>
>
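
The claim that all the nvidia pieces are at the same version can be cross-checked mechanically against the dpkg and modinfo listings above. Below is a minimal Python sketch (the function names are hypothetical) that compares the upstream part of each installed package version with the "version:" field of `modinfo nvidia`. Helper packages such as glx-alternative-nvidia or nvidia-support carry their own version numbers, so landing in the "other" group is not by itself a problem. In the listing above, the only driver package with a different Debian revision is nvidia-xconfig (302.17-2), and its upstream version still matches.

```python
from typing import List, Optional, Tuple


def upstream_version(debian_version: str) -> str:
    """Strip an optional epoch ("1:") and Debian revision ("-3") from a package version."""
    if ":" in debian_version:
        debian_version = debian_version.split(":", 1)[1]
    if "-" in debian_version:
        debian_version = debian_version.rsplit("-", 1)[0]
    return debian_version


def module_version(modinfo_text: str) -> Optional[str]:
    """Return the value of the 'version:' field from `modinfo nvidia` output, if present."""
    for line in modinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "version":
            return value.strip()
    return None


def split_packages(dpkg_text: str, driver_version: str) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split installed ('ii') packages from `dpkg -l` output into those whose
    upstream version equals driver_version and all others."""
    matching: List[Tuple[str, str]] = []
    other: List[Tuple[str, str]] = []
    for line in dpkg_text.splitlines():
        fields = line.split()
        # Skip headers, shell prompts and packages not in the installed ('ii') state.
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        if upstream_version(version) == driver_version:
            matching.append((name, version))
        else:
            other.append((name, version))
    return matching, other


if __name__ == "__main__":
    # A few lines from the dpkg listing above, one package per line as dpkg prints them.
    dpkg_sample = """\
ii  glx-alternative-nvidia     0.2.2       amd64  allows the selection of NVIDIA as GLX provider
ii  nvidia-kernel-dkms         302.17-3    amd64  NVIDIA binary kernel module DKMS source
ii  nvidia-xconfig             302.17-2    amd64  X configuration tool for non-free NVIDIA drivers
ii  xserver-xorg-video-nvidia  302.17-3    amd64  NVIDIA binary Xorg driver
"""
    modinfo_sample = "filename: /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko\nversion: 302.17\n"
    kver = module_version(modinfo_sample)
    same, different = split_packages(dpkg_sample, kver)
    print("kernel module:", kver)
    print("same upstream version:", [name for name, _ in same])
    print("other versions:", different)
```

On the machine itself, the two input strings could be captured with subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout and likewise for ["modinfo", "nvidia"].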

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
