Re: namd on nvidia 302.17

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Sep 27 2012 - 02:52:11 CDT

Hello:
I have tried the NAMD_CVS-2012-09-26_Linux-x86_64-multicore-CUDA build with
nvidia driver version 302.17:

Running command: namd2 heat-01.conf +p6 +idlepoll

Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD CVS-2012-09-26 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Wed Sep 26 02:25:08 CDT 2012 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2012-09-26 Linux-x86_64-multicore-CUDA 6 gig64 francesco
Info: Running on 6 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.085423 s
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
initialization error
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
initialization error

Program finished.
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
initialization error

Since I have received (nearly) no comments on these failures, I can only
imagine that either (i) my question, obvious issues aside, was too silly to
merit attention, or (ii) it is well known that nvidia version 302.17 is
incompatible with current NAMD builds for GNU/Linux.
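
For what it's worth, a minimal standalone check along the following lines
should tell whether the CUDA runtime initializes at all outside NAMD (a
sketch only: it assumes a CUDA toolkit with nvcc is installed, and the
toolkit path is a guess on my part):

cat > devcount.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // the same call that NAMD aborts on
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    std::printf("cudaGetDeviceCount: %s (%d device(s))\n",
                cudaGetErrorString(err), n);
    return err == cudaSuccess ? 0 : 1;
}
EOF
/usr/local/cuda/bin/nvcc devcount.cu -o devcount && ./devcount

If this small program fails with the same "initialization error", the problem
sits between driver and runtime rather than in the NAMD build itself.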

At any rate, given the way the metapackages are arranged, it is probably
impossible within Debian GNU/Linux wheezy to roll back to a previous version
of the nvidia driver. On the other hand, the stable version of the OS ships
a much too old version of it. Therefore, my question is:

Is there any chance of compiling NAMD against the installed nvidia version 302.17?
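
If rebuilding is the way out, my understanding from the NAMD 2.9 release
notes is that the procedure would be roughly as follows (again a sketch
only: the CUDA prefix and the bundled Charm++ version are my assumptions,
to be adapted):

tar xzf NAMD_2.9_Source.tar.gz
cd NAMD_2.9_Source
tar xf charm-6.4.0.tar
cd charm-6.4.0
# multicore build, since this is a single shared-memory node
./build charm++ multicore-linux64 --with-production
cd ..
# point NAMD at the locally installed CUDA toolkit
./config Linux-x86_64-g++ --charm-arch multicore-linux64 \
    --with-cuda --cuda-prefix /usr/local/cuda
cd Linux-x86_64-g++
make

Whether a binary built this way actually initializes under driver 302.17 is
exactly what I cannot tell.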

Thanks for any advice. Without access to NAMD-CUDA I am currently unable to
answer a question raised by the reviewers of a manuscript (the CPU cluster
was shut down long ago, as it became too expensive for our budget).

francesco pietra

On Wed, Sep 26, 2012 at 4:08 PM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
> I forgot to mention that I am at final version 2.9 of namd.
> f.
>
> On Wed, Sep 26, 2012 at 4:05 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>> I'm not certain, but I think the driver version needs to match the CUDA
>> toolkit version that NAMD uses, and I think the library file NAMD comes with
>> is toolkit 4.0 or something of that sort.
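>>
>> A quick sanity check (the path below is only a guess at the local layout)
>> would be to compare the libcudart.so.* shipped alongside the namd2 binary
>> with the driver version the kernel module reports:
>>
>> ls /path/to/NAMD_2.9_Linux-x86_64-multicore-CUDA/libcudart.so.*
>> cat /proc/driver/nvidia/version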
>>
>> ~Aron
>>
>>
>> On Wed, Sep 26, 2012 at 9:58 AM, Francesco Pietra <chiendarret_at_gmail.com>
>> wrote:
>>>
>>> Hi:
>>> Following an update/upgrade of Debian GNU/Linux amd64 wheezy,
>>> minimizations no longer run on the GTX-680:
>>>
>>> CUDA error in cudaGetDeviceCount on Pe 3, Pe 4, Pe 6: initialization error.
>>>
>>> The two GTX are regularly activated with
>>> nvidia-smi -L
>>> nvidia-smi -pm 1
>>>
>>> The X server driver and the nvidia packages are all at the same version:
>>>
>>> francesco_at_gig64:~$ dpkg -l |grep nvidia
>>> ii  glx-alternative-nvidia      0.2.2       amd64  allows the selection of NVIDIA as GLX provider
>>> ii  libgl1-nvidia-alternatives  302.17-3    amd64  transition libGL.so* diversions to glx-alternative-nvidia
>>> ii  libgl1-nvidia-glx:amd64     302.17-3    amd64  NVIDIA binary OpenGL libraries
>>> ii  libglx-nvidia-alternatives  302.17-3    amd64  transition libgl.so diversions to glx-alternative-nvidia
>>> ii  libnvidia-ml1:amd64         302.17-3    amd64  NVIDIA management library (NVML) runtime library
>>> ii  nvidia-alternative          302.17-3    amd64  allows the selection of NVIDIA as GLX provider
>>> ii  nvidia-glx                  302.17-3    amd64  NVIDIA metapackage
>>> ii  nvidia-installer-cleanup    20120630+3  amd64  Cleanup after driver installation with the nvidia-installer
>>> ii  nvidia-kernel-common        20120630+3  amd64  NVIDIA binary kernel module support files
>>> ii  nvidia-kernel-dkms          302.17-3    amd64  NVIDIA binary kernel module DKMS source
>>> ii  nvidia-smi                  302.17-3    amd64  NVIDIA System Management Interface
>>> ii  nvidia-support              20120630+3  amd64  NVIDIA binary graphics driver support files
>>> ii  nvidia-vdpau-driver:amd64   302.17-3    amd64  NVIDIA vdpau driver
>>> ii  nvidia-xconfig              302.17-2    amd64  X configuration tool for non-free NVIDIA drivers
>>> ii  xserver-xorg-video-nvidia   302.17-3    amd64  NVIDIA binary Xorg driver
>>> francesco_at_gig64:~$
>>>
>>>
>>> root_at_gig64:/home/francesco# modinfo nvidia
>>> filename: /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko
>>> alias: char-major-195-*
>>> version: 302.17
>>> supported: external
>>> license: NVIDIA
>>> alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>>> alias: pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>>> alias: pci:v000010DEd*sv*sd*bc03sc02i00*
>>> alias: pci:v000010DEd*sv*sd*bc03sc00i00*
>>> depends: i2c-core
>>> vermagic: 3.2.0-2-amd64 SMP mod_unload modversions
>>> parm: NVreg_EnableVia4x:int
>>> parm: NVreg_EnableALiAGP:int
>>> parm: NVreg_ReqAGPRate:int
>>> parm: NVreg_EnableAGPSBA:int
>>> parm: NVreg_EnableAGPFW:int
>>> parm: NVreg_Mobile:int
>>> parm: NVreg_ResmanDebugLevel:int
>>> parm: NVreg_RmLogonRC:int
>>> parm: NVreg_ModifyDeviceFiles:int
>>> parm: NVreg_DeviceFileUID:int
>>> parm: NVreg_DeviceFileGID:int
>>> parm: NVreg_DeviceFileMode:int
>>> parm: NVreg_RemapLimit:int
>>> parm: NVreg_UpdateMemoryTypes:int
>>> parm: NVreg_InitializeSystemMemoryAllocations:int
>>> parm: NVreg_UseVBios:int
>>> parm: NVreg_RMEdgeIntrCheck:int
>>> parm: NVreg_UsePageAttributeTable:int
>>> parm: NVreg_EnableMSI:int
>>> parm: NVreg_MapRegistersEarly:int
>>> parm: NVreg_RegisterForACPIEvents:int
>>> parm: NVreg_RegistryDwords:charp
>>> parm: NVreg_RmMsg:charp
>>> parm: NVreg_NvAGP:int
>>> root_at_gig64:/home/francesco#
>>>
>>> I have also tried with recently used MD input files; the problem is the same:
>>> francesco_at_gig64:~/tmp$ charmrun namd2 heat-01.conf +p6 +idlepoll 2>&1 | tee heat-01.log
>>> Running command: namd2 heat-01.conf +p6 +idlepoll
>>>
>>> Charm++: standalone mode (not using charmrun)
>>> Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
>>> CharmLB> Load balancer assumes all CPUs are same.
>>> Charm++> Running on 1 unique compute nodes (12-way SMP).
>>> Charm++> cpu topology info is gathered in 0.001 seconds.
>>> Info: NAMD CVS-2012-06-20 for Linux-x86_64-multicore-CUDA
>>> Info:
>>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>>> Info: for updates, documentation, and support information.
>>> Info:
>>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>>> Info: in all publications reporting results obtained with NAMD.
>>> Info:
>>> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
>>> Info: Built Wed Jun 20 02:24:32 CDT 2012 by jim on lisboa.ks.uiuc.edu
>>> Info: 1 NAMD CVS-2012-06-20 Linux-x86_64-multicore-CUDA 6 gig64 francesco
>>> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
>>> Info: CPU topology information available.
>>> Info: Charm++/Converse parallel runtime startup completed at 0.00989199 s
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
>>> initialization error
>>> ------------- Processor 5 Exiting: Called CmiAbort ------------
>>> Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
>>> initialization error
>>>
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
>>> initialization error
>>> Program finished.
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
>>> initialization error
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
>>> initialization error
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
>>> initialization error
>>> francesco_at_gig64:~/tmp$
>>>
>>>
>>> This is a shared-memory machine.
>>> Does driver version 302.17 work for you?
>>>
>>> Thanks
>>> francesco pietra
>>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:37 CST