From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Sep 27 2012 - 02:52:11 CDT
Hello:
I have tried the NAMD_CVS-2012-09-26_Linux-x86_64-multicore-CUDA build with
nvidia driver version 302.17:
Running command: namd2 heat-01.conf +p6 +idlepoll
Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD CVS-2012-09-26 for Linux-x86_64-multicore-CUDA
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: for updates, documentation, and support information.
Info:
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
Info: Built Wed Sep 26 02:25:08 CDT 2012 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD  CVS-2012-09-26  Linux-x86_64-multicore-CUDA  6    gig64  francesco
Info: Running on 6 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.085423 s
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
initialization error
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
initialization error
Program finished.
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
initialization error
FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
initialization error
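To see at a glance which PEs aborted, the error lines above can be summarized with a short script. This is only a sketch: the regular expression is modeled on the messages in this particular log, and the names PE_ERROR and failing_pes are made up for illustration, not part of NAMD.

```python
import re

# Pattern modeled on the log lines above, e.g.
# "FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):"
PE_ERROR = re.compile(r"FATAL ERROR: CUDA error in (\w+) on Pe (\d+) \(([^)]+)\):")


def failing_pes(log_text):
    """Return a sorted, de-duplicated list of (pe, host, cuda_call) tuples."""
    hits = {(int(pe), host, call) for call, pe, host in PE_ERROR.findall(log_text)}
    return sorted(hits)


if __name__ == "__main__":
    # A few lines copied from the log above; the "Reason:" line repeats Pe 3
    # and is collapsed by the set.
    sample = (
        "FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):\n"
        "initialization error\n"
        "FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):\n"
        "initialization error\n"
        "Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):\n"
    )
    for pe, host, call in failing_pes(sample):
        print(f"Pe {pe} on {host} failed in {call}")
```

If every worker PE shows up in the list, the failure is probably not specific to one thread or one card.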
As I received (nearly) no comments on these failures, I can only imagine that
either (i) my question, disregarding obvious issues, was too silly
to merit attention, or (ii) it is well known that nvidia version 302.17
is incompatible with current NAMD builds for GNU/Linux.
In any event, because of the metapackages, it is probably impossible
within Debian GNU/Linux wheezy to go back to a previous version of
nvidia. On the other hand, the stable version of the OS provides a
much too old version of nvidia. Therefore, my question is:
Is there any chance of compiling NAMD against the installed nvidia version 302.17?
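Before rebuilding, it may be worth checking which CUDA version the installed driver reports against the CUDA runtime that NAMD loads. The sketch below does this through ctypes. It rests on several assumptions: that the driver library is reachable as libcuda.so.1, that the runtime shipped with the NAMD binary is named libcudart.so.4 and is on the loader path (adjust the name or path for your install), and that cuDriverGetVersion can be called without prior initialization. The helper names are hypothetical. As far as I know, the usual requirement is that the driver supports a CUDA version at least as new as the runtime, rather than an exact match.

```python
import ctypes


def decode_cuda_version(v):
    """CUDA encodes versions as 1000*major + 10*minor, e.g. 4020 -> (4, 2)."""
    return v // 1000, (v % 1000) // 10


def query_version(libname, funcname):
    """Call funcname(int*) in libname; return the integer version or None."""
    try:
        lib = ctypes.CDLL(libname)
        func = getattr(lib, funcname)
    except (OSError, AttributeError):
        return None  # library or symbol not found
    version = ctypes.c_int(0)
    status = func(ctypes.byref(version))
    return version.value if status == 0 else None


def driver_supports(driver_v, runtime_v):
    """True if both versions are known and the driver is at least as new."""
    return driver_v is not None and runtime_v is not None and driver_v >= runtime_v


if __name__ == "__main__":
    # Library names below are assumptions; point them at the actual files.
    driver = query_version("libcuda.so.1", "cuDriverGetVersion")
    runtime = query_version("libcudart.so.4", "cudaRuntimeGetVersion")
    for label, v in (("driver", driver), ("runtime", runtime)):
        shown = "not found" if v is None else "%d.%d" % decode_cuda_version(v)
        print(f"CUDA version reported by {label}: {shown}")
    print("driver new enough for runtime:", driver_supports(driver, runtime))
```

If the driver library itself cannot be loaded, that alone could explain an initialization error in cudaGetDeviceCount, independent of how NAMD was compiled.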
Thanks for any advice. Without access to NAMD-CUDA I am currently unable
to answer a question raised by the reviewers of a manuscript (the CPU
cluster was shut down long ago, as it became too expensive for
our budget).
francesco pietra
On Wed, Sep 26, 2012 at 4:08 PM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
> I forgot to mention that I am on the final 2.9 release of NAMD.
> f.
>
> On Wed, Sep 26, 2012 at 4:05 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>> I'm not certain, but I think the driver version needs to match the CUDA
>> toolkit version that NAMD uses, and I think the library file NAMD comes with
>> is toolkit 4.0 or something of that sort.
>>
>> ~Aron
>>
>>
>> On Wed, Sep 26, 2012 at 9:58 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>>
>>> Hi:
>>> After an update/upgrade of Debian GNU/Linux amd64 wheezy,
>>> minimizations no longer run on the GTX-680 cards:
>>>
>>> CUDA error in cudaGetDeviceCount on Pe 3, Pe 4, Pe 6: initialization error.
>>>
>>> The two GTX cards are activated as usual with
>>> nvidia-smi -L
>>> nvidia-smi -pm 1
>>>
>>> The X server driver and the nvidia kernel module are the same version:
>>>
>>> francesco_at_gig64:~$ dpkg -l |grep nvidia
>>> ii  glx-alternative-nvidia                0.2.2        amd64        allows the selection of NVIDIA as GLX provider
>>> ii  libgl1-nvidia-alternatives            302.17-3     amd64        transition libGL.so* diversions to glx-alternative-nvidia
>>> ii  libgl1-nvidia-glx:amd64               302.17-3     amd64        NVIDIA binary OpenGL libraries
>>> ii  libglx-nvidia-alternatives            302.17-3     amd64        transition libgl.so diversions to glx-alternative-nvidia
>>> ii  libnvidia-ml1:amd64                   302.17-3     amd64        NVIDIA management library (NVML) runtime library
>>> ii  nvidia-alternative                    302.17-3     amd64        allows the selection of NVIDIA as GLX provider
>>> ii  nvidia-glx                            302.17-3     amd64        NVIDIA metapackage
>>> ii  nvidia-installer-cleanup              20120630+3   amd64        Cleanup after driver installation with the nvidia-installer
>>> ii  nvidia-kernel-common                  20120630+3   amd64        NVIDIA binary kernel module support files
>>> ii  nvidia-kernel-dkms                    302.17-3     amd64        NVIDIA binary kernel module DKMS source
>>> ii  nvidia-smi                            302.17-3     amd64        NVIDIA System Management Interface
>>> ii  nvidia-support                        20120630+3   amd64        NVIDIA binary graphics driver support files
>>> ii  nvidia-vdpau-driver:amd64             302.17-3     amd64        NVIDIA vdpau driver
>>> ii  nvidia-xconfig                        302.17-2     amd64        X configuration tool for non-free NVIDIA drivers
>>> ii  xserver-xorg-video-nvidia             302.17-3     amd64        NVIDIA binary Xorg driver
>>> francesco_at_gig64:~$
>>>
>>>
>>> root_at_gig64:/home/francesco# modinfo nvidia
>>> filename:       /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko
>>> alias:          char-major-195-*
>>> version:        302.17
>>> supported:      external
>>> license:        NVIDIA
>>> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>>> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>>> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>>> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>>> depends:        i2c-core
>>> vermagic:       3.2.0-2-amd64 SMP mod_unload modversions
>>> parm:           NVreg_EnableVia4x:int
>>> parm:           NVreg_EnableALiAGP:int
>>> parm:           NVreg_ReqAGPRate:int
>>> parm:           NVreg_EnableAGPSBA:int
>>> parm:           NVreg_EnableAGPFW:int
>>> parm:           NVreg_Mobile:int
>>> parm:           NVreg_ResmanDebugLevel:int
>>> parm:           NVreg_RmLogonRC:int
>>> parm:           NVreg_ModifyDeviceFiles:int
>>> parm:           NVreg_DeviceFileUID:int
>>> parm:           NVreg_DeviceFileGID:int
>>> parm:           NVreg_DeviceFileMode:int
>>> parm:           NVreg_RemapLimit:int
>>> parm:           NVreg_UpdateMemoryTypes:int
>>> parm:           NVreg_InitializeSystemMemoryAllocations:int
>>> parm:           NVreg_UseVBios:int
>>> parm:           NVreg_RMEdgeIntrCheck:int
>>> parm:           NVreg_UsePageAttributeTable:int
>>> parm:           NVreg_EnableMSI:int
>>> parm:           NVreg_MapRegistersEarly:int
>>> parm:           NVreg_RegisterForACPIEvents:int
>>> parm:           NVreg_RegistryDwords:charp
>>> parm:           NVreg_RmMsg:charp
>>> parm:           NVreg_NvAGP:int
>>> root_at_gig64:/home/francesco#
>>>
>>> I have also tried with recently used MD files, same problem:
>>> francesco_at_gig64:~/tmp$ charmrun namd2 heat-01.conf +p6 +idlepoll 2>&1 | tee heat-01.log
>>> Running command: namd2 heat-01.conf +p6 +idlepoll
>>>
>>> Charm++: standalone mode (not using charmrun)
>>> Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
>>> CharmLB> Load balancer assumes all CPUs are same.
>>> Charm++> Running on 1 unique compute nodes (12-way SMP).
>>> Charm++> cpu topology info is gathered in 0.001 seconds.
>>> Info: NAMD CVS-2012-06-20 for Linux-x86_64-multicore-CUDA
>>> Info:
>>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>>> Info: for updates, documentation, and support information.
>>> Info:
>>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
>>> Info: in all publications reporting results obtained with NAMD.
>>> Info:
>>> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
>>> Info: Built Wed Jun 20 02:24:32 CDT 2012 by jim on lisboa.ks.uiuc.edu
>>> Info: 1 NAMD  CVS-2012-06-20  Linux-x86_64-multicore-CUDA  6    gig64  francesco
>>> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
>>> Info: CPU topology information available.
>>> Info: Charm++/Converse parallel runtime startup completed at 0.00989199 s
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
>>> initialization error
>>> ------------- Processor 5 Exiting: Called CmiAbort ------------
>>> Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
>>> initialization error
>>>
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
>>> initialization error
>>> Program finished.
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
>>> initialization error
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
>>> initialization error
>>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
>>> initialization error
>>> francesco_at_gig64:~/tmp$
>>>
>>>
>>> This is a shared-mem machine.
>>> Does the version 302.17 work for you?
>>>
>>> Thanks
>>> francesco pietra
>>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>>