From: Jyh-Shyong (c00jsh00_at_nchc.org.tw)
Date: Sun Dec 20 2009 - 23:03:37 CST
Hi,
I tried NAMD 2.7b2 on our GPU cluster, but so far I have not been
successful. Any hints or suggestions would be appreciated.
1. I downloaded the binary NAMD_2.7b2_Linux-x86_64-ibverbs-CUDA and ran
a test case with the command:
./charmrun ++local ++p 4 namd2 +idlepoll ./alanin.namd
Charmrun> IBVERBS version of charmrun
Charmrun: Bad initnode data length. Aborting
2. I tried again with the command:
./charmrun ++nodelist ./hostlist ++p 4 namd2 +idlepoll ./alanin.namd
The file hostlist contains two lines:
group main
host gc16
gc16 is the hostname of the computer I was using. Here is the output of
this command:
..
Info:
Info: Entering startup at 0.376303 s, 104.066 MB of memory in use
Info: Startup phase 0 took 0.00472808 s, 104.066 MB of memory in use
Info: Startup phase 1 took 0.00161982 s, 104.066 MB of memory in use
Info: Startup phase 2 took 0.000169039 s, 104.066 MB of memory in use
FATAL ERROR: CUDA-enabled NAMD requires more patches than processes.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA-enabled NAMD requires more patches than processes.
[0] Stack Traceback:
[0] CmiAbort+0x5f [0x9f6257]
[1] _Z8NAMD_diePKc+0x62 [0x50ad52]
[2] _ZN11WorkDistrib12patchMapInitEv+0x8e6 [0x8ecf46]
[3] _ZN4Node7startupEv+0xd4d [0x8497fb]
[4] _ZN12CkIndex_Node18_call_startup_voidEPvP4Node+0x12 [0x848aaa]
[5] CkDeliverMessageFree+0x21 [0x96a2d5]
[6] _Z15_processHandlerPvP11CkCoreState+0x4ba [0x96994a]
[7] CsdScheduleForever+0xa5 [0x9f706d]
[8] CsdScheduler+0x1c [0x9f6c6e]
[9] _ZN7BackEnd7suspendEv+0xb [0x5138cd]
[10] _ZN9ScriptTcl9initcheckEv+0x80 [0x8aeaaa]
[11] _ZN9ScriptTcl3runEPc+0xb5 [0x8aac23]
[12] _Z18after_backend_initiPPc+0x22b [0x50f56b]
[13] main+0x3a [0x50f30a]
[14] __libc_start_main+0xe6 [0x7f17e1e61586]
[15] _ZNSt8ios_base4InitD1Ev+0x72 [0x50a6ca]
Fatal error on PE 0> FATAL ERROR: CUDA-enabled NAMD requires more
patches than processes.
There are 4 Tesla C1070s on this node:
chem@gc16:/work/chem/alanin> ls -l /dev/nvi*
crw-rw-rw- 1 root video 195, 0 2009-10-13 10:56 /dev/nvidia0
crw-rw-rw- 1 root video 195, 1 2009-10-13 10:56 /dev/nvidia1
crw-rw-rw- 1 root video 195, 2 2009-10-13 10:56 /dev/nvidia2
crw-rw-rw- 1 root video 195, 3 2009-10-13 10:56 /dev/nvidia3
crw-rw-rw- 1 root video 195, 255 2009-10-13 10:56 /dev/nvidiactl
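For reference, the device listing above can be compared with the process count passed to ++p using a small shell check. This is only a sketch: the helper name count_gpu_nodes is made up for illustration, and it assumes each GPU shows up as a numbered /dev/nvidiaN node as in the listing (nvidiactl is the control node and is not counted).

```shell
# Count numbered NVIDIA device nodes (nvidia0, nvidia1, ...) in a directory.
# Defaults to /dev; the glob only matches names with a digit after "nvidia",
# so the nvidiactl control node is excluded.
count_gpu_nodes() {
    dir="${1:-/dev}"
    n=0
    for f in "$dir"/nvidia[0-9]*; do
        # If nothing matches, the unexpanded pattern fails this existence test.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

nprocs=4   # the value given to ++p in the charmrun commands above
ngpus=$(count_gpu_nodes)
echo "GPU device nodes: $ngpus, requested processes: $nprocs"
```

On the node described here this would be expected to report the four numbered devices next to the four requested processes.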
I wonder whether something in my environment settings is wrong, but I
don't know what it is.
I also downloaded the latest version of the source code and built the
binary with the ibverbs option for charm, and I got the same result.
Regards
Jyh-Shyong Ho, Ph.D.
Research Scientist
National Center for High Performance Computing
Hsinchu, Taiwan, ROC