From: Aravinda Munasinghe (aravinda1879_at_gmail.com)
Date: Wed Jan 16 2019 - 20:16:26 CST
Dear NAMD users,
Since REST2 had a bug that was fixed in the nightly version, I tried
compiling NAMD 2.13 from the source code. However, I failed because I was
not able to compile Charm++ successfully.
After following the steps for a multi-node NAMD build described at
https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/namd/
I get a timeout error from charmrun.
I even tried running the simplearrayhello test with the following command:
./charmrun +p4 ++runscript ./runscript ++verbose ++nodelist nodelist.30890720 ++remote-shell ssh ./hello
and I get the following error:
Charmrun> scalable start enabled.
Charmrun> charmrun started...
Charmrun> using nodelist.30890720 as nodesfile
Charmrun> adding client 0: "dev1-ib", IP:10.13.142.29
Charmrun> adding client 1: "dev1-ib", IP:10.13.142.29
Charmrun> adding client 2: "dev2-ib", IP:10.13.142.30
Charmrun> adding client 3: "dev2-ib", IP:10.13.142.30
Charmrun> Charmrun = 172.16.206.30, port = 34700
Charmrun> IBVERBS version of charmrun
start_nodes_ssh
Charmrun> Sending "0 172.16.206.30 34700 10410 0" to client 0.
Charmrun> find the node program
"/home/aravinda1879/progs/NAMD_Git-2019-01-16_Source/charm-6.8.2/verbs-linux-x86_64-smp-icc/tests/charm++/simplearrayhello/./hello"
at
"/home/aravinda1879/progs/NAMD_Git-2019-01-16_Source/charm-6.8.2/verbs-linux-x86_64-smp-icc/tests/charm++/simplearrayhello"
for 0.
Charmrun> Starting ssh dev1-ib -l aravinda1879 -o
KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o
NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> remote shell (dev1-ib:0) started
Charmrun> Sending "2 172.16.206.30 34700 10410 0" to client 2.
Charmrun> find the node program
"/home/aravinda1879/progs/NAMD_Git-2019-01-16_Source/charm-6.8.2/verbs-linux-x86_64-smp-icc/tests/charm++/simplearrayhello/./hello"
at
"/home/aravinda1879/progs/NAMD_Git-2019-01-16_Source/charm-6.8.2/verbs-linux-x86_64-smp-icc/tests/charm++/simplearrayhello"
for 2.
Charmrun> Starting ssh dev2-ib -l aravinda1879 -o
KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o
NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> remote shell (dev2-ib:2) started
Charmrun> node programs all started
Charmrun remote shell(dev2-ib.2)> remote responding...
Charmrun remote shell(dev2-ib.2)> starting node-program...
Charmrun remote shell(dev2-ib.2)> remote shell phase successful.
Charmrun remote shell(dev1-ib.0)> remote responding...
Charmrun remote shell(dev1-ib.0)> starting node-program...
Charmrun remote shell(dev1-ib.0)> remote shell phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> error attaching to node 'dev1-ib':
Timeout waiting for node-program to connect
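For reference, here is a minimal sketch of a nodelist file in the usual charmrun "group"/"host" layout. The real contents of nodelist.30890720 are not shown in this post, so this is an assumption: the host names are taken from the "adding client" lines above, each host is listed twice only because that matches the client order in the log, and the file name nodelist.example is hypothetical.

```shell
#!/bin/sh
# Hypothetical reconstruction: the real nodelist.30890720 is not shown here.
# Host names come from the "adding client" lines in the verbose output;
# listing each host twice is an assumption that matches the client order.
NODELIST=nodelist.example   # hypothetical file name

{
  echo "group main"
  for host in dev1-ib dev2-ib; do
    echo "host $host"
    echo "host $host"
  done
} > "$NODELIST"

# Show the generated file.
cat "$NODELIST"

# The launch would then mirror the command quoted above, for example:
# ./charmrun +p4 ++runscript ./runscript ++verbose ++nodelist "$NODELIST" ++remote-shell ssh ./hello
```

The charmrun line is left as a comment so that the sketch only writes and prints the file.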
By contrast, I was able to run the precompiled Linux-x86_64-verbs-smp-CUDA
build on multiple GPU nodes for replica exchange simulations without any
trouble. I saw that this problem has been raised in several threads, and I
tried the suggestions given there, but none of them worked.
-- Aravinda Munasinghe