Re: Problem running NAMD 2.8 with ibverbs

From: Moritz Schlarb (schlarbm_at_uni-mainz.de)
Date: Fri Jan 06 2012 - 09:41:49 CST

Hello again,

I just want to add that it works with the precompiled version from the
homepage (NAMD 2.8 for Linux-x86_64-ibverbs).

On 06.01.2012 16:29, Moritz Schlarb wrote:
> Hello everyone,
>
> I'm currently working on deploying NAMD to the Linux cluster at the
> Johannes Gutenberg University Mainz, Germany.
>
> I successfully compiled NAMD with MVAPICH2 MPI and now want to
> compare its speed to a version of NAMD using ibverbs.
>
> I compiled charm++ using the following command line:
> $ ./build charm++ net-linux-x86_64 ibverbs --no-build-shared --with-production
>
> and running the megatest works (nodelist with two nodes):
> $ ./charmrun ++remote-shell ssh +p2 ./pgm
> [...]
> test 53: completed (5.47 sec)
> All tests completed, exiting
>
> Then I configured NAMD with the following line:
> $ ./config Linux-x86_64-g++ --charm-arch net-linux-x86_64-ibverbs
> and it compiled cleanly.
>
> The resulting namd2 executable works fine when I run it locally:
> $ ./namd2 src/alanin
> [...]
> WallClock: 0.035015 CPUTime: 0.010000 Memory: 31.363281 MB
> Program finished.
> $ charmrun ++local +p2 namd2 src/alanin
> [...]
> WallClock: 1.304384 CPUTime: 1.270000 Memory: 70.000000 MB
>
> But when I try to run it on remote nodes (using the same nodelist as
> above), I get a timeout:
> $ ./charmrun ++remote-shell ssh +p2 ++verbose namd2 src/alanin
> [...]
> Charmrun> Waiting for 0-th client to connect.
> Charmrun> error 0 attaching to node:
> Timeout waiting for node-program to connect
>
> When I watch htop on the remote node, I see some shells spawning
> and exiting.
>
> Following an earlier answer on the mailing list, I already tried using
> ++useip and ++usehostname on the charmrun command line and specified the
> InfiniBand IP addresses in the nodelist, but none of that worked.
>
> I've attached the complete run log and uploaded the tarred namd
> directories (namd_full.tgz is the whole NAMD_2.8._Source dir, namd.tgz
> is only the Linux-x86_64-g++ dir) here:
> https://fileshare.zdv.uni-mainz.de/36d1bd67-45b2-45d4-9c5b-b600d1d28126.repository
>
>
> Thanks in advance,
> Moritz
>
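
For readers reproducing the remote run quoted above, the nodelist setup can be sketched roughly as follows. This is only a minimal sketch: the host names node01 and node02 and the file name nodelist.example are hypothetical and do not come from the original post. The script writes a two-host nodelist in the plain "group main" / "host" form that charmrun reads, then prints (without running) a charmrun command equivalent to the one in the message, with the nodelist passed explicitly via ++nodelist.

```shell
#!/bin/sh
# Hypothetical host names; replace them with the cluster's real nodes
# (or with their InfiniBand IP addresses, as tried in the post).
NODES="node01 node02"

# Hypothetical file name; charmrun also looks for ./nodelist by default.
NODELIST=./nodelist.example

# Write a charmrun nodelist: one "group main" header, one "host" line per node.
{
  echo "group main"
  for n in $NODES; do
    echo "host $n"
  done
} > "$NODELIST"

# Show the generated file.
cat "$NODELIST"

# Print the command from the post with the nodelist given explicitly.
# It is only echoed here, not executed.
echo "./charmrun ++nodelist $NODELIST ++remote-shell ssh +p2 ++verbose namd2 src/alanin"
```

Passing the file with ++nodelist makes it unambiguous which nodelist a given test run used, which helps when comparing host-name and IP-address variants.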
