Re: problem with running namd through infiniband

From: Norman Geist (
Date: Thu May 30 2013 - 05:05:59 CDT

Hi Shubra,


sometimes it can help to clear the error counters on the HCAs. Additionally,
most of these tools require starting a server at one endpoint and a client at
the other. You can find out the LIDs with "ibaddr -l", then run ibping on
client1 with "ibping -S" and on client2 with "ibping -L lidofclient1".
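
As a concrete sketch of that sequence (the LID value 0x3 is hypothetical; the tools are from the standard infiniband-diags package):

```shell
# On client1: clear the port error counters and look up the local LID (run as root)
ibclearerrors          # resets error counters fabric-wide
ibaddr -l              # prints the local port's LID

# On client1: start the ibping responder (runs until interrupted)
ibping -S

# On client2: ping client1 by its LID (replace 0x3 with the LID printed above)
ibping -L 0x3
```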


Norman Geist.


From: [] On behalf of Shubhra Ghosh Dastidar
Sent: Wednesday, May 29, 2013 14:42
Subject: Re: namd-l: problem with running namd through infiniband


Hi Norman,


I think we still have a problem with the IB configuration, because although
ibstat, ibhosts, ibnetdiscover etc. report everything as OK, ibping fails to
ping the LIDs of the nodes, not even the node's own LID, and ibv_rc_pingpong
is also unable to ping localhost. This is a bit confusing to me since the
other commands are working. As I am configuring IB for the first time, I don't
have much of a clue how to resolve this.


I would appreciate it if anyone could help with this matter.
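
One thing worth checking in this situation (a hedged suggestion, not from the original thread): ibstat can report a port whose state is INIT rather than ACTIVE, which happens when no subnet manager (opensm) is running on the fabric; discovery tools like ibnetdiscover still work in that state, while LID-based pings fail. A quick local check might be:

```shell
# Verify the port is ACTIVE and has a nonzero LID; a state of PORT_INIT
# usually means no subnet manager is running (start opensm on one node)
ibv_devinfo | grep -E "state|lid"

# Loopback test with ibv_rc_pingpong: server in one terminal...
ibv_rc_pingpong
# ...client in a second terminal on the same node
ibv_rc_pingpong localhost
```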






On Wed, May 29, 2013 at 10:52 AM, Norman Geist
<> wrote:

Hi Shubhra,


if you are sure that your IB fabric setup is fine (do other programs work, do
tools like ibping work?), you may be using an InfiniBand stack/driver that is
incompatible with the precompiled builds (not OFED?). You could try to build
NAMD yourself against a separate MPI (OpenMPI, for instance). Or, if you have
IPoIB installed (check /sbin/ifconfig for interfaces called ib0 or similar),
you can use those interfaces instead of the "eth" ones; in that case choose
the IP addresses corresponding to the IB network interfaces. Also, when using
IPoIB, set /sys/class/net/ib0/mode to "connected" and the MTU to "65520",
simply by doing an echo with ">" redirection as root. Additionally, if you are
not using a CUDA version, and as long as you use charm++, try adding +idlepoll
when calling namd to improve scaling.
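
The IPoIB settings above can be applied as root like this (the interface name ib0 follows the mail; adjust it if yours differs):

```shell
# Switch ib0 to connected mode and raise the MTU, via ">" redirection as root
echo connected > /sys/class/net/ib0/mode
echo 65520 > /sys/class/net/ib0/mtu

# Verify the result
cat /sys/class/net/ib0/mode
ip link show ib0 | grep mtu
```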


Norman Geist.


From: [] On behalf of Shubhra Ghosh Dastidar
Sent: Tuesday, May 28, 2013 09:15
Subject: namd-l: problem with running namd through infiniband


I am trying to run namd through infiniband.


First I tried the multicore version, which runs smoothly on 32 cores when
restricted within a single node.


Then I tried the TCP version (which uses ethernet), which runs across
multiple nodes, e.g. a total of 32 cores (16 cores from node-1 and 16 cores
from node-2).
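
For reference, a multi-node run with the net/TCP or ibverbs builds uses charmrun with a nodelist file; a minimal sketch (the hostnames and the input file name are hypothetical):

```shell
# Write a nodelist file in charm++ format
cat > nodelist <<'EOF'
group main
  host node-1
  host node-2
EOF

# 32 worker processes spread over both nodes; +idlepoll as suggested in the thread
charmrun +p32 ++nodelist nodelist namd2 +idlepoll input.namd
```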


Then I tried the infiniband version and also the infiniband-smp version.
If the job is restricted within the 32 cores of one node, then they run fine.

But if it is asked to run across multiple nodes (i.e.

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:14 CST