AW: AW: namd-ibverbs fails to start

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Nov 22 2011 - 00:49:46 CST

NOW I see! 10 Gbit/s Ethernet is NOT InfiniBand! What you need to run RDMA
over Ethernet is Open-MX (Myrinet Express). You can get it from the net, but
it's not easy to get it working.

Good luck

Norman Geist.

-----Original Message-----
From: David Hemmendinger [mailto:hemmendd_at_union.edu]
Sent: Monday, 21 November 2011 17:04
To: norman.geist_at_uni-greifswald.de
Subject: Re: AW: namd-l: namd-ibverbs fails to start

        I think that I wasn't clear enough, not being very familiar
with ibverbs and InfiniBand. We are using RoCE, which, as I understand it,
runs the InfiniBand protocol over Ethernet. The IBM tech who installed it
on our cluster ran tests, which it passed, and I'm able to use it with
OpenMPI 1.4.2. My comment about the failure of ibv_rc_pingpong referred
only to what happened before I knew that I needed to specify -g GID,
a recently-added feature, according to Mellanox documentation.
        So my question was whether this new feature would mean that
I'd need to modify an ibverbs call in charmrun. Since we're running
over 10 Gb Ethernet, we don't have an ib0.
        Thanks,
          David

>This doesn't seem to be a problem with namd or charmrun, but rather a
>problem with your InfiniBand configuration/installation. If the tests
>shipped with OFED fail, then something is wrong. If you don't want to spend
>too much time on the problem, use the IPoIB driver to carry IP traffic over
>InfiniBand. Then you can just use a UDP (faster) or TCP (NET version,
>usually slower than UDP) version of namd over the IP-over-InfiniBand stack,
>which for me was faster than native verbs. Another advantage is that you
>can run any MPI application that way too. Remember to change the
>connection mode to connected, _not_ datagram, and set the MTU to 65520.
>
>$> echo connected > /sys/class/net/ib0/mode
>$> ifconfig ib0 mtu 65520
>
>If that doesn't work either, something is wrong with your OFED or
>InfiniBand setup, and we can look further.
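
To confirm that the two IPoIB settings quoted above actually took effect, something like the following could be used. It is only a rough sketch: the function name check_ipoib and the SYSFS_NET override are made up for illustration (the override exists only so the check can be tried without real hardware), and it assumes the interface exposes the usual mode and mtu files under /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: report an IPoIB interface's mode and MTU, and
# succeed only if they match the values recommended in the message
# (connected mode, MTU 65520).
# SYSFS_NET is an illustrative override, not a standard variable.
check_ipoib() {
    iface="${1:-ib0}"
    base="${SYSFS_NET:-/sys/class/net}/$iface"

    if [ ! -d "$base" ]; then
        echo "$iface: no such interface"
        return 1
    fi

    # IPoIB interfaces expose "mode" (datagram or connected);
    # every network interface exposes "mtu".
    mode=$(cat "$base/mode" 2>/dev/null)
    mtu=$(cat "$base/mtu" 2>/dev/null)

    echo "$iface: mode=$mode mtu=$mtu"
    [ "$mode" = "connected" ] && [ "$mtu" = "65520" ]
}
```

After sourcing this, "check_ipoib ib0" prints the current values and returns non-zero if either differs. On a RoCE-only machine with no ib0, it reports the missing interface and fails.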

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:00 CST