Re: how to run NAMD-CUDA on multiple nodes

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed Nov 28 2012 - 01:29:58 CST

Thomas,

Ok, you can run ibverbs binaries without GPU on the same nodes and network?

Basically your setup looks fine regarding IPoIB. So you could also try to
run a non-ibverbs CUDA binary and use IP traffic then. What's the output of:

cat /sys/class/net/ib0/m*

What happens if you start the run without the runscript? Do you get the
library not found message or something else?
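The files under /sys/class/net/ib0/ matching m* report the IPoIB transport mode and MTU. As a minimal sketch (assuming the interface is named ib0, as in your ifconfig output; the fallback message is illustrative), the check could be scripted like this:

```shell
#!/bin/sh
# Hypothetical sketch: confirm the IPoIB module is loaded for ib0 and
# print its transport mode and MTU.
IB=/sys/class/net/ib0
if [ -d "$IB" ]; then
    echo "IPoIB interface found"
    cat "$IB/mode" 2>/dev/null   # "datagram" or "connected"
    cat "$IB/mtu"  2>/dev/null   # e.g. 2044 (datagram) or up to 65520 (connected)
else
    echo "no ib0 interface: IPoIB not installed or not loaded"
fi
```

Connected mode with a large MTU usually gives much better IP-over-InfiniBand throughput than datagram mode, so the mode is worth checking before benchmarking a non-ibverbs binary.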

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Thomas Evangelidis
> Sent: Tuesday, 27 November 2012 17:13
> To: Norman Geist
> Cc: NAMD Mailing List
> Subject: Re: namd-l: how to run NAMD-CUDA on multiple nodes
>
> Hi Norman,
>
> Thanks for your reply! Below are all the commands starting with "ib"
> that are in the PATH:
>
> ib_acme ibdiagui ibis
> ib_read_bw ibtopodiff ibv_srq_pingpong
> ib_write_lat
> ib_clock_test ibdmchk IBMgtSim
> ib_read_lat ibv_asyncwatch ibv_uc_pingpong
> ibdev2netdev ibdmsh ibmsquit
> ib_send_bw ibv_devices ibv_ud_pingpong
> ibdiagnet ibdmtr ibmssh
> ib_send_lat ibv_devinfo ib_write_bw
> ibdiagpath ibdump ibnlparse
> ibsim ibv_rc_pingpong ib_write_bw_postlist
>
> And the output of /sbin/ifconfig:
>
> ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:172.31.103.1  Bcast:172.31.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::202:c903:10:56af/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
>           RX packets:606075818 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:622801559 errors:0 dropped:116 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:103873275302 (96.7 GiB)  TX bytes:164334746947 (153.0 GiB)
>
> ib0:0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:10.20.100.1  Bcast:10.20.100.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
>
> ib0:1     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>           inet addr:10.20.100.4  Bcast:10.20.100.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
>
> There also exists a /sys/class/net/ib0/ folder. I am able to run the
> ibverbs NAMD version I compiled on multiple nodes without the GPUs,
> with great speed gains. There are no "ib_ping", "ibhosts" or "ibnodes"
> commands on the login server or on the nodes. I have also written a
> runscript.sh script for bash, but it yields the same error.
>
> How do you find out whether IPoIB is properly set up? If you can't work
> out what may be wrong, please give me some instructions about what to
> ask the cluster administrators.
>
> many thanks,
> Thomas
>
>
>
>
> On 27 November 2012 16:57, Norman Geist <norman.geist_at_uni-
> greifswald.de>wrote:
>
> > Hi,
> >
> > seems to be a different problem. Try /sbin/ifconfig.
> >
> > If there is really no ifconfig, check if the folder /sys/class/net/ib0/
> > exists. This will also show if you already have IPoIB installed and
> > loaded

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:47 CST