Re: how to run NAMD-CUDA on multiple nodes

From: Thomas Evangelidis (tevang3_at_gmail.com)
Date: Wed Nov 28 2012 - 01:45:14 CST

Hi Norman,

> Ok, you can run ibverbs binaries without GPU on the same nodes and network?
>

Yes

> Basically your setup looks fine regarding IPoIB. So you could also try to
> run a non-ibverbs CUDA binary and use IP traffic then. What's the output
> of:
>
> cat /sys/class/net/ib0/m*
>

$ cat /sys/class/net/ib0/m*
connected
4096
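
For anyone reading along: judging by the two lines of output, the m* glob
matches the "mode" and "mtu" files, so "connected" is the IPoIB transport
mode and 4096 is the interface MTU. A minimal sketch that prints both with
labels. The function name ipoib_summary and its optional second argument (an
alternate sysfs root, there only so it can be tried on a machine without
InfiniBand) are my own assumptions, not part of any existing tool.

```shell
#!/bin/bash
# Print the IPoIB transport mode and MTU of one interface.
# $1 = interface name (default: ib0)
# $2 = sysfs root (default: /sys/class/net); hypothetical knob for testing
ipoib_summary() {
    local iface="${1:-ib0}"
    local root="${2:-/sys/class/net}"
    local dir="$root/$iface"

    # Both files must be readable, otherwise this is probably not an
    # IPoIB interface (or the ib_ipoib module is not loaded).
    if [ ! -r "$dir/mode" ] || [ ! -r "$dir/mtu" ]; then
        echo "$iface: no readable mode/mtu files under $dir" >&2
        return 1
    fi

    echo "$iface mode=$(cat "$dir/mode") mtu=$(cat "$dir/mtu")"
}
```

Calling "ipoib_summary ib0" on each node would make it easy to compare the
settings across the cluster.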

I can currently run NAMD-CUDA using a net-linux-x86_64-ifort-smp-icc binary
I compiled, not the ibverbs binary. But I don't get any performance gains
if I run it on multiple nodes with GPUs; the speed remains almost the same.
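
In case it helps others trying the IP route: charmrun takes its hosts from a
nodelist file, and listing the ib0 addresses there is one way to try to send
the traffic over IPoIB instead of Ethernet. A sketch under that assumption.
make_nodelist is a hypothetical helper, and 172.31.103.2 in the usage line
is a made-up second node (only 172.31.103.1 appears in the ifconfig output
quoted below).

```shell
#!/bin/bash
# Write a Charm++ nodelist to stdout: a "group main" header followed by
# one "host" line per argument (hostname or IP address).
make_nodelist() {
    echo "group main"
    local h
    for h in "$@"; do
        echo "host $h"
    done
}
```

For example, "make_nodelist 172.31.103.1 172.31.103.2 > nodelist" would
produce a file that can then be handed to charmrun with ++nodelist.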

> What happens if you start the run without the runscript? Do you get the
> library not found message or something else?
>
>
I get that message about libcudart.so.4 not found.
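
For context, the runscript is only meant to put the directory that holds the
CUDA runtime library on LD_LIBRARY_PATH before the binary starts. A bash
sketch of the idea, assuming libcudart.so.4 sits in the same directory as
namd2 (if it lives elsewhere, that directory would have to be used instead).
The function name with_binary_libdir is my own and purely illustrative.

```shell
#!/bin/bash
# runscript.sh (sketch): run a program with its own directory prepended
# to LD_LIBRARY_PATH, so a libcudart.so.4 stored next to the binary
# can be found by the dynamic linker.
with_binary_libdir() {
    local bindir
    bindir=$(dirname "$1")
    # Prepend bindir; add the ":" separator only if the variable was set.
    LD_LIBRARY_PATH="${bindir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# When used as a runscript, the program and its arguments arrive as the
# script's own arguments.
if [ "$#" -gt 0 ]; then
    with_binary_libdir "$@"
fi
```

The script would be made executable and passed to charmrun with
++runscript, so that it wraps the namd2 process on every node.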

thanks,
Thomas

> Norman Geist.
>
> > -----Original Message-----
> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > Behalf Of Thomas Evangelidis
> > Sent: Tuesday, 27 November 2012 17:13
> > To: Norman Geist
> > Cc: Namd Mailing List
> > Subject: Re: namd-l: how to run NAMD-CUDA on multiple nodes
> >
> > Hi Norman,
> >
> > Thanks for your reply! Below are all the commands starting with "ib" that
> > are in the PATH:
> >
> > ib_acme            ib_clock_test      ibdev2netdev
> > ibdiagnet          ibdiagpath         ibdiagui
> > ibdmchk            ibdmsh             ibdmtr
> > ibdump             ibis               IBMgtSim
> > ibmsquit           ibmssh             ibnlparse
> > ib_read_bw         ib_read_lat        ib_send_bw
> > ib_send_lat        ibsim              ibtopodiff
> > ibv_asyncwatch     ibv_devices        ibv_devinfo
> > ibv_rc_pingpong    ibv_srq_pingpong   ibv_uc_pingpong
> > ibv_ud_pingpong    ib_write_bw        ib_write_bw_postlist
> > ib_write_lat
> >
> > And the output of /sbin/ifconfig:
> >
> > ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> >           inet addr:172.31.103.1  Bcast:172.31.255.255  Mask:255.255.0.0
> >           inet6 addr: fe80::202:c903:10:56af/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
> >           RX packets:606075818 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:622801559 errors:0 dropped:116 overruns:0 carrier:0
> >           collisions:0 txqueuelen:256
> >           RX bytes:103873275302 (96.7 GiB)  TX bytes:164334746947 (153.0 GiB)
> >
> > ib0:0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> >           inet addr:10.20.100.1  Bcast:10.20.100.255  Mask:255.255.255.0
> >           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
> >
> > ib0:1     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
> >           inet addr:10.20.100.4  Bcast:10.20.100.255  Mask:255.255.255.0
> >           UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
> >
> > There is also a /sys/class/net/ib0/ folder. I am able to run the
> > ibverbs NAMD version I compiled on multiple nodes without the GPUs, with
> > great speed gains. There are no "ib_ping", "ibhosts" or "ibnodes" commands
> > on the login server or on the nodes. I have also written a runscript.sh
> > script for bash, but it yields the same error.
> >
> > How do you find out whether IPoIB is properly set up? If you can't work
> > out what may be wrong, please give me some instructions about what to ask
> > the cluster administrators.
> >
> > many thanks,
> > Thomas
> >
> >
> >
> >
> > On 27 November 2012 16:57, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:
> >
> > > Hi,
> > >
> > > seems to be a different problem. Try /sbin/ifconfig.
> > >
> > > If there is really no ifconfig, check if the folder /sys/class/net/ib0/
> > > exists. This will also show if you already have IPoIB installed and
> > > loaded
>
>

-- 
======================================================================
Thomas Evangelidis
PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
Panepistimioupoli-Zografou
157 71 Athens
GREECE
email: tevang_at_pharm.uoa.gr
          tevang3_at_gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
