From: Thomas Evangelidis (tevang3_at_gmail.com)
Date: Tue Nov 27 2012 - 10:13:07 CST
Hi Norman,
Thanks for your reply! Below are all the commands starting with "ib" that
are in the PATH:
ib_acme               ibdiagui              ibis
ib_read_bw            ibtopodiff            ibv_srq_pingpong
ib_write_lat
ib_clock_test         ibdmchk               IBMgtSim
ib_read_lat           ibv_asyncwatch        ibv_uc_pingpong
ibdev2netdev          ibdmsh                ibmsquit
ib_send_bw            ibv_devices           ibv_ud_pingpong
ibdiagnet             ibdmtr                ibmssh
ib_send_lat           ibv_devinfo           ib_write_bw
ibdiagpath            ibdump                ibnlparse
ibsim                 ibv_rc_pingpong       ib_write_bw_postlist
And the output of /sbin/ifconfig:
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.31.103.1  Bcast:172.31.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c903:10:56af/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
          RX packets:606075818 errors:0 dropped:0 overruns:0 frame:0
          TX packets:622801559 errors:0 dropped:116 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:103873275302 (96.7 GiB)  TX bytes:164334746947 (153.0 GiB)
ib0:0     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:10.20.100.1  Bcast:10.20.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
ib0:1     Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:10.20.100.4  Bcast:10.20.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
There is also a /sys/class/net/ib0/ folder. I am able to run the
ibverbs NAMD version I compiled on multiple nodes without the GPUs, with
great speed gains. There are no "ib_ping", "ibhosts" or "ibnodes" commands
on the login server or on the nodes. I have also written a
runscript.sh script for bash, but it yields the same error.
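Since ib_ping is not installed here but ibv_rc_pingpong appears in the PATH listing above, a verbs-level connectivity test between two nodes is still possible. This is a minimal sketch; the hostname nodeA is a hypothetical name for the server-side node:

```shell
#!/bin/sh
# Verbs-level connectivity check using the libibverbs example program
# ibv_rc_pingpong (present in the PATH listing above). "nodeA" is a
# hypothetical hostname for whichever node acts as the server.

# Build the command line for a given role: with a host argument,
# ibv_rc_pingpong runs as the client and connects to that host;
# with no argument it runs as the server and waits for a connection.
pingpong_cmd() {
    if [ -n "$1" ]; then
        echo "ibv_rc_pingpong $1"   # client side: connect to the server
    else
        echo "ibv_rc_pingpong"      # server side: listen for the client
    fi
}

echo "on the server node, run:  $(pingpong_cmd)"
echo "on the client node, run:  $(pingpong_cmd nodeA)"
```

If the ping-pong completes and both sides print transfer statistics, the verbs layer between those two nodes works; a hang or timeout would point at the fabric or the subnet manager instead of NAMD.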
How do you find out whether IPoIB is properly set up? If you can't work out
what may be wrong, please give me some instructions about what to ask the
cluster administrators.
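As a rough sketch of what "IPoIB properly set up" means in terms of checks: the interface directory must exist under sysfs and the interface must carry an IP address. The SYSROOT override below is a hypothetical knob added only so the check can be exercised outside a real InfiniBand node; ib0 matches the ifconfig output above:

```shell
#!/bin/sh
# Sketch of IPoIB sanity checks. SYSROOT is a hypothetical override of
# the sysfs location (/sys/class/net) so the function can be exercised
# on machines without InfiniBand; "ib0" matches the ifconfig output above.

# An interface is present if its sysfs directory exists.
has_iface() {
    root="${SYSROOT:-/sys/class/net}"
    [ -d "$root/$1" ]
}

if has_iface ib0; then
    echo "ib0 present"
    # IPoIB is usable only if the interface also carries an IP address:
    command -v /sbin/ifconfig >/dev/null 2>&1 \
        && /sbin/ifconfig ib0 | grep 'inet addr' \
        || echo "no inet addr reported for ib0"
else
    echo "ib0 missing: ask the admins whether the ib_ipoib module is loaded"
fi
```

If all three conditions hold on every node (directory present, inet addr assigned, and nodes can ping each other over those addresses), a non-ibverbs net binary should be able to use IPoIB.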
many thanks,
Thomas
On 27 November 2012 16:57, Norman Geist <norman.geist_at_uni-greifswald.de>wrote:
> Hi,
>
> Seems to be a different problem. Try /sbin/ifconfig.
>
> If there really is no ifconfig, check whether the folder /sys/class/net/ib0/
> exists. This will also show whether you already have IPoIB installed and
> loaded.
>
>
> The other thing is that your nodes seem unable to reach each other over
> the InfiniBand with ibverbs. Do other people use the InfiniBand
> successfully? Usually there are some commands starting with ib, like
> ib_ping for example. You should use them to check the connectivity of the
> nodes. The output of ibhosts and ibnodes is also interesting for checking
> the connectivity and setup.
>
> The runscript option itself is OK, but will your job also start in a csh
> environment? (I don't know right now whether this makes a difference.)
>
> Let me know.
>
> Norman Geist.
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of *Thomas Evangelidis
>
> *Sent:* Tuesday, 27 November 2012 10:20
> *To:* Norman Geist
> *Cc:* Namd Mailing List
> *Subject:* Re: namd-l: how to run NAMD-CUDA on multiple nodes
>
> Hi Norman,
>
> I've read all the posts about it on the mailing list. There is no ifconfig
> command on the cluster nodes to check whether IPoIB is installed. I remember
> I did some other checks but didn't find IPoIB on the cluster. I don't use a
> precompiled binary; I compiled charm myself with the
> net-linux-x86_64-ibverbs-ifort-smp-icc build options. Can I compile the
> IPoIB driver on my own? Would I then be able to run NAMD-CUDA on multiple
> nodes?
>
> thanks,
>
> Thomas
>
>
> On 27 November 2012 10:09, Norman Geist <norman.geist_at_uni-greifswald.de>
> wrote:
>
> Hi Thomas,
>
> This problem has been posted a lot. The error you see is due to the
> incompatibility of the precompiled ibverbs stuff vs. your ib installation.
> There are two possibilities to solve this:
>
> 1. Use a non-ibverbs binary with IPoIB
>
> 2. Compile namd with ibverbs on your own.
>
> Good luck
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of *Thomas Evangelidis
> *Sent:* Monday, 26 November 2012 14:57
> *To:* namd-l
> *Cc:* Norman Geist
> *Subject:* namd-l: how to run NAMD-CUDA on multiple nodes
>
>
>
> Greetings,
>
> Although I can run the ibverbs binary with CUDA on a single node, on
> multiple nodes I get:
>
> Charmrun> error 0 attaching to node:
> Timeout waiting for node-program to connect
> Charmrun> IBVERBS version of charmrun
>
> I use this command line in my PBS script for the ibverbs binary with CUDA:
>
> $NAMD_BIN/charmrun ++runscript ./runscript.csh ++verbose ++remote-shell
> ssh ++nodelist $nodefile +p24 $NAMD_BIN/namd2 +setcpuaffinity +idlepoll
> prod.amber.GB.aMD.namd
>
> runscript.csh contents are:
>
> #!/bin/csh
> # csh assignment is "set var = value"; bash-style VAR="value" fails in csh
> set CHARM_ARCH = "net-linux-x86_64-ibverbs-ifort-smp-icc"
>
> set NAMD_BIN = "/gpfs/home/lspro220u2/Opt/NAMD_CVS-2012-09-22_Source/charm++_$CHARM_ARCH/Linux-x86_64-icc"
> setenv LD_LIBRARY_PATH "$NAMD_BIN:$LD_LIBRARY_PATH"
> $*
>
> Is this the way to run NAMD-ibverbs-cuda on multiple nodes? If not, could
> you please give me the right command line?
>
> thanks,
> Thomas
>
>
>
>
> --
> ======================================================================
> Thomas Evangelidis
> PhD student
>
> University of Athens
> Faculty of Pharmacy
> Department of Pharmaceutical Chemistry
> Panepistimioupoli-Zografou
> 157 71 Athens
> GREECE
>
> email: tevang_at_pharm.uoa.gr
>           tevang3_at_gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>
-- 
======================================================================
Thomas Evangelidis
PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
Panepistimioupoli-Zografou
157 71 Athens
GREECE
email: tevang_at_pharm.uoa.gr
          tevang3_at_gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:18 CST