From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Nov 27 2012 - 08:57:08 CST
Hi,
that seems to be a different problem. Try /sbin/ifconfig.
If there really is no ifconfig, check whether the directory /sys/class/net/ib0/
exists. That will also show whether IPoIB is already installed and loaded.
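The sysfs check described above can be sketched as a small POSIX sh function. This is only an illustration: the function name check_ipoib and the SYSFS_NET variable are made up here (SYSFS_NET exists so the check can be pointed at another directory), and the interface name ib0 is the usual default, not something confirmed for this cluster.

```shell
#!/bin/sh
# Report whether an IPoIB interface is present, without needing ifconfig.
# SYSFS_NET is a hypothetical knob added for this sketch; on a real node
# the interfaces live under /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

check_ipoib() {
    iface="${1:-ib0}"
    if [ -d "$SYSFS_NET/$iface" ]; then
        # operstate is a standard sysfs attribute of network interfaces
        state="unknown"
        if [ -r "$SYSFS_NET/$iface/operstate" ]; then
            state=$(cat "$SYSFS_NET/$iface/operstate")
        fi
        echo "$iface present (operstate: $state)"
        return 0
    fi
    echo "$iface not found under $SYSFS_NET"
    return 1
}

# Check the default interface; do not abort the caller if it is missing.
check_ipoib ib0 || true
```

If the interface is reported as missing, the IPoIB module is most likely not loaded on that node.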
The other thing is that your nodes do not seem to be able to reach each other
over InfiniBand with ibverbs. Do other people use the InfiniBand
successfully? Usually there are some commands starting with ib, like ibping
for example. You should use them to check the connectivity of the nodes.
The output of ibhosts and ibnodes is also useful for checking the
connectivity and setup.
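A minimal sketch of that check, assuming the diagnostic commands come from the infiniband-diags package and may or may not be installed on a given node. The helper name ib_tools_available is invented for this example; it only looks the commands up on PATH and runs the discovery tools it finds.

```shell
#!/bin/sh
# Print the names of the given commands that are available on PATH.
ib_tools_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Run the fabric-discovery tools that are installed on this node.
for tool in $(ib_tools_available ibhosts ibnodes); do
    echo "== $tool =="
    "$tool" || echo "$tool failed (it may need root or a running subnet manager)"
done

# ibping is not run here: it needs a responder started with 'ibping -S'
# on the target node, and the client then pings that node's LID.
```

If neither command is found, nothing is printed, which is itself a hint that the diagnostics are not installed on the compute nodes.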
The runscript option itself is OK, but will your job also start in a csh
environment? (I don't know right now whether this makes a difference.)
Let me know.
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf
of Thomas Evangelidis
Sent: Tuesday, 27 November 2012 10:20
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: how to run NAMD-CUDA on multiple nodes
Hi Norman,
I've read all the posts about it on the mailing list. There is no ifconfig
command on the cluster nodes to check whether IPoIB is installed. I remember
doing some other checks, but I didn't find IPoIB on the cluster. I don't use a
precompiled binary; I compiled charm myself using the
net-linux-x86_64-ibverbs-ifort-smp-icc flags. Can I compile the IPoIB driver
on my own? Would I then be able to run NAMD-CUDA on multiple nodes?
thanks,
Thomas
On 27 November 2012 10:09, Norman Geist <norman.geist_at_uni-greifswald.de>
wrote:
Hi Thomas,
this problem has been posted a lot. The error you see is due to an
incompatibility between the precompiled ibverbs binary and your IB installation.
There are two possibilities to solve this:
1. Use a non-ibverbs binary with IPoIB.
2. Compile NAMD with ibverbs on your own.
Good luck
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf
of Thomas Evangelidis
Sent: Monday, 26 November 2012 14:57
To: namd-l
Cc: Norman Geist
Subject: namd-l: how to run NAMD-CUDA on multiple nodes
Greetings,
Although I can run the ibverbs binary with CUDA on a single node, on
multiple nodes I get:
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
Charmrun> IBVERBS version of charmrun
I use this command line in my pbs script for ibverbs binary with CUDA:
$NAMD_BIN/charmrun ++runscript ./runscript.csh ++verbose ++remote-shell ssh
++nodelist $nodefile +p24 $NAMD_BIN/namd2 +setcpuaffinity +idlepoll
prod.amber.GB.aMD.namd
runscript.csh contents are:
#!/bin/csh
CHARM_ARCH="net-linux-x86_64-ibverbs-ifort-smp-icc"
NAMD_BIN="/gpfs/home/lspro220u2/Opt/NAMD_CVS-2012-09-22_Source/charm++_$CHARM_ARCH/Linux-x86_64-icc"
setenv LD_LIBRARY_PATH "$NAMD_BIN:$LD_LIBRARY_PATH"
$*
Is this the way to run NAMD-ibverbs-cuda on multiple nodes? If not could you
please give me the right command line?
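One thing worth noting about the runscript as posted: it declares #!/bin/csh but assigns CHARM_ARCH and NAMD_BIN with sh-style NAME="value" syntax, which csh does not accept (csh uses set or setenv). A minimal sketch of the same wrapper in POSIX sh follows. The function name run_with_namd_libs and the option to override NAMD_BIN from the environment are additions made for this example; the path is copied from the original script, and whether this alone resolves the multi-node timeout is not established in this thread.

```shell
#!/bin/sh
# POSIX sh take on the runscript: put the NAMD build directory on
# LD_LIBRARY_PATH, then run whatever command charmrun hands over.
CHARM_ARCH="net-linux-x86_64-ibverbs-ifort-smp-icc"
# Path copied from the original script; the environment override is an
# assumption added for this sketch.
NAMD_BIN="${NAMD_BIN:-/gpfs/home/lspro220u2/Opt/NAMD_CVS-2012-09-22_Source/charm++_$CHARM_ARCH/Linux-x86_64-icc}"

run_with_namd_libs() {
    # ${VAR:+...} avoids a trailing ':' when LD_LIBRARY_PATH is unset
    LD_LIBRARY_PATH="$NAMD_BIN${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
    # Run the command line passed in by charmrun, arguments intact
    "$@"
}

run_with_namd_libs "$@"
```

Quoting "$@" keeps arguments that contain spaces intact, which an unquoted $* would split.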
thanks,
Thomas
--
======================================================================
Thomas Evangelidis
PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
Panepistimioupoli-Zografou
157 71 Athens
GREECE
email: tevang_at_pharm.uoa.gr
       tevang3_at_gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/