Re: how to run NAMD-CUDA on multiple nodes

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Nov 27 2012 - 08:57:08 CST

Hi,

 

This seems to be a different problem. Try /sbin/ifconfig, since ifconfig is
often installed outside a normal user's PATH.

 

If there really is no ifconfig, check whether the directory /sys/class/net/ib0/
exists. This will also show whether IPoIB is already installed and loaded.
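
The directory check described above could be sketched as a small shell function. This is only a minimal sketch, assuming the IPoIB interface is named ib0 as in the path above. The function name has_ipoib and its optional second argument (an alternative sysfs directory, useful for trying the check on a test directory) are hypothetical.

```shell
#!/bin/sh
# has_ipoib: succeed if an interface directory exists under the sysfs net directory.
#   $1 = interface name (default: ib0)
#   $2 = sysfs net directory (default: /sys/class/net); hypothetical override for testing
has_ipoib() {
    iface="${1:-ib0}"
    netdir="${2:-/sys/class/net}"
    [ -d "$netdir/$iface" ]
}

if has_ipoib; then
    echo "ib0 present: IPoIB appears to be installed and loaded"
else
    echo "ib0 not found: IPoIB is probably not installed or not loaded"
fi
```

Run it on each compute node, not only on the login node, since the two may be configured differently.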

 

The other thing is that your nodes do not seem to be able to reach each other
over InfiniBand with ibverbs. Do other people use the InfiniBand network
successfully? Usually there are some commands starting with ib, such as ibping.
You should use them to check the connectivity of the nodes. The output of
ibhosts and ibnodes is also useful for checking the connectivity and setup.
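
A first step for the connectivity check could look like the sketch below. It only reports which of the diagnostic commands mentioned above are on the PATH and, if present, runs ibhosts and ibnodes without arguments. The function name check_ib_tools is hypothetical, and the assumption that these tools come from the infiniband-diags package may not hold on every cluster.

```shell
#!/bin/sh
# check_ib_tools: report which InfiniBand diagnostic commands are on PATH.
# The tool list is taken from the commands named in this thread; adjust as needed.
check_ib_tools() {
    for tool in ibhosts ibnodes ibping; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_ib_tools

# If available, list the hosts and nodes visible on the fabric.
# These queries may need elevated privileges on some installations.
if command -v ibhosts >/dev/null 2>&1; then
    ibhosts || echo "ibhosts failed"
fi
if command -v ibnodes >/dev/null 2>&1; then
    ibnodes || echo "ibnodes failed"
fi
```

If the tools are missing for a normal user, they may still be installed under /usr/sbin, as with ifconfig.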

 

The runscript option itself is OK, but will your job also start in a csh
environment? (I don't know right now whether this makes a difference.)

 

Let me know.

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Thomas Evangelidis
Sent: Tuesday, 27 November 2012 10:20
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: how to run NAMD-CUDA on multiple nodes

 

Hi Norman,

I've read all the posts about it on the mailing list. There is no ifconfig
command on the cluster nodes to check whether IPoIB is installed. I remember
I did some other checks but didn't find IPoIB on the cluster. I don't use a
precompiled binary; I compiled charm myself using the
net-linux-x86_64-ibverbs-ifort-smp-icc flags. Can I compile the IPoIB driver on
my own? Would I then be able to run NAMD-CUDA on multiple nodes?

 

thanks,

Thomas

 

On 27 November 2012 10:09, Norman Geist <norman.geist_at_uni-greifswald.de>
wrote:

Hi Thomas,

 

This problem has been posted a lot. The error you see is due to an
incompatibility between the precompiled ibverbs build and your IB installation.
There are two ways to solve this:

 

1. Use a non-ibverbs binary with IPoIB.

2. Compile NAMD with ibverbs on your own.

 

Good luck

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Thomas Evangelidis
Sent: Monday, 26 November 2012 14:57
To: namd-l
Cc: Norman Geist
Subject: namd-l: how to run NAMD-CUDA on multiple nodes

 

Greetings,

Although I can run the ibverbs binary with CUDA on a single node, on
multiple nodes I get:

Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
Charmrun> IBVERBS version of charmrun

I use this command line in my PBS script for the ibverbs binary with CUDA:

$NAMD_BIN/charmrun ++runscript ./runscript.csh ++verbose ++remote-shell ssh
++nodelist $nodefile +p24 $NAMD_BIN/namd2 +setcpuaffinity +idlepoll
prod.amber.GB.aMD.namd
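
For reference, the file passed to ++nodelist is expected in charmrun's own format: a "group main" line followed by one "host" line per machine. A PBS node file instead lists plain hostnames, often repeated once per core. The sketch below shows one way to convert between the two. It assumes the job runs under PBS with $PBS_NODEFILE set. The function name make_nodelist and the output file name are hypothetical.

```shell
#!/bin/sh
# make_nodelist: convert a PBS-style node file (one hostname per line,
# possibly repeated once per core) into the charmrun nodelist format.
#   $1 = input node file
make_nodelist() {
    echo "group main"
    # Keep the first occurrence of each hostname, preserving order.
    awk 'NF && !seen[$1]++ { print "host " $1 }' "$1"
}

# Only write a nodelist when running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodefile="$PWD/nodelist.$$"   # hypothetical output name
    make_nodelist "$PBS_NODEFILE" > "$nodefile"
    echo "wrote $nodefile"
fi
```

Collapsing repeated hostnames is a design choice that suits an smp build, where one process per node spawns several threads. For a non-smp build it may be preferable to keep one host line per core.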

runscript.csh contents are:

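#!/bin/csh
# csh assigns shell variables with "set" (sh-style NAME="value" is not valid csh).
set CHARM_ARCH = "net-linux-x86_64-ibverbs-ifort-smp-icc"
set NAMD_BIN = "/gpfs/home/lspro220u2/Opt/NAMD_CVS-2012-09-22_Source/charm++_${CHARM_ARCH}/Linux-x86_64-icc"
# Braces keep csh from reading ":" as the start of a variable modifier.
if ( $?LD_LIBRARY_PATH ) then
    setenv LD_LIBRARY_PATH "${NAMD_BIN}:${LD_LIBRARY_PATH}"
else
    setenv LD_LIBRARY_PATH "${NAMD_BIN}"
endif
# Run the command line that charmrun passes to this script.
$*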

Is this the right way to run NAMD-ibverbs-CUDA on multiple nodes? If not, could
you please give me the right command line?

thanks,
Thomas

-- 
======================================================================
Thomas Evangelidis
PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
Panepistimioupoli-Zografou
157 71 Athens
GREECE
email: tevang_at_pharm.uoa.gr
          tevang3_at_gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
 
 
