Re: problem with running namd through infiniband

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri May 31 2013 - 01:04:09 CDT

Could it be that fftw needs more configure parameters? Maybe the path to the
library, or whether it should be used at all? For now, maybe try without any
special fftw parameter and see if it then works with ibverbs.
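
As a quick check of the library path: the undeclared names in the quoted build log (for example rfftwnd_create_plan_specific and FFTW_IN_PLACE) belong to the FFTW 2 API, so it may be worth confirming that the directory given to --fftw-prefix really contains FFTW headers. A minimal sketch, assuming the usual PREFIX/include layout; the function name list_fftw_headers is made up for illustration and is not part of NAMD or FFTW:

```shell
# Hypothetical helper: print the FFTW headers found under PREFIX/include.
# fftw.h/rfftw.h and sfftw.h/srfftw.h are FFTW 2 header names
# (without and with the single-precision type prefix); fftw3.h is FFTW 3.
list_fftw_headers() {
    prefix=$1
    for header in fftw.h rfftw.h sfftw.h srfftw.h fftw3.h; do
        if [ -f "$prefix/include/$header" ]; then
            echo "$header"
        fi
    done
}

# Usage, from the directory in which ./config was invoked:
#   list_fftw_headers ./linux-x86_64
```

If nothing is printed, the prefix probably does not point at an FFTW installation. A relative prefix is also resolved from whatever directory the compiler is eventually run in, so an absolute path may be the safer choice.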

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Shubhra Ghosh Dastidar
Sent: Thursday, 30 May 2013 12:16
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: problem with running namd through infiniband

 

Hi Norman,

 

We have resolved the IB issues and ibping is working now, so IB looks fine.
As I mentioned earlier, I can run the infiniband-smp version using
++p 32 ++ppn 32, i.e. restricting the run to a single node, but ++p 32 ++ppn 16
doesn't run (i.e. using 16 cores + 16 cores on two nodes).

 

I can recompile charm++ easily, but it looks like compiling namd is not so easy.

I used: ./config Linux-x86_64-g++ --fftw-prefix ./linux-x86_64
--without-tcl --charm-arch net-linux-x86_64-ibverbs-smp-gcc

 

Then make -j8

 

I am getting the following:

src/SimParameters.C: In member function 'void SimParameters::print_config(ParseOptions&, ConfigList*, char*&)':
src/SimParameters.C:4851: error: 'fftw_import_wisdom_from_file' was not declared in this scope
src/SimParameters.C:4896: error: 'fftw_complex' was not declared in this scope
src/SimParameters.C:4896: error: 'work' was not declared in this scope
src/SimParameters.C:4896: error: expected type-specifier before 'fftw_complex'
src/SimParameters.C:4896: error: expected ';' before 'fftw_complex'
src/SimParameters.C:4897: error: 'fftw_malloc' was not declared in this scope
src/SimParameters.C:4967: error: 'FFTW_REAL_TO_COMPLEX' was not declared in this scope
src/SimParameters.C:4968: error: 'FFTW_ESTIMATE' was not declared in this scope
src/SimParameters.C:4968: error: 'FFTW_MEASURE' was not declared in this scope
src/SimParameters.C:4969: error: 'FFTW_IN_PLACE' was not declared in this scope
src/SimParameters.C:4969: error: 'FFTW_USE_WISDOM' was not declared in this scope
src/SimParameters.C:4969: error: 'rfftwnd_create_plan_specific' was not declared in this scope
src/SimParameters.C:4969: error: 'rfftwnd_destroy_plan' was not declared in this scope
src/SimParameters.C:4973: error: expected primary-expression before ')' token
src/SimParameters.C:4974: error: 'fftw_create_plan_specific' was not declared in this scope
src/SimParameters.C:4974: error: 'fftw_destroy_plan' was not declared in this scope
src/SimParameters.C:4978: error: expected primary-expression before ')' token
src/SimParameters.C:4981: error: 'FFTW_COMPLEX_TO_REAL' was not declared in this scope
src/SimParameters.C:4983: error: expected primary-expression before ')' token
src/SimParameters.C:4988: error: expected primary-expression before ')' token
src/SimParameters.C:4997: error: type '<type error>' argument given to 'delete', expected pointer
src/SimParameters.C:4998: error: 'fftw_free' was not declared in this scope
src/SimParameters.C:5008: error: 'fftw_export_wisdom_to_file' was not declared in this scope
src/SimParameters.C:5016: error: 'fftw_export_wisdom_to_string' was not declared in this scope
src/SimParameters.C: In member function 'void SimParameters::parse_mgrid_params(ConfigList*)':
src/SimParameters.C:5246: warning: deprecated conversion from string constant to 'char*'
src/SimParameters.C: In member function 'void SimParameters::print_mgrid_params()':
src/SimParameters.C:5674: warning: deprecated conversion from string constant to 'char*'
src/SimParameters.C: In member function 'void SimParameters::receive_SimParameters(MIStream*)':
src/SimParameters.C:5779: error: 'fftw_import_wisdom_from_string' was not declared in this scope
make: *** [obj/SimParameters.o] Error 1
make: *** Waiting for unfinished jobs....
src/parm.C: In member function 'int parm::readparm(char*)':
src/parm.C:184: warning: deprecated conversion from string constant to 'char*'
src/parm.C:194: warning: deprecated conversion from string constant to 'char*'

 

 

On Thu, May 30, 2013 at 3:35 PM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

Hi Shubhra,

 

sometimes it can help to clear the errors from the HCAs. Additionally, most
tools require you to start a server at one endpoint and a client at the other.
You can find out the LIDs with "ibaddr -l", then start ibping on client1 with
"ibping -S" and run it on client2 with "-L lidofclient1".

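The server/client sequence described above can be sketched as a few shell helpers. This assumes the infiniband-diags tools (ibaddr, ibping) are installed on both nodes; the function names and the default of 5 pings are illustrative only, and client1/client2 are the two endpoints from the text:

```shell
# Run on client1: show the local LID, then start the ibping responder.
show_local_lid() {
    ibaddr -l
}

start_ping_server() {
    # Keeps running until interrupted (Ctrl-C).
    ibping -S
}

# Run on client2: ping client1 by LID.
# $1 = LID reported by show_local_lid on client1
# $2 = number of pings to send (assumed default: 5)
ping_lid() {
    ibping -c "${2:-5}" -L "$1"
}

# Usage:
#   client1$ show_local_lid && start_ping_server
#   client2$ ping_lid <lid of client1>
```

If the ping by LID fails while ibstat and ibhosts still look healthy, that would suggest looking at the fabric setup before looking at NAMD itself.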
 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Shubhra Ghosh Dastidar
Sent: Wednesday, 29 May 2013 14:42
To: NAMD
Subject: Re: namd-l: problem with running namd through infiniband

 

Hi Norman,

 

I think we still have a problem with the IB configuration. Although
ibstat, ibhosts, ibnetdiscover etc. all report OK, ibping cannot ping the
LIDs of the nodes, not even its own LID, and ibv_rc_pingpong is also
unable to ping localhost. This is a bit confusing to me, as the other
commands are working. Since I am configuring IB for the first time, I don't
have much of a clue how to resolve it.

 

I would appreciate it if anyone could help in this matter.

 

Regards,

Shubhra

   

 

On Wed, May 29, 2013 at 10:52 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

Hi Shubhra,

 

if you are sure that your IB fabric setup is fine (do other programs work, do
tools like ibping work?), you may be using an InfiniBand stack/driver
that is incompatible with the precompiled builds (not OFED?). You could try
to build namd yourself against a separate MPI (OpenMPI, for instance). Or, if
you have IPoIB installed (check /sbin/ifconfig for interfaces called ib0 or
similar), you can use those interfaces instead of the "eth" ones. To do so,
choose the IP addresses that belong to the IB network interfaces. Also,
when using IPoIB, set /sys/class/net/ib0/mode to "connected" and the mtu to
"65520", simply by using echo with a ">" redirect as root. Additionally, even
if you are not using a CUDA version, and as long as you use charm++, try
adding +idlepoll when calling namd to improve scaling.
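
The IPoIB settings above could be applied roughly as follows. This is only a sketch: the function names are invented for illustration, ib0 is the interface name used in the text, and the addresses in the usage comment are placeholders for whatever IPs the ib interfaces actually carry. Writes to sysfs are not persistent, so they would need repeating after a reboot.

```shell
# Hypothetical helper: switch an IPoIB interface to connected mode and
# raise its MTU. Must be run as root on every node.
# $1 = interface name (default ib0)
# $2 = sysfs root; only there so the function can be tried on a scratch
#      directory instead of the real /sys/class/net
set_ipoib_connected() {
    iface=${1:-ib0}
    sysfs=${2:-/sys/class/net}
    echo connected > "$sysfs/$iface/mode" &&
    echo 65520 > "$sysfs/$iface/mtu"
}

# Hypothetical helper: write a charmrun nodelist that names the nodes by
# their IPoIB addresses instead of their "eth" addresses.
# $1 = output file, remaining arguments = one address per node
write_nodelist() {
    out=$1
    shift
    {
        echo "group main"
        for host in "$@"; do
            echo "host $host"
        done
    } > "$out"
}

# Usage (placeholder addresses):
#   set_ipoib_connected ib0
#   write_nodelist nodelist 192.168.10.1 192.168.10.2
```

The resulting file would then be handed to charmrun through its ++nodelist option, with +idlepoll added to the namd arguments as suggested above.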

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Shubhra Ghosh Dastidar
Sent: Tuesday, 28 May 2013 09:15
To: NAMD
Subject: namd-l: problem with running namd through infiniband

 

I am trying to run namd through infiniband.

 

First I tried the multicore version, which runs smoothly on 32 cores when
restricted to a single node.

 

Then I tried the TCP version (which uses ethernet), which runs across
multiple nodes, e.g. total 32 cores (16 cores from node-1 and 16 cores from
node-2).

 

Then I tried both the infiniband version and the infiniband-smp version.
If the job is restricted to the 32 cores of one node, they run
smoothly.

But if it is asked to run across multiple nodes (i.e=

 

-- 
Dr. Shubhra Ghosh Dastidar
Assistant Professor
Centre of Excellence in Bioinformatics
Bose Institute
P-1/12 C.I.T. Scheme VII-M, Kolkata 700 054, India
Phone: +91-33-23554766, Ext. 332, Fax: +91-33-2355 3886
Web: http://www.boseinst.ernet.in/bic/fac/shubhra/
 
