Re: problem with running namd through infiniband

From: Shubhra Ghosh Dastidar (sgducd_at_gmail.com)
Date: Thu May 30 2013 - 05:15:40 CDT

Hi Norman,

We have resolved the IB issues and ibping is working now; IB looks fine. As
I mentioned earlier, I can run the infiniband-smp version using ++p 32
++ppn 32, i.e. restricting the run to a single node, but ++p 32 ++ppn 16
(i.e. 16 cores on each of two nodes) doesn't run.
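
For concreteness, a two-node launch of the ibverbs-smp build could look roughly like this (a sketch only; the nodelist file and the hostnames node1/node2 are placeholders, not taken from the original message):

```shell
# Hypothetical nodelist file; hostnames node1/node2 are placeholders.
cat > nodelist <<'EOF'
group main
host node1
host node2
EOF

# ++p 32: 32 worker threads in total; ++ppn 16: 16 threads per process,
# i.e. one SMP process on each of the two hosts.
./charmrun ++nodelist nodelist ++p 32 ++ppn 16 ./namd2 +idlepoll run.namd
```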

I can recompile charm++ easily, but it looks like the NAMD compilation is
not as straightforward.

I used: ./config Linux-x86_64-g++ --fftw-prefix ./linux-x86_64
--without-tcl --charm-arch net-linux-x86_64-ibverbs-smp-gcc

Then make -j8

I am getting the following errors:

src/SimParameters.C: In member function 'void
SimParameters::print_config(ParseOptions&, ConfigList*, char*&)':
src/SimParameters.C:4851: error: 'fftw_import_wisdom_from_file' was not
declared in this scope
src/SimParameters.C:4896: error: 'fftw_complex' was not declared in this
scope
src/SimParameters.C:4896: error: 'work' was not declared in this scope
src/SimParameters.C:4896: error: expected type-specifier before
'fftw_complex'
src/SimParameters.C:4896: error: expected ';' before 'fftw_complex'
src/SimParameters.C:4897: error: 'fftw_malloc' was not declared in this
scope
src/SimParameters.C:4967: error: 'FFTW_REAL_TO_COMPLEX' was not declared in
this scope
src/SimParameters.C:4968: error: 'FFTW_ESTIMATE' was not declared in this
scope
src/SimParameters.C:4968: error: 'FFTW_MEASURE' was not declared in this
scope
src/SimParameters.C:4969: error: 'FFTW_IN_PLACE' was not declared in this
scope
src/SimParameters.C:4969: error: 'FFTW_USE_WISDOM' was not declared in this
scope
src/SimParameters.C:4969: error: 'rfftwnd_create_plan_specific' was not
declared in this scope
src/SimParameters.C:4969: error: 'rfftwnd_destroy_plan' was not declared in
this scope
src/SimParameters.C:4973: error: expected primary-expression before ')'
token
src/SimParameters.C:4974: error: 'fftw_create_plan_specific' was not
declared in this scope
src/SimParameters.C:4974: error: 'fftw_destroy_plan' was not declared in
this scope
src/SimParameters.C:4978: error: expected primary-expression before ')'
token
src/SimParameters.C:4981: error: 'FFTW_COMPLEX_TO_REAL' was not declared in
this scope
src/SimParameters.C:4983: error: expected primary-expression before ')'
token
src/SimParameters.C:4988: error: expected primary-expression before ')'
token
src/SimParameters.C:4997: error: type '<type error>' argument given to
'delete', expected pointer
src/SimParameters.C:4998: error: 'fftw_free' was not declared in this scope
src/SimParameters.C:5008: error: 'fftw_export_wisdom_to_file' was not
declared in this scope
src/SimParameters.C:5016: error: 'fftw_export_wisdom_to_string' was not
declared in this scope
src/SimParameters.C: In member function 'void
SimParameters::parse_mgrid_params(ConfigList*)':
src/SimParameters.C:5246: warning: deprecated conversion from string
constant to 'char*'
src/SimParameters.C: In member function 'void
SimParameters::print_mgrid_params()':
src/SimParameters.C:5674: warning: deprecated conversion from string
constant to 'char*'
src/SimParameters.C: In member function 'void
SimParameters::receive_SimParameters(MIStream*)':
src/SimParameters.C:5779: error: 'fftw_import_wisdom_from_string' was not
declared in this scope
make: *** [obj/SimParameters.o] Error 1
make: *** Waiting for unfinished jobs....
src/parm.C: In member function 'int parm::readparm(char*)':
src/parm.C:184: warning: deprecated conversion from string constant to
'char*'
src/parm.C:194: warning: deprecated conversion from string constant to
'char*'
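
The undeclared names in this log (fftw_import_wisdom_from_file, rfftwnd_create_plan_specific, FFTW_REAL_TO_COMPLEX, ...) all belong to the old FFTW 2 API, which suggests NAMD's FFTW 2 code path is being compiled against an FFTW 3 installation. A hedged sketch of a reconfigure, assuming a NAMD 2.9-era source tree (whose config script documents --with-fftw3) and that ./linux-x86_64 actually contains an FFTW 3 build:

```shell
# Sketch only: assumes the ./config script supports --with-fftw3
# (documented for NAMD 2.9-era sources) and that ./linux-x86_64
# holds an FFTW 3 installation.
./config Linux-x86_64-g++ --with-fftw3 --fftw-prefix ./linux-x86_64 \
    --without-tcl --charm-arch net-linux-x86_64-ibverbs-smp-gcc
cd Linux-x86_64-g++
make -j8
```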

On Thu, May 30, 2013 at 3:35 PM, Norman Geist <
norman.geist_at_uni-greifswald.de> wrote:

> Hi Shubhra,
>
> sometimes it can help to clear the errors on the HCAs. Additionally, most
> tools require a server to be started at one endpoint and a client at the
> other. You can find out the LIDs with "ibaddr -l", then run ibping on
> client1 with "ibping -S" and on client2 with "ibping -L <lid-of-client1>".
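
Norman's two-endpoint test can be sketched as follows (the LID value is a placeholder; these diagnostics typically need root):

```shell
# On client1: look up the local port's LID and start the ibping responder.
ibaddr -l        # prints the local LID
ibping -S        # blocks, answering ibping requests

# On client2: ping client1 by LID (replace 1 with the LID printed above).
ibping -L 1
```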
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> behalf of *Shubhra Ghosh Dastidar
> *Sent:* Wednesday, May 29, 2013 14:42
> *To:* NAMD
> *Subject:* Re: namd-l: problem with running namd through infiniband
>
> Hi Norman,
>
> I think we still have a problem with the IB configuration: although
> ibstat, ibhosts, ibnetdiscover etc. look fine, ibping fails to ping the
> LID of the nodes, not even the self LID, and ibv_rc_pingpong is also
> unable to ping localhost. This is a bit confusing to me, as the other
> commands are working. Since I am configuring IB for the first time, I
> don't have much of a clue how to solve this.
>
> I would appreciate it if anyone could help with this matter.
>
> Regards,
>
> Shubhra
>
> On Wed, May 29, 2013 at 10:52 AM, Norman Geist <
> norman.geist_at_uni-greifswald.de> wrote:
>
> Hi Shubhra,
>
> if you are sure that your IB fabric setup is fine (do other programs
> work; do tools like ibping work?), you may be using an InfiniBand
> stack/driver that is incompatible with the precompiled builds (not
> OFED?). You could try to build NAMD yourself against a separate MPI (e.g.
> OpenMPI). Or, if you have IPoIB installed (check /sbin/ifconfig for
> interfaces called ib0 or similar), you can use those interfaces instead
> of the "eth" ones; to do so, choose the IP addresses that correspond to
> the IB network interfaces. Also, when using IPoIB, set
> /sys/class/net/ib0/mode to "connected" and the MTU to 65520, simply by
> doing an echo with a ">" redirect as root. Additionally, even if you are
> not using a CUDA version, as long as you use charm++, try adding
> +idlepoll when calling namd to improve scaling.
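
The IPoIB settings Norman describes can be applied roughly like this (a sketch, run as root; the interface name ib0 is an assumption and may differ on your system):

```shell
# Switch IPoIB from datagram to connected mode.
echo connected > /sys/class/net/ib0/mode
# Raise the MTU to the IPoIB connected-mode maximum.
echo 65520 > /sys/class/net/ib0/mtu
```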
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> behalf of *Shubhra Ghosh Dastidar
> *Sent:* Tuesday, May 28, 2013 09:15
> *To:* NAMD
> *Subject:* namd-l: problem with running namd through infiniband
>
> I am trying to run NAMD through InfiniBand.
>
> First I tried the multicore version, which runs smoothly on 32 cores
> restricted within a single node.
>
> Then I tried the TCP version (which uses ethernet), which runs across
> multiple nodes, e.g. 32 cores in total (16 cores from node-1 and 16 cores
> from node-2).
>
> Then I tried both the infiniband and the infiniband-smp versions. If the
> job is restricted to the 32 cores of one node, they run smoothly.
>
> But if it is asked to run across multiple nodes (i.e.
>

-- 
Dr. Shubhra Ghosh Dastidar
Assistant Professor
Centre of Excellence in Bioinformatics
Bose Institute
P-1/12 C.I.T. Scheme VII-M, Kolkata 700 054, India
Phone: +91-33-23554766, Ext. 332, Fax: +91-33-2355 3886
Web: http://www.boseinst.ernet.in/bic/fac/shubhra/

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:14 CST