Re: namd scale-up

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Sep 06 2013 - 02:09:19 CDT

Hi again,

 

From what I saw in your output of /proc/cpuinfo, all 16 cores on the
machine are real physical cores, so no need to worry about scaling issues
regarding virtual cores here. So far, so good. Now you need to do benchmarks
from one node up to 8 or more nodes. This simply means running the same
simulation on various numbers of nodes for only some steps and noting down
the reported "Benchmark time". Afterwards post them here and we can tell you
whether your scaling is efficient or not, and therefore whether there is
more to get out of it. A sketch of such a benchmark series follows below.
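
A minimal sketch of such a benchmark series, assuming an MPI build of NAMD
launched via mpirun, 16 cores per node, and a config file named test.namd
(launcher, paths, and file names are placeholders to adapt to your cluster
and queueing system):

    # Confirm the cores are physical: if "siblings" equals "cpu cores"
    # in /proc/cpuinfo, there is no Hyper-Threading to worry about.
    grep -E "siblings|cpu cores" /proc/cpuinfo | sort -u

    # Run the same simulation on 1, 2, 4 and 8 nodes for a few hundred
    # steps and collect the "Benchmark time" lines NAMD prints.
    for nodes in 1 2 4 8; do
        procs=$((nodes * 16))
        mpirun -np "$procs" namd2 test.namd > bench_${nodes}nodes.log
        grep "Benchmark time" bench_${nodes}nodes.log
    done

    # Parallel efficiency on N nodes is t(1 node) / (N * t(N nodes)),
    # using the s/step values; values near 1.0 indicate good scaling.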

 

Norman Geist.

 

From: Revthi Sanker [mailto:revthi.sanker1990_at_gmail.com]
Sent: Friday, September 6, 2013 08:26
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: namd scale-up

 

Dear Sir,

I am attaching the details I obtained by logging into one of the nodes of
my cluster.

I would also like to bring to your notice that when the NAMD run finishes,
the test.err file displays:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel
module parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:           a3n83
  Registerable memory:  32768 MiB
  Total memory:         65511 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
[a3n83:20048] 127 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[a3n83:20048] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

 

I am a beginner at simulations and am unable to interpret the error
message, but I thought it could be relevant.
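
In case it helps: the warning concerns the InfiniBand driver's limit on
registerable memory, here 32 GiB of the node's 64 GiB. A minimal sketch of
the fix from the FAQ linked above, assuming a Mellanox ConnectX HCA using
the mlx4_core module (parameter values are placeholders to size against
your RAM):

    # /etc/modprobe.d/mlx4_core.conf
    # Registerable memory is about 2^log_num_mtt * 2^log_mtts_per_seg * page_size.
    # With 4 KiB pages: 2^24 * 2^3 * 4 KiB = 512 GiB, well above 64 GiB of RAM.
    options mlx4_core log_num_mtt=24 log_mtts_per_seg=3

    # Reload the module (or reboot the node) afterwards; the "Registerable
    # memory" reported by Open MPI should then match total memory.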

 

Thank you so much for your time.

 

 


Attachment: /proc/cpuinfo

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:40 CST