Re: namd scale-up

From: Kenno Vanommeslaeghe (kvanomme_at_rx.umaryland.edu)
Date: Tue Sep 17 2013 - 18:50:07 CDT

- I find these difficult to interpret without 1-node and 2-node results in
the table. Having a 1-node result as a baseline is very important; a rough
speedup/efficiency calculation from your table is sketched below.
- "3,00,000 atoms" looks like it might be a typo. Is that 3 000 000 or 300
000 ?
- Sorry if you said it already, but what kind of interconnect do you have?
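
For illustration, here is a minimal Python sketch of the speedup/efficiency
calculation I mean, using the node/time table quoted below and the 4-node run
as a stand-in baseline (a proper 1-node baseline is exactly what is missing;
the rounding of the times is mine):

    # Speedup and parallel efficiency relative to the 4-node run.
    # Times are the wall-clock hours for 2 ns from the table below.
    runtimes_h = {4: 15.0, 5: 13.0, 6: 11.0, 7: 9.55, 8: 9.08, 9: 8.82, 16: 7.38}

    base_nodes = 4
    base_time = runtimes_h[base_nodes]
    for nodes, t in sorted(runtimes_h.items()):
        speedup = base_time / t                      # relative to 4 nodes
        efficiency = speedup / (nodes / base_nodes)  # 1.0 would be ideal scaling
        print(f"{nodes:2d} nodes: speedup {speedup:4.2f}, efficiency {efficiency:4.0%}")

With these numbers, 16 nodes only gives about twice the 4-node throughput, i.e.
roughly 50% parallel efficiency over that range, which is why the interconnect
question matters.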

On 09/14/2013 01:26 AM, Revthi Sanker wrote:
>
> Dear Sir,
> These are the benchmark details that you requested:
>
> # of nodes    Real time taken for 2 ns
> ----------------------------------------
>      4        15 hrs
>      5        13 hrs
>      6        11 hrs
>      7         9 hrs 33 min
>      8         9 hrs  5 min
>      9         8 hrs 49 min
>     16         7 hrs 23 min
>
> At the maximum, I can get 6 ns/day if I use all the nodes and all
> processors (our cluster's limit is 16 nodes * 16 processors = 256). Is that
> the maximum possible for a system size of 3,00,000 atoms, or can it be
> improved?
>
> Thank you so much for your time in advance.
>
>
> Revathi.S
> M.S. Research Scholar
> Indian Institute Of Technology, Madras
> India
> _________________________________
>
>
> On Fri, Sep 6, 2013 at 12:39 PM, Norman Geist
> <norman.geist_at_uni-greifswald.de> wrote:
>
> Hi again,
>
> From what I saw in your output of "/proc/cpuinfo", all 16 cores on the
> machine are real physical cores, so there is no need to worry about scaling
> issues from virtual cores here. So far, so good. Now you need to
> do benchmarks from one node up to 8 or more nodes. This simply means
> running the same simulation on various numbers of nodes for only a few
> steps and noting down the reported "Benchmark time". Afterwards, post
> them here and we can tell you whether your scaling is efficient,
> and therefore whether there is more to get out of it.
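
As an aside, a minimal Python sketch of how one might collect those numbers
from a set of NAMD log files; the file naming pattern and the 2 fs timestep
are assumptions, and the exact wording of the "Benchmark time" line can vary
slightly between NAMD versions:

    # Extract "Benchmark time" lines from NAMD logs and convert s/step to ns/day.
    import re
    import glob

    TIMESTEP_FS = 2.0  # assumed integration timestep in fs; match your config

    for logfile in sorted(glob.glob("namd_*nodes.log")):   # assumed naming scheme
        with open(logfile) as fh:
            for line in fh:
                if "Benchmark time:" in line:
                    m = re.search(r"([0-9.]+)\s*s/step", line)
                    if m:
                        s_per_step = float(m.group(1))
                        # ns/day = (timestep in ns) * (steps per day)
                        ns_per_day = (TIMESTEP_FS * 1e-6) * (86400.0 / s_per_step)
                        print(f"{logfile}: {s_per_step:.4f} s/step ~ {ns_per_day:.2f} ns/day")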
>
> Norman Geist.
>
> *From:* Revthi Sanker [mailto:revthi.sanker1990_at_gmail.com]
>
> *Sent:* Friday, 6 September 2013 08:26
> *To:* Norman Geist
> *Cc:* Namd Mailing List
> *Subject:* Re: namd-l: namd scale-up
>
> Dear Sir,
>
> I am herewith attaching the details which I obtained by logging into
> one of the nodes in my cluster.
>
> I would also like to bring to your notice that when the NAMD run has
> finished, the *test.err* file displays:
>
> --------------------------------------------------------------------------
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered. You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel
> module parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
> Local host: a3n83
> Registerable memory: 32768 MiB
> Total memory: 65511 MiB
>
> Your MPI job will continue, but may behave poorly and/or hang.
> --------------------------------------------------------------------------
> [a3n83:20048] 127 more processes have sent help message
> help-mpi-btl-openib.txt / reg mem limit low
> [a3n83:20048] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
> I am a beginner to simulations and am unable to interpret this error
> message, but I thought it could be relevant.
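
In short, the warning says that only 32 GiB of the node's 64 GiB can be
registered (pinned) for RDMA, which can hurt MPI performance. The Open MPI FAQ
linked above describes the Mellanox mlx4_core module parameters that set this
limit. A minimal sketch of the arithmetic, assuming an mlx4-based fabric; the
parameter values below are illustrative, not read from these nodes (check
/sys/module/mlx4_core/parameters/ yourself):

    # Registerable memory ~= (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size
    PAGE_SIZE = 4096  # bytes, typical x86_64 page size

    def registerable_mib(log_num_mtt, log_mtts_per_seg, page_size=PAGE_SIZE):
        return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size / 2**20

    # Hypothetical values that reproduce the 32768 MiB limit in the warning:
    print(registerable_mib(log_num_mtt=23, log_mtts_per_seg=0))  # 32768.0 MiB
    # Raising log_num_mtt by one would cover the node's full 64 GiB:
    print(registerable_mib(log_num_mtt=24, log_mtts_per_seg=0))  # 65536.0 MiB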
>
> Thank you so much for your time.
>
> *PFA: /proc/cpuinfo*
>
>

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:41 CST