Re: namd scale-up

From: Revthi Sanker (revthi.sanker1990_at_gmail.com)
Date: Wed Sep 18 2013 - 01:16:09 CDT

Dear Sir,
I am currently running the job on a single node and on 2 nodes and shall
get back to you at the earliest. The system size is 300,000 atoms.
The cluster configuration is as follows:
• 292 Compute Nodes
• 2 Master Nodes
• 4 Storage Nodes
• Total compute power: 97 TFlops
• IBM System x iDataPlex dx360 M4 highly optimized servers for HPC
• Populated with 2 x Intel E5-2670 8-core 2.6 GHz processors
• A total of 64 GB RAM per node, with 8 x 8 GB 1600 MHz DIMMs connected in
fully balanced mode
• Low-powered mezzanine adapter for FDR10 InfiniBand-based inter-processor
communication

Thank you so much for your time.

Yours sincerely,

Revathi.S
M.S. Research Scholar
Indian Institute Of Technology, Madras
India
_________________________________

On Wed, Sep 18, 2013 at 5:20 AM, Kenno Vanommeslaeghe <
kvanomme_at_rx.umaryland.edu> wrote:

> - I find these difficult to interpret without 1-node and 2-node results in
> the table. Having a 1-node result as a baseline is very important.
> - "3,00,000 atoms" looks like it might be a typo. Is that 3 000 000 or 300
> 000 ?
> - Sorry if you said it already, but what kind of interconnect do you have?
>
>
>
> On 09/14/2013 01:26 AM, Revthi Sanker wrote:
>
>>
>> Dear Sir,
>> These are the benchmark details that you requested:
>>
>> # of nodes    Real time taken for 2 ns
>> -------------------------------------------------
>>  4            15 hrs
>>  5            13 hrs
>>  6            11 hrs
>>  7             9 hrs 33 mins
>>  8             9 hrs  5 mins
>>  9             8 hrs 49 mins
>> 16             7 hrs 23 mins
>>
>> At the maximum, I can get 6 ns/day if I use all the nodes and all
>> processors (our cluster's limit is 16 nodes x 16 processors = 256). Is
>> that the maximum possible for a system size of 3,00,000 atoms, or can it
>> be improved?
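
Converting the quoted table into speedup and parallel efficiency makes the
scaling easier to judge. A minimal Python sketch, using the 4-node run as
the baseline (no 1-node timing was reported; the decimal hours are
conversions of the hh:mm entries above):

    # Speedup and parallel efficiency from the benchmark table above.
    # The 4-node run is the baseline since no 1-node timing is available.
    timings = {  # nodes -> wall-clock hours for 2 ns
        4: 15.0, 5: 13.0, 6: 11.0,
        7: 9.55, 8: 9.083, 9: 8.817, 16: 7.383,
    }
    base_nodes, base_time = 4, timings[4]
    for nodes in sorted(timings):
        hours = timings[nodes]
        speedup = base_time / hours
        efficiency = speedup / (nodes / base_nodes)  # 1.0 = ideal scaling
        ns_per_day = 2.0 * 24.0 / hours
        print(f"{nodes:2d} nodes: {ns_per_day:5.2f} ns/day, "
              f"speedup {speedup:4.2f}x, efficiency {efficiency:6.1%}")

By this measure the 16-node run reaches about 6.5 ns/day but only about 50%
efficiency relative to 4 nodes, consistent with the ceiling described above.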
>>
>> Thank you in advance for your time.
>>
>>
>> Revathi.S
>> M.S. Research Scholar
>> Indian Institute Of Technology, Madras
>> India
>> _________________________________
>>
>>
>> On Fri, Sep 6, 2013 at 12:39 PM, Norman Geist
>> <norman.geist_at_uni-greifswald.de> wrote:
>>
>> Hi again,
>>
>> From what I saw in your output of "/proc/cpuinfo", all 16 cores on the
>> machine are real physical cores, so there is no need to worry about
>> scaling issues regarding virtual cores here. So far, so good. Now you
>> need to run benchmarks from one node up to 8 or more nodes. This simply
>> means running the same simulation on various numbers of nodes for only
>> some steps and noting down the reported "Benchmark Time". Afterwards,
>> post them here and we can tell you whether your scaling is efficient,
>> and therefore whether there is more to get out of it.
>>
>> Norman Geist.
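
For the benchmark runs Norman describes, NAMD reports lines of the form
"Info: Benchmark time: N CPUs X s/step Y days/ns ..." near the start of each
log. A minimal Python sketch for collecting them across runs (the log file
names are placeholders, and the exact line format may vary slightly between
NAMD versions):

    import re
    import sys

    # Extract NAMD "Benchmark time" lines and convert days/ns to ns/day.
    # Usage: python bench.py run_4nodes.log run_8nodes.log ...
    pattern = re.compile(
        r"Benchmark time: (\d+) CPUs ([\d.]+) s/step ([\d.]+) days/ns")

    for path in sys.argv[1:]:
        with open(path) as log:
            for line in log:
                match = pattern.search(line)
                if match:
                    cpus, s_per_step, days_per_ns = match.groups()
                    print(f"{path}: {cpus} CPUs, {s_per_step} s/step, "
                          f"{1.0 / float(days_per_ns):.2f} ns/day")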
>>
>> From: Revthi Sanker [mailto:revthi.sanker1990_at_gmail.com]
>> Sent: Friday, 6 September 2013 08:26
>> To: Norman Geist
>> Cc: Namd Mailing List
>> Subject: Re: namd-l: namd scale-up
>>
>> Dear Sir,
>>
>> I am herewith attaching the details, which I obtained by logging into
>> one of the nodes in my cluster.
>>
>> I would also like to bring to your notice that when the NAMD run has
>> finished, the *test.err* file displays:
>>
>> --------------------------------------------------------------------------
>> WARNING: It appears that your OpenFabrics subsystem is configured to only
>> allow registering part of your physical memory. This can cause MPI jobs to
>> run with erratic performance, hang, and/or crash.
>>
>> This may be caused by your OpenFabrics vendor limiting the amount of
>> physical memory that can be registered. You should investigate the
>> relevant Linux kernel module parameters that control how much physical
>> memory can be registered, and increase them to allow registering all
>> physical memory on your machine.
>>
>> See this Open MPI FAQ item for more information on these Linux kernel
>> module parameters:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>
>> Local host: a3n83
>> Registerable memory: 32768 MiB
>> Total memory: 65511 MiB
>>
>> Your MPI job will continue, but may be behave poorly and/or hang.
>> --------------------------------------------------------------------------
>>
>> [a3n83:20048] 127 more processes have sent help message
>> help-mpi-btl-openib.txt / reg mem limit low
>> [a3n83:20048] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>> I am a beginner to simulations and I am unable to interpret the error
>> message, but I thought it could be relevant.
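
For what it's worth, the reported 32768 MiB of registerable memory (half of
the node's 64 GB) matches the default Mellanox mlx4 MTT settings described
in the Open MPI FAQ linked above. A minimal Python sketch of that arithmetic
(the sysfs path, module name, and fallback defaults are assumptions for mlx4
hardware; other HCA drivers use different parameters):

    import os

    # Open MPI FAQ formula for mlx4 registerable memory:
    #   max_reg_mem = 2**log_num_mtt * 2**log_mtts_per_seg * page_size
    def read_mlx4_param(name, default):
        # Assumed-typical sysfs location for mlx4_core parameters.
        path = f"/sys/module/mlx4_core/parameters/{name}"
        try:
            with open(path) as f:
                return int(f.read())
        except (OSError, ValueError):
            return default

    page_size = os.sysconf("SC_PAGE_SIZE")            # usually 4096 bytes
    log_num_mtt = read_mlx4_param("log_num_mtt", 20)  # assumed default
    log_mtts_per_seg = read_mlx4_param("log_mtts_per_seg", 3)

    max_reg = 2**log_num_mtt * 2**log_mtts_per_seg * page_size
    print(f"max registerable memory: {max_reg / 2**20:.0f} MiB")

With log_num_mtt=20, log_mtts_per_seg=3, and 4 KiB pages this gives exactly
32768 MiB; the fix the FAQ suggests is raising log_num_mtt (via modprobe
options) until the product covers at least all physical RAM.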
>>
>> Thank you so much for your time.
>>
>> *PFA: /proc/cpuinfo*
>
