Re: Fwd: About wall clock time

From: Rik Chakraborty (rik.chakraborty01_at_gmail.com)
Date: Thu Oct 26 2017 - 23:09:27 CDT

Hi Giacomo,

Following up on your comments about the NAMD version, I tried the builds
below; the results and details are as follows.

*Version*: Linux-x86_64-TCP
<http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=425932&AccessCode=88057704121029794685920480744664&ArchiveID=1496>
*Launching Script*: charmrun +p48 ++local namd2 /...npt01.inp > /...npt01.out
*WCT*: 28083.976562 s

*Version*: Linux-x86_64-ibverbs-smp
<http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=425932&AccessCode=88057704121029794685920480744664&ArchiveID=1500>
*Launching Script*: charmrun +p24 ++ppn 2 namd2 /...npt01.inp > /...npt01.out
*WCT*: 33843.511719 s

Again, the WCT increases as the number of CPU nodes increases. Can you help me
with this?
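
For reference, this is how I currently understand a multi-node launch of the
ibverbs-smp build should look (the nodelist file and hostnames below are only
placeholders, so please correct me if I have it wrong):

nodelist file:
group main
host node01
host node02

charmrun +p46 ++ppn 23 ++nodelist nodelist namd2 /...npt01.inp > /...npt01.out

i.e. 23 worker threads per 24-core node, leaving one core per node for the
communication thread of each SMP process.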

Thank you in advance.

Rik Chakraborty

On Tue, Oct 17, 2017 at 7:55 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
wrote:

> Please copy the mailing list on reply.
>
> You are using the network-based (TCP) version, which ignores the
> InfiniBand network, and the ++local flag, which launches all tasks on the
> local node (hence the name of the flag).
>
> Please read the user's guide and notes.txt carefully, and you'll be able
> to fix these problems.
>
>
> On Tue, Oct 17, 2017 at 8:10 AM, Rik Chakraborty <
> rik.chakraborty01_at_gmail.com> wrote:
>
>> Thank you, Giacomo for your suggestions.
>>
>> We used the following specifications,
>>
>> NAMD build: *NAMD 2.10 for Linux-x86_64-TCP*
>>
>> How we are launching the simulation (for 2 CPU nodes):
>> */data/namd/charmrun +p48 ++local /data/namd/namd2 /home/path/trial.inp >
>> /home/path/trial.out*
>>
>> On Mon, Oct 16, 2017 at 10:46 PM, Giacomo Fiorin <
>> giacomo.fiorin_at_gmail.com> wrote:
>>
>>> Can you double-check that you are actually launching tasks on all requested
>>> nodes? The fact that the time increases only slightly leads me to think
>>> that you may be oversubscribing the first node. That is, you are dividing
>>> up the work among the same CPU cores, but using more tasks for each core.
>>> Theoretically this should make no difference, but the communication
>>> overhead will make things go a bit slower.
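>>>
>>> (One quick check: the "Charm++> Running on N unique compute nodes" line
>>> near the top of the NAMD log should report the number of hosts you expect;
>>> the exact wording may differ between versions, but if it shows only one
>>> host, everything is running on the same node.)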
>>>
>>> What is the NAMD build and how are you launching it?
>>>
>>>
>>>
>>> On Mon, Oct 16, 2017 at 9:48 AM, Chitrak Gupta <chgupta_at_mix.wvu.edu>
>>> wrote:
>>>
>>>> Hi Rik,
>>>>
>>>> Any specific reason why you are looking at the wall clock time and not
>>>> the benchmark times in your log file? From what I understand, benchmark
>>>> times are more accurate than the wall clock time.
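>>>>
>>>> (For example, something like: grep "Benchmark time" your_run.log
>>>> should pull those lines out of the NAMD log; the file name here is just a
>>>> placeholder for your own output file.)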
>>>>
>>>>
>>>> Chitrak.
>>>>
>>>> On Mon, Oct 16, 2017 at 9:18 AM, Renfro, Michael <Renfro_at_tntech.edu>
>>>> wrote:
>>>>
>>>>> Two things I’ve found influenced benchmarking:
>>>>>
>>>>> - model size: smaller models don’t provide enough compute work before
>>>>> needing to communicate back across cores and nodes
>>>>> - network interconnect: on a modern Xeon system, gigabit Ethernet is a
>>>>> bottleneck, at least on large models (possibly all models)
>>>>>
>>>>> I benchmarked a relatively similar system starting in July (Dell 730
>>>>> and 6320, Infiniband, K80 GPUs in the 730 nodes). Results are at [1]. If I
>>>>> wasn’t using an ibverbs-smp build of NAMD, and was using the regular TCP
>>>>> version, 2 nodes gave slower run times than 1. 20k-atom models topped out
>>>>> at around 5 28-core nodes, and 3M-atom models kept getting better run
>>>>> times, even out to 34 28-core nodes.
>>>>>
>>>>> A 73k system certainly should show a consistent speedup across your 6
>>>>> nodes, though. And a CUDA-enabled build showed a 3-5x speedup compared to a
>>>>> non-CUDA run on our tests, so 1-2 of your GPU nodes could run as fast as
>>>>> all your non-GPU nodes combined.
>>>>>
>>>>> So check your NAMD build features for ibverbs, and maybe verify your
>>>>> Infiniband is working correctly — I used [2] for checking Infiniband, even
>>>>> though I’m not using Debian on my cluster.
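>>>>>
>>>>> For example (assuming the standard InfiniBand diagnostic tools are
>>>>> installed on your nodes), something like this should confirm the fabric
>>>>> is up:
>>>>>
>>>>> ibstat          # port State should be "Active", not "Down" or "Init"
>>>>> ibv_devinfo     # lists the verbs devices an ibverbs build would use
>>>>>
>>>>> and the platform string near the top of the namd2 log should mention
>>>>> "ibverbs" if you are running the right build.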
>>>>>
>>>>> [1] https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD
>>>>> [2] https://pkg-ofed.alioth.debian.org/howto/infiniband-howto.html
>>>>>
>>>>> --
>>>>> Mike Renfro / HPC Systems Administrator, Information Technology
>>>>> Services
>>>>> 931 372-3601 / Tennessee Tech University
>>>>>
>>>>> > On Oct 16, 2017, at 1:20 AM, Rik Chakraborty <
>>>>> rik.chakraborty01_at_gmail.com> wrote:
>>>>> >
>>>>> > Dear NAMD experts,
>>>>> >
>>>>> > Recently, we installed a new cluster; the configuration is as follows:
>>>>> >
>>>>> > 1. Master node with storage node- DELL PowerEdge R730xd Server
>>>>> > 2. CPU only node- DELL PowerEdge R430 Server (6 nos.)
>>>>> > 3. GPU node- DELL PowerEdge R730 Server (3 nos.)
>>>>> > 4. 18 ports Infiniband Switch- Mellanox SX6015
>>>>> > 5. 24 ports Gigabit Ethernet switch- D-link make
>>>>> >
>>>>> > We have run a NAMD job on this cluster to check the efficiency (in
>>>>> > time) with an increasing number of CPU nodes. Each CPU node has 24
>>>>> > processors. The details of the system and the outcomes are listed
>>>>> > below:
>>>>> >
>>>>> > 1. No. of atoms used: 73310
>>>>> > 2. Total simulation time: 1ns
>>>>> > 3. Time step: 2fs
>>>>> >
>>>>> > No. of nodes    Wall Clock Time (s)
>>>>> > 1               27568.892578
>>>>> > 2               28083.976562
>>>>> > 3               30725.347656
>>>>> > 4               33117.160156
>>>>> > 5               35750.988281
>>>>> > 6               39922.492188
>>>>> >
>>>>> >
>>>>> > As we can see, the wall clock time increases with the number of nodes.
