From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Fri Oct 27 2017 - 08:39:25 CDT
It looks like you are only selectively reading the documentation.
The documentation says that for multi-node runs you need to tell charmrun
what those nodes are. Setting this up is your responsibility, with help from your
cluster's administrators.
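For example, here is a minimal sketch of a charmrun nodelist file (the
hostnames are placeholders; your administrators can tell you the real ones):

  group main
    host node01
    host node02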
Also, try the regular ibverbs build before playing with SMP, which is a bit more
complicated.
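With the plain ibverbs build, the launch would then look something like this
(paths abbreviated as in your script, and assuming the nodelist file above is
saved as "nodelist" in the working directory):

  charmrun +p48 ++nodelist nodelist namd2 /...npt01.inp > /...npt01.out

The SMP build additionally needs ++ppn set to the number of worker threads
per process, which is where it gets more complicated.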
Giacomo
On Fri, Oct 27, 2017 at 12:09 AM, Rik Chakraborty <
rik.chakraborty01_at_gmail.com> wrote:
> Hi Giacomo,
>
> As you mentioned before about the NAMD versions, I used them, and the
> results and information are below:
>
> *Version*: Linux-x86_64-TCP
> <http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=425932&AccessCode=88057704121029794685920480744664&ArchiveID=1496>
> *Launching script*: charmrun +p48 ++local namd2 /...npt01.inp > /...npt01.out
> *WCT*: 28083.976562 s
>
> *Version*: Linux-x86_64-ibverbs-smp
> <http://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=425932&AccessCode=88057704121029794685920480744664&ArchiveID=1500>
> *Launching script*: charmrun +p24 ++ppn 2 namd2 /...npt01.inp > /...npt01.out
> *WCT*: 33843.511719 s
>
> Again, the WCT increases as the number of CPU nodes increases. Can you
> help me with this?
>
> Thank you in advance.
>
> Rik Chakraborty
>
> On Tue, Oct 17, 2017 at 7:55 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> wrote:
>
>> Please copy the mailing list on reply.
>>
>> You are using the network-based (TCP) version, which ignores the
>> InfiniBand network, and the ++local flag, which launches all tasks on the
>> local node (hence the name of the flag).
>>
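>> As a quick check (the exact wording may vary between builds), the
>> Charm++ startup lines near the top of the log report how many physical
>> nodes the run actually used:
>>
>> grep -i "unique compute nodes" /home/path/trial.out
>>
>> With ++local this should report a single node.
>>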
>> Please read the user's guide and notes.txt carefully, and you'll be able
>> to fix these problems.
>>
>>
>> On Tue, Oct 17, 2017 at 8:10 AM, Rik Chakraborty <
>> rik.chakraborty01_at_gmail.com> wrote:
>>
>>> Thank you, Giacomo, for your suggestions.
>>>
>>> We used the following specifications,
>>>
>>> NAMD build: *NAMD 2.10 for Linux-x86_64-TCP*
>>>
>>> How we launch the simulation (for 2 CPU nodes):
>>> /data/namd/charmrun +p48 ++local /data/namd/namd2 /home/path/trial.inp > /home/path/trial.out
>>>
>>> On Mon, Oct 16, 2017 at 10:46 PM, Giacomo Fiorin <
>>> giacomo.fiorin_at_gmail.com> wrote:
>>>
>>>> Can you double-check that you are actually launching tasks on all
>>>> requested nodes? The fact that the time increases only slightly leads me to
>>>> think that you may be oversubscribing the first node: you are dividing
>>>> up the work among the same CPU cores, but running more tasks on each core.
>>>> Theoretically this should make no difference, but the communication
>>>> overhead will make things go a bit slower.
>>>>
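>>>> One way to check (assuming you can ssh into the compute nodes, and
>>>> with "node02" as a placeholder hostname) is to look at the load on
>>>> the second node while the job is running:
>>>>
>>>> ssh node02 uptime
>>>>
>>>> If the load there stays near zero while the first node is saturated,
>>>> charmrun was never told about the other nodes.
>>>>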
>>>> What is the NAMD build and how are you launching it?
>>>>
>>>>
>>>>
>>>> On Mon, Oct 16, 2017 at 9:48 AM, Chitrak Gupta <chgupta_at_mix.wvu.edu>
>>>> wrote:
>>>>
>>>>> Hi Rik,
>>>>>
>>>>> Any specific reason why you are looking at the wall clock time and not
>>>>> the benchmark times in your log file? From what I understand, benchmark
>>>>> times are more accurate than the wall clock time.
>>>>>
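>>>>> For example (your_run.log is a placeholder for whatever file you
>>>>> redirected the output to), the benchmark lines can be pulled out with:
>>>>>
>>>>> grep "Benchmark time" your_run.log
>>>>>
>>>>> These lines are printed after startup and initial load balancing, so
>>>>> they exclude the setup cost that inflates the total wall clock time.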
>>>>>
>>>>> Chitrak.
>>>>>
>>>>> On Mon, Oct 16, 2017 at 9:18 AM, Renfro, Michael <Renfro_at_tntech.edu>
>>>>> wrote:
>>>>>
>>>>>> Two things I’ve found that influence benchmarking:
>>>>>>
>>>>>> - model size: smaller models don’t provide enough compute work before
>>>>>> needing to communicate back across cores and nodes
>>>>>> - network interconnect: on a modern Xeon system, gigabit Ethernet is
>>>>>> a bottleneck, at least on large models (possibly all models)
>>>>>>
>>>>>> I benchmarked a relatively similar system starting in July (Dell 730
>>>>>> and 6320, InfiniBand, K80 GPUs in the 730 nodes). Results are at [1]. When I
>>>>>> used the regular TCP version of NAMD instead of an ibverbs-smp build,
>>>>>> 2 nodes gave slower run times than 1. 20k-atom models topped out
>>>>>> at around 5 28-core nodes, and 3M-atom models kept getting better run
>>>>>> times, even out to 34 28-core nodes.
>>>>>>
>>>>>> A 73k-atom system certainly should show a consistent speedup across your 6
>>>>>> nodes, though. And a CUDA-enabled build showed a 3-5x speedup compared to a
>>>>>> non-CUDA run on our tests, so 1-2 of your GPU nodes could run as fast as
>>>>>> all your non-GPU nodes combined.
>>>>>>
>>>>>> So check your NAMD build features for ibverbs, and verify that your
>>>>>> InfiniBand is working correctly. I used [2] for checking InfiniBand, even
>>>>>> though I'm not using Debian on my cluster.
>>>>>>
>>>>>> [1] https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD
>>>>>> [2] https://pkg-ofed.alioth.debian.org/howto/infiniband-howto.html
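>>>>>>
>>>>>> For example (assuming the standard InfiniBand diagnostic tools from
>>>>>> [2] are installed, and with your_run.log as a placeholder log file),
>>>>>> a quick check could be:
>>>>>>
>>>>>> ibstat                          # port state should be Active
>>>>>> ibv_devinfo                     # verbs devices visible to NAMD
>>>>>> grep "Info: NAMD" your_run.log  # confirms which build produced the log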
>>>>>>
>>>>>> --
>>>>>> Mike Renfro / HPC Systems Administrator, Information Technology
>>>>>> Services
>>>>>> 931 372-3601 / Tennessee Tech University
>>>>>>
>>>>>> > On Oct 16, 2017, at 1:20 AM, Rik Chakraborty <
>>>>>> rik.chakraborty01_at_gmail.com> wrote:
>>>>>> >
>>>>>> > Dear NAMD experts,
>>>>>> >
>>>>>> > Recently, we installed a new cluster with the following
>>>>>> configuration:
>>>>>> >
>>>>>> > 1. Master node with storage: DELL PowerEdge R730xd Server
>>>>>> > 2. CPU-only nodes: DELL PowerEdge R430 Server (6)
>>>>>> > 3. GPU nodes: DELL PowerEdge R730 Server (3)
>>>>>> > 4. 18-port InfiniBand switch: Mellanox SX6015
>>>>>> > 5. 24-port Gigabit Ethernet switch: D-Link
>>>>>> >
>>>>>> > We have run a NAMD job on this cluster to check how the run time
>>>>>> scales with an increasing number of CPU nodes. Each CPU node has 24 processors.
>>>>>> The details of the given system and the outcomes are listed below:
>>>>>> >
>>>>>> > 1. No. of atoms used: 73310
>>>>>> > 2. Total simulation time: 1 ns
>>>>>> > 3. Time step: 2 fs
>>>>>> >
>>>>>> > No. of nodes    Wall clock time (s)
>>>>>> > 1               27568.892578
>>>>>> > 2               28083.976562
>>>>>> > 3               30725.347656
>>>>>> > 4               33117.160156
>>>>>> > 5               35750.988281
>>>>>> > 6               39922.492188
>>>>>> >
>>>>>> > As we can see, the wall clock time increases with the
>>>>>> number of CPU nodes, which is not expected.
>>>>>> >
>>>>>> > Could you please look into this and let me know what
>>>>>> the problem might be?
>>>>>> >
>>>>>> > Thanking you,
>>>>>> >
>>>>>> > Rik Chakraborty
>>>>>> > Junior Research Fellow (Project)
>>>>>> > Dept. of Biological Sciences
>>>>>> > Indian Institute of Science Education and Research, Kolkata
>>>>>> > Mohanpur, Dist. Nadia
>>>>>> > Pin 721246
>>>>>> > West Bengal, India
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Giacomo Fiorin
>>>> Associate Professor of Research, Temple University, Philadelphia, PA
>>>> Contractor, National Institutes of Health, Bethesda, MD
>>>> http://goo.gl/Q3TBQU
>>>> https://github.com/giacomofiorin
>>>>
>>>
>>>
>>
>>
>> --
>> Giacomo Fiorin
>> Associate Professor of Research, Temple University, Philadelphia, PA
>> Contractor, National Institutes of Health, Bethesda, MD
>> http://goo.gl/Q3TBQU
>> https://github.com/giacomofiorin
>>
>
>
--
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin