Re: Is there any point in running NAMD over an ethernet-linked cluster?

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Mon Sep 22 2014 - 01:22:11 CDT

Doug,

Bandwidth is not that important for good strong scaling; latency is.

NAMD gets decent scaling over TCP/IP as long as it can hide the
communication latency behind computational work.

Faster CPU cores leave less computational work per step to hide that
latency behind, and running multiple independent instances (vs. one
multi-threaded instance) increases latency.
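A toy model may make the latency-hiding argument concrete. It is an illustrative assumption, not a description of how NAMD actually schedules work. In the model, a fixed amount of computation per timestep is split evenly over N nodes, and each multi-node step pays a fixed network latency. When communication overlaps computation, the step takes whichever of the two is longer. The `work_ms` and `latency_ms` values are hypothetical parameters, not measurements.

```python
def step_time(work_ms: float, nodes: int, latency_ms: float,
              overlap: bool = True) -> float:
    """Wall-clock time of one MD step in a toy strong-scaling model."""
    compute_ms = work_ms / nodes                  # fixed work split N ways
    comm_ms = latency_ms if nodes > 1 else 0.0    # no network hop on one node
    if overlap:
        return max(compute_ms, comm_ms)           # latency hidden behind work
    return compute_ms + comm_ms                   # latency fully exposed


def speedup(work_ms: float, nodes: int, latency_ms: float,
            overlap: bool = True) -> float:
    """Speedup over a single node under the same toy model."""
    return (step_time(work_ms, 1, latency_ms, overlap)
            / step_time(work_ms, nodes, latency_ms, overlap))


if __name__ == "__main__":
    latency = 0.5  # hypothetical per-step network latency (ms)
    for label, work in (("slower cores", 4.0), ("faster cores", 2.0)):
        row = ", ".join(f"{n} nodes: {speedup(work, n, latency):.1f}x"
                        for n in (2, 4, 8))
        print(f"{label}: {row}")
```

With these made-up numbers, the slower cores reach 8x on 8 nodes. Halving the per-step work (faster cores) stops gaining beyond 4 nodes, because the latency can no longer be hidden. Passing `overlap=False` shows the extra loss when communication and computation are serialized.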

You have to do very careful scaling tests, also leaving cores idle if
needed, to determine the best setup and the maximum performance available
for a given chunk of hardware and test system.
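Bookkeeping for such scaling tests can be kept in a small script. The sketch below is not a NAMD tool. It assumes you record, for each node count and number of cores used per node (cores left idle count as unused), the measured wall-clock seconds per step. The script reports speedup and parallel efficiency relative to the fastest single-node run. The `Run` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    nodes: int            # number of cluster nodes used
    cores_per_node: int   # cores actually used per node (others left idle)
    sec_per_step: float   # measured wall-clock seconds per MD timestep


def speedup(baseline: Run, run: Run) -> float:
    """How many times faster `run` is than `baseline`."""
    return baseline.sec_per_step / run.sec_per_step


def efficiency(baseline: Run, run: Run) -> float:
    """Parallel efficiency: speedup divided by the factor of extra nodes."""
    return speedup(baseline, run) / (run.nodes / baseline.nodes)


def report(runs: List[Run]) -> Run:
    """Print speedup/efficiency for each layout and return the fastest one.

    The fastest single-node run is used as the baseline (an assumption);
    raises ValueError if no single-node run is present.
    """
    single = [r for r in runs if r.nodes == 1]
    baseline = min(single, key=lambda r: r.sec_per_step)
    for r in sorted(runs, key=lambda r: (r.nodes, r.cores_per_node)):
        print(f"{r.nodes} node(s) x {r.cores_per_node} cores: "
              f"{r.sec_per_step:.4f} s/step, "
              f"speedup {speedup(baseline, r):.2f}, "
              f"efficiency {efficiency(baseline, r):.0%}")
    return min(runs, key=lambda r: r.sec_per_step)
```

For scale: by this definition, the ~45% efficiency Nicholas reports for 8 nodes corresponds to a speedup of roughly 0.45 × 8 ≈ 3.6 over a single node.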

Axel

On Sep 22, 2014 1:24 AM, "Douglas Houston" <DouglasR.Houston_at_ed.ac.uk> wrote:
>
> Hi Nicholas,
>
> Thank you very much for that; it is supremely helpful. I am going to try
> to replicate the various benchmarking tests you describe in your link. To
> that end, I wonder if you would be able to supply me with the 60,000-atom
> ionized.psf and heat_out.coor files you used, so that my steps match yours
> as closely as possible?
>
> If not (you hint that your tests were done a long time ago), did you ever
> look into total bandwidth usage in your tests? 64 minutes of simulation
> runtime across 2 of my nodes results in a total of 390 GiB of data
> transferred between them (according to ifconfig); this equates to about
> 100 MiB/sec. This is for my 80,000-atom system. Does this mean that my
> network bandwidth could indeed be saturating (100 MiB/sec being not far
> off 1 Gbit/sec)? If so, it is not clear to me why the data transfer rate
> is so high in my case but not in yours.
>
> I am starting to suspect that there is something wrong either with my
> installation of NAMD (a bug, perhaps) or with some aspect of my hardware.
>
> cheers,
> Doug
>
> Quoting Nicholas M Glykos <glykos_at_mbg.duth.gr> on Sun, 21 Sep 2014
> 21:21:53 +0300 (EEST):
>
>>
>>> Can anyone on this mailing list tell me if they have successfully
>>> noted a speed increase by running NAMD across an ethernet-linked
>>> cluster vs. a single node?
>>
>> Sure. The following page dates from the days of NAMD 2.6, using gigabit
>> Ethernet, Q6660 quads and a 60,000-atom system:
>>
>> http://norma.mbg.duth.gr/index.php?id=about:benchmarks:namd60k
>>
>> We could (back in 2009) go to 8 nodes, but at that point the parallel
>> efficiency had dropped to only ~45%.
>>
>> --
>>
>> Nicholas M. Glykos, Department of Molecular Biology
>> and Genetics, Democritus University of Thrace, University Campus,
>> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
>> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
>
> _____________________________________________________
> Dr. Douglas R. Houston
> Lecturer
> Institute of Structural and Molecular Biology
> Room 3.23, Michael Swann Building
> King's Buildings
> University of Edinburgh
> Edinburgh, EH9 3JR, UK
> Tel. 0131 650 7358
> http://tinyurl.com/douglasrhouston
>
>
>
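Doug's bandwidth figures can be checked with a few lines of arithmetic. The sketch assumes that "1 Gbit/sec" means a raw line rate of 10^9 bits/s, which is about 119 MiB/s (real TCP throughput is somewhat lower). It also treats the 390 GiB ifconfig figure in one of two ways: as traffic in a single direction, or as the sum of the RX and TX counters split evenly across a full-duplex link. The thread does not say which case applies. The function names are illustrative only.

```python
GIB = 1024 ** 3              # bytes per GiB
MIB = 1024 ** 2              # bytes per MiB
GBIT_LINE_RATE = 1e9 / 8     # bytes/s on a 1 Gbit/s link, before protocol overhead


def average_rate_mib_s(total_gib: float, minutes: float) -> float:
    """Average transfer rate (MiB/s) for a byte total observed over a run."""
    return total_gib * GIB / (minutes * 60) / MIB


def fraction_of_gigabit(rate_mib_s: float, directions: int = 1) -> float:
    """Fraction of a 1 Gbit/s link used per direction.

    directions=2 assumes the total is split evenly between the two
    directions of a full-duplex link (e.g. if RX and TX counters were summed).
    """
    return rate_mib_s * MIB / directions / GBIT_LINE_RATE


if __name__ == "__main__":
    rate = average_rate_mib_s(390, 64)  # Doug's figures: 390 GiB in 64 min
    print(f"average rate: {rate:.0f} MiB/s")                               # 104 MiB/s
    print(f"one direction: {fraction_of_gigabit(rate):.0%} of line rate")  # 87%
    print(f"split RX/TX: {fraction_of_gigabit(rate, 2):.0%} of line rate") # 44%
```

So the ~100 MiB/sec average is close to saturation only if all of it flows in one direction. If the counters were summed over both directions, each direction would be under half of the raw line rate.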

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:52 CST