Re: NAMD performance on a supercomputer with Intel Xeon Platinum 8160 and 100Gb Intel Omni-Path Full-Fat Tree

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Nov 30 2021 - 05:14:31 CST

Actually, if you compile NAMD with better optimization than the
system-provided executable, your parallel efficiency will go down. Please
recall Amdahl's law: the parallel efficiency is determined by the ratio of
time spent on parallel execution to time spent on serial execution.

A better-optimized executable will spend even less time computing, so the
parallel overhead makes up a larger share of the run time.
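
As a rough illustration of that argument, here is a minimal sketch in
Python. It assumes a toy model that is not taken from NAMD itself: the time
per step is perfectly divisible work plus a fixed cost that does not shrink
with node count. The function names and all numbers are hypothetical, and
efficiency is measured relative to a 2-node run, as in the tests described
in the original question.

```python
def run_time(nodes, parallel_work, fixed_overhead):
    """Toy wall time per step: divisible work plus a cost that does not scale."""
    return parallel_work / nodes + fixed_overhead


def parallel_efficiency(nodes, parallel_work, fixed_overhead, base_nodes=2):
    """Efficiency relative to base_nodes: (base * T(base)) / (nodes * T(nodes))."""
    t_base = run_time(base_nodes, parallel_work, fixed_overhead)
    t_n = run_time(nodes, parallel_work, fixed_overhead)
    return (base_nodes * t_base) / (nodes * t_n)


# Hypothetical numbers in arbitrary time units. The "optimized" build does the
# same work in half the compute time; the fixed overhead is left unchanged.
baseline = parallel_efficiency(48, parallel_work=100.0, fixed_overhead=0.5)
optimized = parallel_efficiency(48, parallel_work=50.0, fixed_overhead=0.5)

print(f"baseline build:  {baseline:.2f}")   # about 0.81
print(f"optimized build: {optimized:.2f}")  # about 0.69
```

In this model the optimized build still finishes each step sooner on 48
nodes, but its efficiency figure is lower because the fixed cost is a larger
fraction of the shorter run time. Real communication overhead usually grows
with node count, which this sketch ignores.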

To get better parallel efficiency, you have to do one of the following:
avoid or reduce all non-parallel operations, such as output or features
like Tcl scripting; make your computations more expensive by increasing the
cutoff or the system size; or make the executable slower by compiling a
less optimized version.
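
To put a number on how little non-parallel time is tolerable, the following
Python sketch solves a simple model for the 75% threshold between 2 and 48
nodes mentioned in the original question. The model is an assumption, not
NAMD's measured behavior: time per step = parallel_work / nodes + a fixed
overhead. The function names are hypothetical.

```python
def max_overhead_ratio(target_efficiency, nodes, base_nodes=2):
    """Largest fixed_overhead / parallel_work ratio that still meets the target.

    Derived from E = (p + base*s) / (p + nodes*s) >= target, solved for s/p.
    """
    return (1.0 - target_efficiency) / (target_efficiency * nodes - base_nodes)


def efficiency(overhead_ratio, nodes, base_nodes=2):
    """Efficiency relative to base_nodes for a given overhead / work ratio."""
    return (1.0 + base_nodes * overhead_ratio) / (1.0 + nodes * overhead_ratio)


limit = max_overhead_ratio(0.75, 48)
print(f"max overhead / work ratio: {limit:.5f}")               # 1/136, about 0.00735
print(f"efficiency at the limit:   {efficiency(limit, 48):.2f}")  # 0.75
```

Under these assumptions, the fixed cost per step must stay below roughly
0.74% of the total parallel work. Each option above moves that ratio in the
right direction: less output or scripting shrinks the overhead, while a
larger cutoff, a larger system, or a slower executable increases the work.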

--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy

On Tue, Nov 30, 2021, 05:32 Vlad Cojocaru <vlad.cojocaru_at_mpi-muenster.mpg.de>
wrote:
> Dear all,
>
> We submitted a proposal to run some extensive atomistic simulations with
> NAMD of systems ranging from 500K to 2M atoms on a supercomputer
> with Intel Xeon Platinum 8160 processors and 100Gb Intel Omni-Path
> Full-Fat Tree interconnection.
>
> Apparently, our project may fail the technical evaluation because during
> our tests we did not achieve 75% parallel efficiency between 2 and 48
> nodes (each node has 2 CPUs - 24 cores/CPU). We have tested the NAMD
> 2.14 provided by default at the site and we do not know how this was
> built. Looking at the NAMD benchmarks available for the Frontera
> supercomputer (quite similar architecture if I understand it correctly
> but for larger systems), it seems we should definitely achieve with NAMD
> 2.15 (maybe even 2.14) much better performance and parallel efficiency
> up to 48/64 nodes on this architecture than we actually achieved in our
> tests.
>
> So, my reasoning is that probably the NAMD built by default was not
> really carefully optimized.
>
> I would appreciate it if anyone who has experience with building and
> optimizing NAMD on such an architecture could recommend any
> compiler/MPI/configuration/options for building a NAMD executable with
> better performance and parallel efficiency. If I have some clear ideas about
> how to optimize NAMD, maybe I could make the case for our project to not
> fail the technical evaluation.
>
> Thank you very much for any advice
>
> Best wishes
> Vlad
>
>
>
> --
> Vlad Cojocaru, PD (Habil.), Ph.D.
> -----------------------------------------------
> Project Group Leader
> Department of Cell and Developmental Biology
> Max Planck Institute for Molecular Biomedicine
> Röntgenstrasse 20, 48149 Münster, Germany
> -----------------------------------------------
> Tel: +49-251-70365-324; Fax: +49-251-70365-399
> Email: vlad.cojocaru[at]mpi-muenster.mpg.de
>
> http://www.mpi-muenster.mpg.de/43241/cojocaru
>
>
>
