performance scaling of CUDA accelerated NAMD over multiple nodes

From: Vlad Cojocaru (vlad.cojocaru_at_mpi-muenster.mpg.de)
Date: Fri Sep 17 2021 - 05:11:56 CDT

Dear all,

We have been running some tests with the CUDA-accelerated (CUDA 11, I
believe) version of NAMD 2.14 on a remote supercomputer. On 1 node (96
threads, 4 GPUs), we see a 10-fold speedup compared to non-CUDA NAMD 2.14.
Scaling from 1 to 2 GPUs is decent, but from 2 to 4 GPUs there is
almost none. The simulation time per day (classical MD) for 500K
atoms is similar to what is expected (comparable to what is published on
the NAMD website).

However, for a large-scale project, the supercomputer site requires
scaling up to at least 10 nodes, and we are not able to get any scaling
beyond 1 node. In fact, as soon as we run on 2 nodes (with 4 GPUs
each), the performance becomes worse than on a single node.
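
To make "scaling" concrete: what we compare is the speedup relative to the
single-node run, and the parallel efficiency (speedup divided by the
increase in resources). A minimal sketch in Python of that arithmetic; the
function name and the ns/day numbers are illustrative placeholders, not our
measured values:

```python
def scaling_table(ns_per_day):
    """Return (resources, speedup, efficiency) rows relative to the smallest run.

    ns_per_day maps a resource count (e.g. number of nodes) to the
    simulation throughput in ns/day measured at that count.
    """
    base_n = min(ns_per_day)
    base_perf = ns_per_day[base_n]
    rows = []
    for n in sorted(ns_per_day):
        # Speedup: throughput relative to the smallest configuration.
        speedup = ns_per_day[n] / base_perf
        # Efficiency: speedup divided by the factor of added resources.
        efficiency = speedup / (n / base_n)
        rows.append((n, speedup, efficiency))
    return rows


# Placeholder numbers only (NOT measurements): a 2-node run slower than 1 node.
example = {1: 10.0, 2: 8.0}
for n, speedup, efficiency in scaling_table(example):
    print(f"{n} node(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

A speedup below 1 on 2 nodes, as in the placeholder case, corresponds to
the situation described above, where adding a node makes the run slower.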

I know that many details are needed to actually pinpoint the
issue(s). Many of these are architecture-dependent, and we do not have
all of them.

However, I would still like to ask in general whether any of you have
routinely managed to scale the performance of CUDA-accelerated
NAMD 2.14 with the number of nodes. If so, are there any general
tips and tricks that we could try?

Thank you for any insights!
Vlad

-- 
Vlad Cojocaru, PD (Habil.), Ph.D.
-----------------------------------------------
Project Group Leader
Department of Cell and Developmental Biology
Max Planck Institute for Molecular Biomedicine
Röntgenstrasse 20, 48149 Münster, Germany
-----------------------------------------------
Tel: +49-251-70365-324; Fax: +49-251-70365-399
Email: vlad.cojocaru[at]mpi-muenster.mpg.de
http://www.mpi-muenster.mpg.de/43241/cojocaru
