NAMD3: how to get the best performance on one GPU node?

From: Siyoung Kim (kkssy141_at_gmail.com)
Date: Wed Dec 09 2020 - 03:00:02 CST

Hello,

I'd like to know how to get the maximum performance out of NAMD 3.0 on a
single GPU compute node. I downloaded
NAMD_3.0alpha7_Linux-x86_64-multicore-CUDA-MultiGPU-SingleNode.tar.gz from
the website and, as indicated on the NAMD 3.0 website, added the following
keywords to my configuration file:
- CUDASOAintegrate on
- margin 4

I'd like to run MD simulations on one GPU node of a cluster. Each node has
two Intel Skylake 6148 processors (40 cores per node) and four NVIDIA V100
GPUs (details for anyone interested:
https://gm4.rcc.uchicago.edu/technical-specification/).

On this cluster, the command with the best performance was simply
*namd3 xxx.inp*. Adding +idlepoll or +setcpuaffinity did not change the
simulation performance significantly. The commands I tested and their
benchmarks are as follows:

*namd3 step7.inp*
Info: Benchmark time: 1 CPUs 0.00564381 s/step 30.6176 ns/day 0 MB memory

*namd3 +idlepoll step7.inp*
Info: Benchmark time: 1 CPUs 0.00570063 s/step 30.3125 ns/day 0 MB memory

*charmrun namd3 step7.inp*
Info: Benchmark time: 1 CPUs 0.00534924 s/step 32.3037 ns/day 0 MB memory

*charmrun namd3 +p4 step7.inp*
Info: Benchmark time: 4 CPUs 0.0179005 s/step 9.65338 ns/day 0 MB memory

*charmrun namd3 +ppn 4 step7.inp*
Info: Benchmark time: 4 CPUs 0.019252 s/step 8.97571 ns/day 0 MB memory

*namd3 +p20 step7.inp*
FATAL ERROR: Multiple GPU simulations require exactly one single core per
GPU (+p equal to number of devices specified with +device)
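As a quick consistency check on the numbers above, here is a small Python sketch that parses the "Info: Benchmark time:" lines, ranks the runs by ns/day, and recomputes ns/day from s/step using ns/day = 86400 / (s/step) x timestep(fs) x 1e-6. It assumes a 2 fs timestep, which the post does not state; under that assumption the recomputed values match the reported ones to within about 0.001 ns/day. The helper names (parse_benchmark, ns_per_day, rank_runs) are hypothetical, written just for this check.

```python
import re

# "Info: Benchmark time:" lines reported above, keyed by the command that produced them.
RUNS = {
    "namd3 step7.inp":
        "Info: Benchmark time: 1 CPUs 0.00564381 s/step 30.6176 ns/day 0 MB memory",
    "namd3 +idlepoll step7.inp":
        "Info: Benchmark time: 1 CPUs 0.00570063 s/step 30.3125 ns/day 0 MB memory",
    "charmrun namd3 step7.inp":
        "Info: Benchmark time: 1 CPUs 0.00534924 s/step 32.3037 ns/day 0 MB memory",
    "charmrun namd3 +p4 step7.inp":
        "Info: Benchmark time: 4 CPUs 0.0179005 s/step 9.65338 ns/day 0 MB memory",
    "charmrun namd3 +ppn 4 step7.inp":
        "Info: Benchmark time: 4 CPUs 0.019252 s/step 8.97571 ns/day 0 MB memory",
}

# Extracts CPU count, seconds per step and ns/day from one benchmark line.
BENCH_RE = re.compile(
    r"Benchmark time: (\d+) CPUs ([\d.eE+-]+) s/step ([\d.eE+-]+) ns/day"
)

# ASSUMPTION: 2 fs timestep; the post does not state the timestep used.
TIMESTEP_FS = 2.0
SECONDS_PER_DAY = 86400.0


def parse_benchmark(line):
    """Return (cpus, seconds_per_step, ns_per_day) parsed from a benchmark line."""
    m = BENCH_RE.search(line)
    if m is None:
        raise ValueError(f"not a benchmark line: {line!r}")
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def ns_per_day(seconds_per_step, timestep_fs=TIMESTEP_FS):
    """Throughput implied by a wall-clock cost per step and a timestep in fs."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 ns = 1e6 fs


def rank_runs(runs):
    """Return (command, cpus, s/step, reported ns/day, recomputed ns/day), fastest first."""
    rows = []
    for cmd, line in runs.items():
        cpus, sps, reported = parse_benchmark(line)
        rows.append((cmd, cpus, sps, reported, ns_per_day(sps)))
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for cmd, cpus, sps, reported, recomputed in rank_runs(RUNS):
        print(f"{cmd:34s} {cpus:2d} CPUs {sps:.6f} s/step "
              f"{reported:8.4f} ns/day (recomputed {recomputed:8.4f})")
```

Running it lists the single-process charmrun launch first and the two 4-process launches last, which makes the roughly 3x per-step slowdown of the +p4/+ppn 4 runs easy to see side by side.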

With the above command (namd3 xxx.inp), am I using all of the GPUs, and am
I getting close to the best possible performance? I could not work this out
on my own after reading the new 2020 NAMD paper and the website, so I'd
appreciate any advice.

Thank you.
Best,
Siyoung
