Re: NAMD3: how to get the best performance in one gpu node?

From: vermaasj (vermaasj_at_msu.edu)
Date: Wed Dec 09 2020 - 10:18:41 CST

Hi Siyoung,

Just curious, but if you look at the log for the +p4 case, are you indeed seeing all 4 GPUs being activated? The only other thing I’d try is to explicitly specify that you want to use all 4 GPUs, with +devices 0,1,2,3. This assumes that all 4 GPUs have been added to the run environment. In Slurm, that would be --gres=gpu:4 or --gpus=4 added to the sbatch arguments.
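A minimal sketch of a Slurm batch script along the lines suggested above. It assumes a site that accepts --gres=gpu:4 and has namd3 on the PATH. The helper namd3_gpu_args is hypothetical: it only builds "+p4 +devices 0,1,2,3" so that the number of PEs equals the number of GPUs, which is what the FATAL ERROR quoted below demands for multi-GPU runs. The --cpus-per-task value is also an assumption (one core per GPU-driving PE). As written, the script prints the command instead of running it.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4   # assumption: one core per GPU-driving PE
#SBATCH --gres=gpu:4        # or --gpus=4, depending on the site's Slurm setup

# Hypothetical helper: build "+pN +devices 0,1,...,N-1" so that NAMD3
# gets exactly one PE per GPU.
namd3_gpu_args() {
  local ngpu=$1 devices="" i
  for ((i = 0; i < ngpu; i++)); do
    devices+="${devices:+,}$i"   # comma before every index except the first
  done
  echo "+p${ngpu} +devices ${devices}"
}

NGPU=4
# Printed rather than executed; remove "echo" to actually launch the run.
echo namd3 $(namd3_gpu_args "$NGPU") step7.inp
```

With NGPU=4 this prints "namd3 +p4 +devices 0,1,2,3 step7.inp". The command substitution is left unquoted on purpose so the flags are split into separate arguments when "echo" is removed.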

-Josh

From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu>
Date: Wednesday, December 9, 2020 at 2:12 AM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: namd-l: NAMD3: how to get the best performance in one gpu node?
Hello,

I'd like to know how I can get the maximum performance out of NAMD 3.0 on a single GPU compute node. I downloaded NAMD_3.0alpha7_Linux-x86_64-multicore-CUDA-MultiGPU-SingleNode.tar.gz from the website. As indicated on the NAMD 3.0 website, I added the following keywords to my configuration file:
- CUDASOAintegrate on
- margin 4

I'd like to run MD simulations on one GPU node that contains 2 Intel Skylake 6148 processors (40 cores per node) and 4 NVIDIA V100 GPUs (if anyone is interested in the details, here's the link: https://gm4.rcc.uchicago.edu/technical-specification/).

On this cluster, the command with the best performance was simply namd3 xxx.inp. Adding +idlepoll or +setcpuaffinity didn't change the simulation performance significantly. The commands I tested and their benchmarks are as follows:

namd3 step7.inp
Info: Benchmark time: 1 CPUs 0.00564381 s/step 30.6176 ns/day 0 MB memory

namd3 +idlepoll step7.inp
Info: Benchmark time: 1 CPUs 0.00570063 s/step 30.3125 ns/day 0 MB memory

charmrun namd3 step7.inp
Info: Benchmark time: 1 CPUs 0.00534924 s/step 32.3037 ns/day 0 MB memory

charmrun namd3 +p4 step7.inp
Info: Benchmark time: 4 CPUs 0.0179005 s/step 9.65338 ns/day 0 MB memory

charmrun namd3 +ppn 4 step7.inp
Info: Benchmark time: 4 CPUs 0.019252 s/step 8.97571 ns/day 0 MB memory

namd3 +p20 step7.inp
FATAL ERROR: Multiple GPU simulations require exactly one single core per GPU (+p equal to number of devices specified with +device)
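The s/step and ns/day figures in these Benchmark lines are related through the timestep: ns/day = 86400 / (s/step) × timestep (fs) × 1e-6. The post does not state the timestep; a 2 fs timestep is an assumption here, chosen because it reproduces the reported numbers. The function name ns_per_day is hypothetical.

```shell
#!/bin/bash
# Convert NAMD's "s/step" benchmark figure to ns/day.
# The timestep (in fs) must be supplied; 2 fs is an assumption below.
ns_per_day() {
  local sec_per_step=$1 timestep_fs=$2
  # steps per day = 86400 / sec_per_step; 1 ns = 1e6 fs
  awk -v s="$sec_per_step" -v dt="$timestep_fs" \
    'BEGIN { printf "%.4f\n", 86400 / s * dt / 1e6 }'
}

ns_per_day 0.00564381 2   # plain "namd3 step7.inp" run
ns_per_day 0.0179005 2    # "charmrun namd3 +p4" run
```

Applied to the numbers above, the +p4 and +ppn 4 runs take roughly three times longer per step than the single-PE runs.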

Am I using all the GPUs with the above command (namd3 xxx.inp), and am I getting close to the best performance? I was not able to conclude this on my own after reading the new 2020 NAMD paper and the website. I'd appreciate any advice.
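Besides reading the NAMD log, one way to check how many GPUs a run is actually using is to sample per-GPU utilization while it runs. A minimal sketch, assuming nvidia-smi is available on the compute node; count_busy_gpus and its 10% threshold are hypothetical choices, not anything NAMD defines.

```shell
#!/bin/bash
# Count GPUs whose utilization (%) is above a threshold.
# Input: lines of "index, utilization", as printed by
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
count_busy_gpus() {
  local threshold=${1:-10}   # hypothetical default: 10 %
  awk -F', *' -v t="$threshold" '$2 + 0 > t + 0 { n++ } END { print n + 0 }'
}

# Usage on the GPU node while namd3 is running:
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits | count_busy_gpus 10
```

A single sample can be misleading, so it may be worth repeating the query a few times during the run.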

Thank you.
Best,
Siyoung

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:10 CST