Re: slow down when running 2 simulations on 1 node

From: Gerald Keller (gerald.keller_at_uni-wuerzburg.de)
Date: Thu Dec 12 2019 - 05:18:47 CST

Hi Miro,

I am using ./namd2 +ppn CPU_COUNT +idlepoll
The GPU for each run is selected by setting the CUDA_VISIBLE_DEVICES environment variable.
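
For readers who want to reproduce this setup, here is a minimal Python sketch of how one such replica could be launched. It only uses the options named above (+ppn, +idlepoll) and CUDA_VISIBLE_DEVICES. The helper name replica_command and the config file names are hypothetical and not part of the original post.

```python
import os


def replica_command(gpu_id, cpu_count, config, namd="./namd2"):
    """Build (argv, env) for one NAMD replica restricted to a single GPU.

    Hypothetical helper for illustration: `config` is whatever NAMD
    configuration file the replica should run.
    """
    # Copy the current environment and expose only the chosen GPU.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Same options as in the command line quoted above.
    argv = [namd, "+ppn", str(cpu_count), "+idlepoll", config]
    return argv, env


if __name__ == "__main__":
    # Print the two launch commands instead of starting them.
    for gpu_id, config in enumerate(["replica1.namd", "replica2.namd"]):
        argv, env = replica_command(gpu_id, 18, config)
        print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

To actually start a replica, the pair could be passed to subprocess.Popen(argv, env=env) with stdout redirected to a log file.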

>>> Miro Astore <miro.astore_at_gmail.com> 12/12/19 11:10 AM >>>
Hi Gerald, I'm having a similar issue. Can I ask what command you are using to run NAMD?

On Thu, Dec 12, 2019 at 19:12, Gerald Keller <gerald.keller_at_uni-wuerzburg.de> wrote:

Hi everyone,

In our working group we run on our own GPU nodes, without a queueing system, and each job stays on a single node.
When we run two replicas of a plain MD simulation on 1 node with a total of 2 GPUs and 40 CPUs, we noticed that the
simulation speed drops as soon as the second replica is started.

1x NAMD on 1 node using 1 GPU and 18 CPUs:

Info: Benchmark time: 18 CPUs 0.00742875 s/step
Info: Benchmark time: 18 CPUs 0.0073947 s/step
Info: Benchmark time: 18 CPUs 0.00747593 s/step
Info: Benchmark time: 18 CPUs 0.00752931 s/step
Info: Benchmark time: 18 CPUs 0.00744549 s/step
Info: Benchmark time: 18 CPUs 0.00746218 s/step

TIMING: 500 CPU: 3.86542, 0.0073741/step Wall: 3.90971, 0.0074047/step
TIMING: 980 CPU: 7.43293, 0.00730715/step Wall: 7.49914, 0.00738945/step
TIMING: 1000 CPU: 7.58503, 0.007605/step Wall: 7.65193, 0.0076393/step
TIMING: 1500 CPU: 11.2973, 0.0073617/step Wall: 11.3969, 0.00763561/step
TIMING: 2000 CPU: 15.0195, 0.00745355/step Wall: 15.1411, 0.0075375/step

2x NAMD on 1 node, using 1 GPU and 18 CPUs for each replica:

Info: Benchmark time: 18 CPUs 0.0115988 s/step
Info: Benchmark time: 18 CPUs 0.0116316 s/step
Info: Benchmark time: 18 CPUs 0.0118586 s/step
Info: Benchmark time: 18 CPUs 0.0115375 s/step
Info: Benchmark time: 18 CPUs 0.0114114 s/step
Info: Benchmark time: 18 CPUs 0.0117798 s/step

TIMING: 500 CPU: 6.0915, 0.0113823/step Wall: 6.18421, 0.0114815/step
TIMING: 1000 CPU: 11.8594, 0.0126053/step Wall: 12.0109, 0.0127244/step
TIMING: 1500 CPU: 17.564, 0.0114935/step Wall: 17.7579, 0.0116048/step
TIMING: 2000 CPU: 23.3157, 0.0119276/step Wall: 23.5628, 0.0119936/step
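
To put a number on the slowdown, the benchmark lines quoted above can be averaged and compared. The following is a minimal Python sketch; the function names are made up for illustration, and the regular expression only assumes the "Benchmark time: N CPUs X s/step" format shown in this post.

```python
import re
from statistics import mean

# Benchmark lines copied from the two runs quoted above.
SINGLE_RUN = """\
Info: Benchmark time: 18 CPUs 0.00742875 s/step
Info: Benchmark time: 18 CPUs 0.0073947 s/step
Info: Benchmark time: 18 CPUs 0.00747593 s/step
Info: Benchmark time: 18 CPUs 0.00752931 s/step
Info: Benchmark time: 18 CPUs 0.00744549 s/step
Info: Benchmark time: 18 CPUs 0.00746218 s/step
"""

DUAL_RUN = """\
Info: Benchmark time: 18 CPUs 0.0115988 s/step
Info: Benchmark time: 18 CPUs 0.0116316 s/step
Info: Benchmark time: 18 CPUs 0.0118586 s/step
Info: Benchmark time: 18 CPUs 0.0115375 s/step
Info: Benchmark time: 18 CPUs 0.0114114 s/step
Info: Benchmark time: 18 CPUs 0.0117798 s/step
"""

# Matches the seconds-per-step value in each benchmark line.
BENCH_RE = re.compile(r"Benchmark time: \d+ CPUs ([0-9.eE+-]+) s/step")


def benchmark_times(log_text):
    """Return all s/step values found in a chunk of NAMD log output."""
    return [float(m.group(1)) for m in BENCH_RE.finditer(log_text)]


def slowdown(single_log, dual_log):
    """Ratio of mean s/step with two replicas to mean s/step with one."""
    return mean(benchmark_times(dual_log)) / mean(benchmark_times(single_log))


if __name__ == "__main__":
    print(f"single replica: {mean(benchmark_times(SINGLE_RUN)):.5f} s/step")
    print(f"two replicas:   {mean(benchmark_times(DUAL_RUN)):.5f} s/step")
    print(f"slowdown:       {slowdown(SINGLE_RUN, DUAL_RUN):.2f}x")
```

With the numbers above, each replica takes roughly 1.56 times as long per step once the second one is running.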

If we run 1x NAMD on 1 node using 1 GPU and 18 CPUs and start another simulation with Amber on the other GPU, there is
no influence on the NAMD simulation speed.

Does anyone have an idea why this is happening and how to solve it? Because of limited resources, sometimes we
have to run only one simulation per GPU.

Thank you in advance for your suggestions!

Best regards
Gerald
 
 
