Re: slow down when running 2 simulations on 1 node

From: Henrik Schopmans (h.schopmans_at_gmail.com)
Date: Thu Dec 12 2019 - 12:58:26 CST

Hey Gerald,

I actually see similar behaviour on our cluster. I always assumed it was
because running a second job on the same node makes the clock speed get
throttled a little due to temperature / cooling limits. That doesn't explain
why it doesn't happen when Amber is running, though. Is Amber actually using
all the remaining CPUs?
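One way to test the throttling idea is to watch the per-core clock while one replica runs and then again while two run. The minimal Python sketch below assumes a Linux node where /proc/cpuinfo reports a "cpu MHz" line per logical CPU (typical on x86; the helper names are my own, not part of any tool):

```python
import re
from pathlib import Path
from statistics import mean


def core_clocks_mhz(cpuinfo_text: str) -> list[float]:
    """Extract the 'cpu MHz' value of every logical CPU from /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def summarize(clocks: list[float]) -> str:
    """One-line summary: number of CPUs, mean clock and slowest clock."""
    return f"{len(clocks)} CPUs, mean {mean(clocks):.0f} MHz, min {min(clocks):.0f} MHz"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    clocks = core_clocks_mhz(path.read_text()) if path.exists() else []
    print(summarize(clocks) if clocks else "no per-CPU clock information found")
```

If the mean clock drops noticeably once the second NAMD replica starts, throttling is at least part of the story; if it stays flat, the cause is more likely contention for shared resources.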

Have a good time,
Henrik

On Thu, 12 Dec 2019 at 09:12, Gerald Keller <
gerald.keller_at_uni-wuerzburg.de> wrote:

> Hi everyone,
>
> In our working group we compute on our own GPU nodes, with no queue system,
> and we do not run jobs across multiple nodes.
> When we run two replicas of plain MD on one node with a total of 2
> GPUs and 40 CPUs, we noticed that the simulation slows down as soon as
> the second replica starts.
>
> 1x NAMD on 1 node using 1 GPU and 18 CPUs:
>
> Info: Benchmark time: 18 CPUs 0.00742875 s/step
> Info: Benchmark time: 18 CPUs 0.0073947 s/step
> Info: Benchmark time: 18 CPUs 0.00747593 s/step
> Info: Benchmark time: 18 CPUs 0.00752931 s/step
> Info: Benchmark time: 18 CPUs 0.00744549 s/step
> Info: Benchmark time: 18 CPUs 0.00746218 s/step
>
> TIMING: 500 CPU: 3.86542, 0.0073741/step Wall: 3.90971, 0.0074047/step
> TIMING: 980 CPU: 7.43293, 0.00730715/step Wall: 7.49914, 0.00738945/step
> TIMING: 1000 CPU: 7.58503, 0.007605/step Wall: 7.65193, 0.0076393/step
> TIMING: 1500 CPU: 11.2973, 0.0073617/step Wall: 11.3969, 0.00763561/step
> TIMING: 2000 CPU: 15.0195, 0.00745355/step Wall: 15.1411, 0.0075375/step
>
>
> 2x NAMD on 1 node, using 1 GPU and 18 CPUs for each replica:
>
> Info: Benchmark time: 18 CPUs 0.0115988 s/step
> Info: Benchmark time: 18 CPUs 0.0116316 s/step
> Info: Benchmark time: 18 CPUs 0.0118586 s/step
> Info: Benchmark time: 18 CPUs 0.0115375 s/step
> Info: Benchmark time: 18 CPUs 0.0114114 s/step
> Info: Benchmark time: 18 CPUs 0.0117798 s/step
>
> TIMING: 500 CPU: 6.0915, 0.0113823/step Wall: 6.18421, 0.0114815/step
> TIMING: 1000 CPU: 11.8594, 0.0126053/step Wall: 12.0109, 0.0127244/step
> TIMING: 1500 CPU: 17.564, 0.0114935/step Wall: 17.7579, 0.0116048/step
> TIMING: 2000 CPU: 23.3157, 0.0119276/step Wall: 23.5628, 0.0119936/step
>
> If we run 1x NAMD on 1 node using 1 GPU and 18 CPUs and start another
> simulation with Amber on the other GPU, the NAMD simulation speed is not
> affected.
>
> Does anyone have an idea why this is happening and how to solve the
> problem? Because of limited resources, we sometimes have to run one
> simulation per GPU, i.e. two on the same node.
>
> Thank you in advance for your suggestions!
>
> Best regards
> Gerald
>
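To put the numbers above in perspective, here is a minimal Python sketch that averages the "Info: Benchmark time" lines and compares the two cases. It assumes the log lines have exactly the format quoted above; the s/step values are copied from Gerald's message, and the helper names are hypothetical.

```python
import re
from statistics import mean

# Matches lines such as "Info: Benchmark time: 18 CPUs 0.00742875 s/step"
BENCH = re.compile(r"Info: Benchmark time: \d+ CPUs ([0-9.eE+-]+) s/step")

# s/step values copied from the two runs reported above
ONE_REPLICA = [0.00742875, 0.0073947, 0.00747593, 0.00752931, 0.00744549, 0.00746218]
TWO_REPLICAS = [0.0115988, 0.0116316, 0.0118586, 0.0115375, 0.0114114, 0.0117798]


def mean_s_per_step(log_text: str) -> float:
    """Average the s/step values of all 'Info: Benchmark time' lines in a NAMD log."""
    values = [float(v) for v in BENCH.findall(log_text)]
    if not values:
        raise ValueError("no 'Info: Benchmark time' lines found")
    return mean(values)


def as_log(values: list[float]) -> str:
    """Rebuild log lines from the copied values so the parser can be exercised."""
    return "\n".join(f"Info: Benchmark time: 18 CPUs {v} s/step" for v in values)


if __name__ == "__main__":
    t1 = mean_s_per_step(as_log(ONE_REPLICA))
    t2 = mean_s_per_step(as_log(TWO_REPLICAS))
    print(f"per-replica slowdown: {t2 / t1:.2f}x")            # 1.56x
    print(f"node throughput, 2 replicas vs 1: {2 * t1 / t2:.2f}x")  # 1.28x
```

So each replica runs roughly 1.56 times slower per step, while the node as a whole still completes about 1.28 times as many steps per second as with a single replica.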

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:21:04 CST