Re: Scaling behaviour of NAMD on hosts with GPU accelerators

From: Boonstra, S. (s.boonstra_at_rug.nl)
Date: Fri Mar 24 2017 - 06:41:19 CDT

Sebastian,

If you have not found it yet, I usually keep track of GPU and memory usage
with the tool nvidia-smi.
Maybe you find this useful:

#!/bin/bash
#
# smi.sh -- refresh the nvidia-smi status display once per second

while true; do
        nvidia-smi
        sleep 1
        clear
done
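Not part of the original script, but if you want a log you can post-process rather than a live display, nvidia-smi's query mode can write one CSV sample per GPU per second (a sketch; the query-field names assume a reasonably recent NVIDIA driver):

```shell
#!/bin/bash
#
# smi-log.sh -- sketch: append per-GPU timestamp, index, utilization and
# memory use as CSV, one sample per second, until interrupted.
# Query-field names assume a reasonably recent NVIDIA driver.

nvidia-smi --query-gpu=timestamp,index,utilization.gpu,memory.used \
           --format=csv,noheader -l 1 >> gpu_usage.csv
```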

Cheers,
Sander

On Fri, Mar 24, 2017 at 11:47 AM, Norman Geist <
norman.geist_at_uni-greifswald.de> wrote:

> I see the same behavior for the ApoA1 benchmark, also for NAMD 2.11. I
> know that with older versions, e.g. 2.8/2.9, I had an almost linear speedup
> when increasing the number of GPUs.
>
>
>
> This behavior might be related to recent optimizations of the CUDA
> kernels, or maybe ApoA1 is just too small?
>
>
>
> Norman Geist
>
>
>
> *From:* Norman Geist [mailto:norman.geist_at_uni-greifswald.de]
> *Sent:* Friday, March 24, 2017 11:35
> *To:* namd-l_at_ks.uiuc.edu; 'Norman Geist' <norman.geist_at_uni-greifswald.de>
>
> *Subject:* RE: namd-l: Scaling behaviour of NAMD on hosts with GPU
> accelerators
>
>
>
> Forget what I said. NAMD now seems to be able to use multiple GPUs with a
> single process.
>
>
>
> I’ll do some tests and see if I can find something…
>
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu
> <owner-namd-l_at_ks.uiuc.edu>] *On Behalf Of *Norman Geist
> *Sent:* Friday, March 24, 2017 11:28
> *To:* namd-l_at_ks.uiuc.edu; 'Kraus, Sebastian' <sebastian.kraus_at_tu-berlin.de
> >
> *Subject:* RE: namd-l: Scaling behaviour of NAMD on hosts with GPU
> accelerators
>
>
>
> You did not tell us anything about your launching procedure. Please
> note that NAMD cannot use multiple GPUs per process. This means you need
> to use a network-enabled build of NAMD in order to start multiple
> processes (one per GPU). The remaining cores can be used by SMP threads.
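As a sketch (not from the thread), such a launch for a 20-core host with four GPUs might look like the following, assuming a network-enabled SMP CUDA build. The flag names (`++local`, `+p`, `++ppn`, `+devices`) follow the charmrun/NAMD conventions, but the exact values are illustrative and would need tuning on the actual box:

```shell
#!/bin/bash
# Illustrative launch line (my sketch, not the thread author's commands):
# one NAMD process per GPU, remaining cores split into SMP worker threads.
# With 20 cores and 4 GPUs, reserving one core per process for its
# communication thread leaves 4 worker threads per process.
GPUS=4
CORES=20
PPN=$(( CORES / GPUS - 1 ))   # worker threads per process -> 4
P=$(( GPUS * PPN ))           # total worker threads       -> 16
echo "charmrun ++local +p${P} namd2 ++ppn ${PPN} +devices 0,1,2,3 apoa1.namd"
```

Whether reserving a core per process for the communication thread pays off depends on the build and interconnect, so the `PPN` arithmetic above is a starting point rather than a rule.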
>
>
>
> Usually adding more GPUs will result in a somewhat linear speedup, if the
> molecular system isn’t too small.
>
>
>
> Norman Geist
>
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu
> <owner-namd-l_at_ks.uiuc.edu>] *On Behalf Of *Kraus, Sebastian
> *Sent:* Thursday, March 23, 2017 17:18
> *To:* namd-l_at_ks.uiuc.edu
> *Subject:* namd-l: Scaling behaviour of NAMD on hosts with GPU accelerators
>
>
>
> Hello,
>
> I am about to benchmark NAMD on an Intel x86-64 SMP HPC box equipped with
> 20 CPU cores and four Nvidia GeForce GTX 1080 (Pascal) graphics
> controllers/accelerator cards, and decided to use the provided ApoA1
> example job as a test case. The overall wall clock time for runs of the
> CUDA/SMP hybrid-parallelized namd2 binaries with 5 to 20 processors varies
> between 3.5 and 8 minutes.
> I observed that runs of the CUDA/SMP hybrid-parallelized namd2 binaries
> with a single GPU card show a significant wall clock time reduction, by a
> factor of about 10, in comparison to runs with SMP-only parallelized namd2
> binaries.
> Unfortunately, the runtime of namd2 no longer scales when further
> extension cards are added. On the contrary, the wall clock time of NAMD
> runs increases slightly with each additional GPU device. This eventually
> points to an increasing amount of communication overhead from
> device-to-host and host-to-device operations when more than one card is
> used.
> I then tested whether binding/manually mapping threads to CPU cores helps,
> but this approach leads to a global deterioration of performance and
> runtime.
> Additionally, I profiled NAMD runs via nvprof/nvvp, but was not able to
> find any valuable/helpful information about the global usage of GPU
> resources (memory/GPU power) on each card. Only a timeline of the kernel
> runtimes can be extracted, and this does not help answer the question of
> whether an accelerator card is fully or only partially loaded.
> Does anyone have a valuable hint for me? How is load balancing implemented
> in NAMD (source code)?
>
>
> Best regards
>
>
> Sebastian Kraus
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin
>
> Mitarbeiter Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
>
> Tel.: +49 30 314 22263
> Fax: +49 30 314 29309
> Email: sebastian.kraus_at_tu-berlin.de
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:11 CST