Re: AW: NAMD PERFORMANCE ON NVIDIA K20 GPU

From: Kenno Vanommeslaeghe (kvanomme_at_rx.umaryland.edu)
Date: Tue Sep 17 2013 - 18:42:55 CDT

- I think Neeraj never ran across nodes (he's using the multicore binary
for the GPU calculations); my interpretation is that he has
hyper-threading enabled, and that the bad scaling when going from 16 to 32
is because of the use of "virtual" CPUs. Even Intel's new hyper-threading
offers no benefit for most computational chemistry calculations; we turn
it off in the BIOS.
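
To put a number on the scaling, here is a minimal sketch that computes the parallel efficiency of each doubling in processor count from the CPU-only timings for the 85,000-atom system quoted below. The names `cpu_only_days_per_ns` and `doubling_efficiency` are made up for illustration, and the definition of efficiency is the usual one, not anything NAMD reports.

```python
# CPU-only timings (days/ns) for the 85,000-atom system, copied from the
# benchmark table quoted below. Keys are the "number of processors" column.
cpu_only_days_per_ns = {4: 1.19, 8: 0.62, 16: 0.33, 32: 0.29}


def doubling_efficiency(timings):
    """Return {(n, 2n): efficiency} for each doubling present in timings.

    Efficiency is (time at n) / (2 * time at 2n): 1.0 means perfect
    scaling, 0.5 means the extra processors contributed nothing.
    """
    result = {}
    for n in sorted(timings):
        if 2 * n in timings:
            result[(n, 2 * n)] = timings[n] / (2 * timings[2 * n])
    return result


for (small, large), eff in doubling_efficiency(cpu_only_days_per_ns).items():
    print(f"{small:>2} -> {large:>2} processors: efficiency {eff:.2f}")
```

With these figures the 4 -> 8 and 8 -> 16 steps stay above 0.9, while 16 -> 32 drops to a little under 0.6, i.e. close to what you would expect if the extra 16 "processors" added almost nothing.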

- The CPU-only results for the parallel calculations will probably look a
bit better with the multicore version too; as it stands, this is not an
apples-to-apples comparison (though the differences may not be
large).
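
For reference, the "Speed-up from GPU" column in the tables quoted below is simply the CPU-only time divided by the CPU + K20c time, so it inherits the caveat above about the two binaries. A small sketch that recomputes it and also picks out the fastest GPU run in absolute terms follows; `benchmarks`, `gpu_speedups` and `fastest_gpu_run` are hypothetical names, and the numbers are copied from Neeraj's message.

```python
# (processors, CPU-only days/ns, CPU + K20c days/ns), copied from the
# benchmark tables quoted below.
benchmarks = {
    "85,000 atoms": [(4, 1.19, 0.21), (8, 0.62, 0.18),
                     (16, 0.33, 0.21), (32, 0.29, 0.23)],
    "6,300 atoms": [(4, 0.086, 0.087), (8, 0.05, 0.02),
                    (16, 0.029, 0.02), (32, 0.032, 0.017)],
}


def gpu_speedups(rows):
    """Return [(processors, cpu_time / gpu_time)] for each benchmark row."""
    return [(procs, cpu / gpu) for procs, cpu, gpu in rows]


def fastest_gpu_run(rows):
    """Return the processor count with the lowest CPU + GPU days/ns."""
    return min(rows, key=lambda row: row[2])[0]


for system, rows in benchmarks.items():
    print(system)
    for procs, speedup in gpu_speedups(rows):
        print(f"  {procs:>2} processors: speed-up {speedup:.1f}")
    print(f"  fastest GPU run: {fastest_gpu_run(rows)} processors")
```

Keeping the ratio and the absolute time separate makes the distinction visible: for the 85,000-atom system the largest speed-up ratio is at 4 processors, while the lowest days/ns with the GPU is at 8.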

On 09/16/2013 03:05 AM, Norman Geist wrote:
> Did you notice the bad scaling across nodes? I guess you are only using
> gigabit Ethernet, right? Also, what you call the biggest advantage, the 8:1
> ratio, in fact has the lower speedup. The improvement in time comes from the
> additional processor power, not the GPU, so the best test case for measuring
> the benefit of GPUs over CPU only is the 1:1 ratio, and 5.7 is
> quite nice; the rest also looks reasonable. Did you use the +devices or
> +ignoresharing flag? What setting did you use for fullelectfrequency?
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of* Neeraj Agrawal
> *Sent:* Sunday, 15 September 2013 01:59
> *To:* namd-l_at_ks.uiuc.edu
> *Subject:* namd-l: NAMD PERFORMANCE ON NVIDIA K20 GPU
>
> Hello,
>
> I recently performed a few benchmark NAMD runs on a workstation (dual 8-core
> Xeon E5-2687W, 3.1 GHz with one Nvidia Tesla K20C GPU). Below are the results:
>
> ------------------------------------------------------------
>
> System Size: 85,000 atoms
>
> number of     CPU only     CPU + K20c    Speed-up
> processors    (days/ns)    (days/ns)     from GPU
>
>      4          1.19         0.21          5.7
>      8          0.62         0.18          3.4
>     16          0.33         0.21          1.6
>     32          0.29         0.23          1.3
>
> ------------------------------------------------------------
>
> System Size: 6300 atoms
>
> number of     CPU only     CPU + K20c    Speed-up
> processors    (days/ns)    (days/ns)     from GPU
>
>      4          0.086        0.087         1.0
>      8          0.05         0.02          2.5
>     16          0.029        0.02          1.5
>     32          0.032        0.017         1.9
>
> ------------------------------------------------------------
>
> In all these simulations, outputEnergy is written every 100th frame and
> cutoff is set to 12 A. The CPU-only NAMD results were obtained with the
> Linux-x86_64 build (version 2.9) and the CPU+GPU results with the
> Linux-x86_64-multicore-CUDA build (version 2.9).
>
> Since, in the future, I will be simulating solvated proteins with around
> 50K-70K atoms (in total), would it be reasonable to conclude the following
> based on the above benchmark results:
>
> 1. The biggest advantage of GPU is seen when one GPU is used per 8 cores.
>
> 2. It might be advantageous to add one more GPU to this workstation so
> that I can run two NAMD simulations (each on 8 procs + 1 GPU) simultaneously?
>
> 3. For a system with <80k atoms, hyper-threading can deteriorate the
> performance.
>
> Thank you,
>
> Neeraj
>
