Re: NAMD PERFORMANCE ON NVIDIA K20 GPU

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Sep 16 2013 - 02:05:34 CDT

Did you notice the poor scaling across nodes? I guess you are only using gigabit
ethernet, right? Also, the ratio you call the biggest advantage, 8:1, in fact has
the lower speedup. The improvement in time there comes from the additional
processor power, not the GPU, so the best test case for measuring the benefit of
GPUs over CPU-only is the 1:1 ratio, and 5.7 is quite nice; the rest looks
reasonable, too. Did you use the +devices or +ignoresharing flags?
What setting did you use for fullElectFrequency?
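
For reference, a minimal sketch of how those options look in practice, assuming
the multicore-CUDA binary namd2 and a config file named md.conf (file names and
numeric values are placeholders, not a recommendation):

   # launch 8 threads and pin the run to GPU 0
   ./namd2 +p8 +devices 0 md.conf > md.log

   # md.conf excerpt
   cutoff              12.0
   outputEnergies      100
   fullElectFrequency  4    ;# example value only -- the setting asked about above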

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Neeraj Agrawal
Sent: Sunday, September 15, 2013 01:59
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: NAMD PERFORMANCE ON NVIDIA K20 GPU

 

Hello,

 

I recently performed a few NAMD benchmark runs on a workstation (dual 8-core
Xeon E5-2687W at 3.1 GHz, with one NVIDIA Tesla K20c GPU). Below are the
results:

 

----------------------------------------------------------------------------

 

System Size: 85,000 atoms

 

number of    CPU only     CPU + K20c   Speed-up
processors   (days/ns)    (days/ns)    from GPU

     4          1.19         0.21         5.7
     8          0.62         0.18         3.4
    16          0.33         0.21         1.6
    32          0.29         0.23         1.3

 

----------------------------------------------------------------------------

System Size: 6,300 atoms

 

number of    CPU only     CPU + K20c   Speed-up
processors   (days/ns)    (days/ns)    from GPU

     4          0.086        0.087        1.0
     8          0.05         0.02         2.5
    16          0.029        0.02         1.5
    32          0.032        0.017        1.9

 

----------------------------------------------------------------------------

 

In all these simulations, energies are written every 100 steps (outputEnergies
100) and the cutoff is set to 12 A. The CPU-only NAMD results were obtained with
the Linux-x86_64 build (version 2.9), and the CPU+GPU results with the
Linux-x86_64-multicore-CUDA build (version 2.9).
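
A minimal sketch of how such runs are typically launched, assuming the binaries
sit in the respective build directories and the config file is named bench.conf
(file and log names are placeholders; the core count mirrors the tables above):

   # CPU only (Linux-x86_64 build, launched locally via charmrun)
   ./charmrun ./namd2 +p16 ++local bench.conf > cpu.log

   # CPU + GPU (Linux-x86_64-multicore-CUDA build, pinned to GPU 0)
   ./namd2 +p16 +devices 0 bench.conf > gpu.log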

 

Since I will, in the future, be simulating solvated proteins of around 50K-70K
atoms in total, would it be reasonable to draw the following conclusions from
the above benchmark results:

 

1. The biggest advantage of the GPU is seen when one GPU is used per 8 cores.

 

2. It might be advantageous to add one more GPU to this workstation so that I
can run two NAMD simulations (each on 8 procs + 1 GPU) simultaneously? (See the
sketch after this list.)

 

3. For a system with <80k atoms, hyper-threading can degrade performance.
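
Regarding point 2, a minimal sketch of how two concurrent multicore-CUDA runs
could each be pinned to their own GPU and their own set of cores (config file
names, log names, and core ranges are placeholders):

   # run 1: 8 threads on cores 0-7, GPU 0
   ./namd2 +p8 +devices 0 +setcpuaffinity +pemap 0-7 sim1.conf > sim1.log &

   # run 2: 8 threads on cores 8-15, GPU 1
   ./namd2 +p8 +devices 1 +setcpuaffinity +pemap 8-15 sim2.conf > sim2.log &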

 

Thank you,

 

Neeraj

 
