RE: SHAKE Tolerance and Performance

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Mar 02 2012 - 04:47:02 CST

Hi Aron,

 

I use Ganglia with a little bash script that feeds gmetric with nvidia-smi
output to monitor GPU and memory utilization and temperature. The GPU
utilization with ACEMD is a continuous 100%, while with NAMD it averages
about 80%, which is caused by the moving of data between CPU and GPU. So
yes, the GPU is less utilized and therefore less hot.

The other difference is memory consumption on the GPU. While ACEMD uses
multiple gigabytes of VRAM, because it stores everything there, NAMD uses
only around 300 MB for the same system, because it mostly stores only the
data it needs for the current step. So the bottleneck is the bandwidth
between CPU and GPU, or rather the need to transfer data so often; that
means overclocking the VRAM won't help, and I wouldn't try overclocking the
PCIe bus itself. What needs to be done is to move more parts of the
computation to the GPU, the PME for example.
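
The script is nothing fancy; a minimal sketch of the idea looks roughly
like this (the metric names are arbitrary, and it assumes nvidia-smi and
Ganglia's gmetric are installed on the node):

  #!/bin/bash
  # Query utilization (%), memory used (MiB) and temperature (C) of GPU 0.
  read util mem temp <<< "$(nvidia-smi --id=0 \
      --query-gpu=utilization.gpu,memory.used,temperature.gpu \
      --format=csv,noheader,nounits | tr -d ',')"

  # Push each value into Ganglia; run this from cron or a sleep loop.
  gmetric --name gpu0_util --value "$util" --type uint16 --units '%'
  gmetric --name gpu0_mem  --value "$mem"  --type uint32 --units 'MiB'
  gmetric --name gpu0_temp --value "$temp" --type uint16 --units 'C'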

 

cheers

Norman Geist.

 

From: Aron Broom [mailto:broomsday_at_gmail.com]
Sent: Friday, March 2, 2012 08:33
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: SHAKE Tolerance and Performance

 

Hi Norman,

I agree completely with you on all points. I'm often forced to decide
between the extra speed of pmemd and the extra functionality of NAMD. It
seems like even with SHAKE and 2fs, 2fs, 6fs multistepping you are looking
at just over 50% of pmemd's speed. But yes, things like the very robust
collective variables module, and numerous other functions in NAMD, are
enough to make one look past the speed disadvantage most of the time.

I realize now that a lot of my earlier inability to get a bigger improvement
from SHAKE in NAMD was simply because in AMBER you actually go from a 1fs to
a 2fs timestep, whereas in NAMD, if you are already using multistepping, the
gain more or less reduces to the jump in the electrostatics step from 4fs to
6fs, since the 1fs to 2fs jump in the bonded step is a minor computational
change.
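
For concreteness, that 2fs, 2fs, 6fs scheme corresponds to NAMD config
lines something like the following (a sketch, not a complete config):

  # 2 fs base timestep; nonbonded forces every step (2 fs);
  # full electrostatics (PME) every 3rd step (6 fs)
  timestep            2.0
  nonbondedFreq       1
  fullElectFrequency  3
  rigidBonds          all    ;# SHAKE/RATTLE constraints on bonds to hydrogen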

When I first heard of the two different methods being employed, all-GPU
versus nonbonded on the GPU and bonded on the CPU, I thought that the second
method would be superior, but I now see that was naive, as the at most ~10%
computational cost of the bonded interactions is nothing compared with the
time it takes to shuttle data back and forth every step. Still, one might be
able to get improvements in NAMD with a GPU by scaling back the core clocks
to reduce heat and then scaling up the memory clock. I haven't tested this
yet, but I do note that when running an AMBER simulation my GPU temps run
about 10-15 degrees hotter than for the same system with NAMD, which
suggests to me that the cores are not being fully used with NAMD (which of
course makes sense if memory is the bottleneck). Sadly though, nVidia has
gimped our ability to control clocking in anything other than Windows, and
I'm not thrilled by the idea of flashing my cards' BIOS with different
clock speeds.

Thanks for the reply,

~Aron

On Fri, Mar 2, 2012 at 2:14 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

Hi Aron,

 

I would expect the AMBER 11 pmemd to be faster because more parts of the
computation are done on the GPU (likely all of it), while in NAMD only the
non-bonded interactions are computed there. So NAMD has to move data around
more often and needs to return to the CPU to do the rest. One can improve
that by doing the PME only every 4fs and setting outputEnergies to a higher
number, since the energies need to be computed on the CPU to be printed to
the screen. But that harms energy conservation.
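
In config terms, that tweak looks something like this (a sketch; the exact
outputEnergies value is only an illustration):

  # with a 2 fs timestep, fullElectFrequency 2 evaluates PME every 4 fs
  timestep            2.0
  fullElectFrequency  2
  # print energies rarely, so the CPU-side energy reduction is infrequent
  outputEnergies      500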

 

I have not yet tested the AMBER 11 pmemd, but I have tested ACEMD, which is
also very fast. Both are expensive, so I'll stay with NAMD and hope that
more parts of the computation will be implemented on the GPU. Also, ACEMD
and pmemd are really cut down in their range of functions: pmemd can't even
fix atoms, and ACEMD is quite new and only contains the most essential
features.

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Aron Broom
Sent: Thursday, March 1, 2012 18:29
To: Nicholas M Glykos
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: SHAKE Tolerance and Performance

 

Hi Nicholas,

Thanks a lot for the reply. I did actually have useSettle on, but it turns
out the difference was something far less exciting, and much more
embarrassing. I had been quite careless in copying and pasting part of
another config file and had managed to comment out the PME section, which
meant I wasn't using periodic boundary conditions or PME at all. Since I was
only doing short benchmarking simulations I never ended up looking at the
actual trajectories to catch this, and the barostat didn't complain about
the missing periodic conditions (although running without them sure made
things much faster). So my whole message about SHAKE should be ignored, as
checking it properly gives me the same result that you saw, a very minor
improvement.
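
For anyone else who makes the same mistake: the block I had commented out
was of the usual periodic-cell-plus-PME form, roughly like this (the cell
vectors here are placeholders, not my system's actual values):

  # periodic cell
  cellBasisVector1   80.0   0.0   0.0
  cellBasisVector2    0.0  80.0   0.0
  cellBasisVector3    0.0   0.0  80.0
  cellOrigin          0.0   0.0   0.0

  # particle-mesh Ewald electrostatics
  PME              yes
  PMEGridSpacing   1.0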

Sadly though, it also means that AMBER is back to running much faster on the
GPU with the same settings (~1.7x faster, and that's when taking advantage
of NAMD's multiple timestepping of 2fs, 2fs, 6fs; if you match AMBER's 2fs
for everything, AMBER ends up being ~3x faster, though maybe some of that
comes from differences in single vs. double precision). Oh well, quite
depressing really, but thanks again for the reply.

~Aron

On Thu, Mar 1, 2012 at 4:16 AM, Nicholas M Glykos <glykos_at_mbg.duth.gr>
wrote:

Hi Aron,

> At some point, in playing around with something for a particular system, I
> discovered that in AMBER when one uses SHAKE and a 2fs timestep, you get
> pretty close to a 100% boost in performance. In NAMD, you generally get
> something in range of 15-30%, and since I'd been doing most of my work
> with SHAKE, that seemed to explain the difference.

I believe that most people leave 'useSettle' at its default (which is on)
and thus use SETTLE for waters. The implication is that for most
applications the performance difference from changing the tolerance will be
much lower than the 70% you implied (I did a quick check on a node equipped
with a GTX295 card, and it was only ~3% faster with a tolerance of 1.0e-5).
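
In config terms, the relevant settings are roughly these (the tolerance
line shows the loosened value from the quick check above):

  rigidBonds      all      ;# constrain bonds to hydrogen (SHAKE/RATTLE)
  useSettle       on       ;# default: waters use SETTLE, not SHAKE
  rigidTolerance  1.0e-5   ;# loosened from the 1.0e-8 default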

My two cents,
Nicholas

--
           Nicholas M. Glykos, Department of Molecular Biology
    and Genetics, Democritus University of Thrace, University Campus,
 Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
   Ext.77620, Tel (lab) +302551030615,
http://utopia.duth.gr/~glykos/
-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
