Re: SHAKE Tolerance and Performance

From: Aron Broom (broomsday_at_gmail.com)
Date: Fri Mar 02 2012 - 10:56:32 CST

Hi Norman,

Thanks for the information. I was confused about the memory frequency;
somehow I thought it would affect the transfer rate over the PCIe bus, but
clearly that isn't the case.

~Aron

On Fri, Mar 2, 2012 at 5:47 AM, Norman Geist <norman.geist_at_uni-greifswald.de
> wrote:

> Hi Aron,
>
> I use ganglia with a little bash script and gmetric with nvidia-smi to
> monitor GPU and memory utilization and temperature. The GPU utilization
> with ACEMD is a continuous 100%, while with NAMD it averages about 80%,
> which is caused by moving between CPU and GPU. So yes, the GPU is less
> utilized and therefore less hot. The other thing is the memory consumption
> on the GPU. While ACEMD uses multiple GB of VRAM, because it stores
> everything there, NAMD uses only around 300 MB for the same system, because
> it mostly stores only the data it needs for the current step. So the
> bottleneck is the bandwidth between CPU and GPU, or rather the need to
> transfer data so often, which means overclocking the VRAM won't help, and I
> wouldn't try overclocking the PCIe bus itself. What needs to be done is to
> move more parts of the computation to the GPU, the PME for example.
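>
> A minimal sketch of that kind of polling script (the gpu0_* metric names
> are just placeholders, and the --query-gpu flags need a reasonably recent
> nvidia-smi; with older drivers one would have to parse "nvidia-smi -q"
> instead):
>
>   #!/bin/bash
>   # Poll the first GPU and push utilization, memory use and temperature
>   # into ganglia via gmetric.
>   read util mem temp < <(nvidia-smi \
>       --query-gpu=utilization.gpu,memory.used,temperature.gpu \
>       --format=csv,noheader,nounits | head -n1 | tr -d ',')
>   gmetric -n gpu0_util -v "$util" -t uint32 -u '%'
>   gmetric -n gpu0_mem  -v "$mem"  -t uint32 -u 'MiB'
>   gmetric -n gpu0_temp -v "$temp" -t uint32 -u 'C'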
>
> cheers
>
> Norman Geist.
>
> From: Aron Broom [mailto:broomsday_at_gmail.com]
> Sent: Friday, March 2, 2012 08:33
> To: Norman Geist
> Cc: Namd Mailing List
> Subject: Re: namd-l: SHAKE Tolerance and Performance
>
> Hi Norman,
>
> I agree completely with you on all points. I'm often forced to decide
> between the extra speed of pmemd and the extra functionality of NAMD. It
> seems like even with SHAKE and 2fs, 2fs, 6fs multistepping you are looking
> at just over 50% of pmemd's speed. But yes, things like the very robust
> collective variables module and numerous other features of NAMD are enough
> to make one look past the speed disadvantage most of the time.
>
> I realize now that a lot of my earlier inability to get a bigger speedup
> from SHAKE with NAMD was simply because in AMBER you actually go from 1fs
> to 2fs, but in NAMD, if you are already using multistepping, it more or
> less reduces to the jump in the electrostatic step from 4fs to 6fs, since
> the 1fs to 2fs bonded jump is a minor computational change.
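>
> In NAMD config terms, that kind of multistepping setup would look roughly
> like this (the values are only illustrative, and rigidTolerance is simply
> the default):
>
>   timestep            2.0      ;# 2 fs base step
>   rigidBonds          all      ;# SHAKE/RATTLE on all bonds involving hydrogen
>   rigidTolerance      1.0e-8   ;# the tolerance this thread is about (default)
>   nonbondedFreq       1        ;# short-range nonbonded every 2 fs
>   fullElectFrequency  3        ;# PME every 6 fs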
>
> When I first heard of the two different approaches being employed, all-GPU
> versus nonbonded on the GPU and bonded on the CPU, I thought that the
> second method would be superior, but I now see that was naive, as the at
> most 10% computational cost of the bonded interactions is nothing compared
> with the time it takes to shuttle data back and forth every step. Still,
> one might be able to get improvements in NAMD with a GPU by scaling back
> the core clocks to reduce heat and then scaling up the memory. I haven't
> tested this yet, but I do note that when running an AMBER simulation my GPU
> temps run about 10-15 degrees hotter than the same system with NAMD, which
> suggests to me that the cores are not being used fully with NAMD (which of
> course makes sense if memory is the bottleneck). Sadly though, NVIDIA has
> gimped our ability to control clocking in anything other than Windows, and
> I'm not thrilled by the idea of flashing my cards' BIOS with different
> clock speeds.
>
> Thanks for the reply,
>
> ~Aron
>
> On Fri, Mar 2, 2012 at 2:14 AM, Norman Geist <
> norman.geist_at_uni-greifswald.de> wrote:
>
> Hi Aron,
>
> I would expect the AMBER 11 pmemd to be faster because more parts of the
> computation are done on the GPU (likely all of it), while in NAMD only the
> non-bonded interactions are computed there. So NAMD has to move data around
> more often and needs to return to the CPU to do the rest. One can improve
> that by doing the PME only every 4fs and setting outputenergies to a higher
> number, since the energies need to be computed on the CPU to be printed to
> the screen. But that harms energy conservation.
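>
> As a rough sketch, assuming a 2 fs base timestep, that change would be
> something like:
>
>   fullElectFrequency  2        ;# PME only every 4 fs
>   outputEnergies      500      ;# print energies less often (value is arbitrary)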
>
> I have not yet tested the AMBER 11 pmemd, but ACEMD, which is also very
> fast.

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:21:43 CST