Re: NAMD 2.10b1 CUDA PME offload

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Tue Aug 26 2014 - 10:22:48 CDT

PME offload is enabled by default only for PMEInterpOrder > 4. The higher
interpolation orders are useful only for increasing the grid spacing,
which is only needed to reduce network traffic in 100-million-atom
simulations.

You can set "PMEOffload yes" to force offloading. I've only observed a
performance benefit from PME offloading with PMEInterpOrder 4 on the
little ARM boards. For everything else the CPU seems to be able to keep
up (assuming you're using +p<n> to actually use all of the CPU cores).
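
For illustration, a minimal config fragment that forces offloading at the
default interpolation order might look like the sketch below (the grid
spacing and core count are placeholders for your own setup, not
recommendations):

   # Illustrative fragment of a NAMD configuration file.
   PME             yes
   PMEGridSpacing  1.0
   PMEInterpOrder  4
   PMEOffload      yes     ;# force reciprocal-sum offload even at order 4

   # Launch using all CPU cores on the node, e.g. 16:
   #   namd2 +p16 myjob.conf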

You will see "PME RECIPROCAL SUM OFFLOADED TO GPU" in the startup log if
offloading is enabled.
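
If you want to check an existing run, grepping the log for that line is
enough (the log file name here is just a placeholder):

   grep "PME RECIPROCAL SUM OFFLOADED TO GPU" run.log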

On Fermi (sm_20) devices NAMD 2.10b1 may be slightly slower than NAMD 2.9.
This might be due to building with CUDA 6.0 rather than 4.0, but all tuning
efforts at this point are targeting the Kepler and Maxwell architectures.

I can assure you that PME offloading will be slower on Fermi, since the
offload code makes heavy use of the improved atomic instructions available
on Kepler.

Jim

On Tue, 26 Aug 2014, Norman Geist wrote:

> Hi experts,
>
> do I need something special to enable PME offload, or is it always used? Am I
> supposed to see any speedup over NAMD 2.9 on a Tesla C2050?
>
> Thanks
>
> Norman Geist
