From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Fri Apr 27 2012 - 14:10:20 CDT
The 2.9 CUDA version is optimized for smp/multicore builds and in general
the GPU runs more efficiently with a single context. I think the effect
you are seeing is due to a fortuitous staggering of processors that
improves overlap, particularly for constant volume simulations. In any
case, I would suggest trying an smp binary (use +p24 ++ppn 3, which on your
8 nodes gives one process per node with 3 worker threads), and you can
always recover an approximation of the old behavior with +devices.
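The arithmetic behind the +p24 ++ppn 3 suggestion can be sketched in Python. This assumes the usual Charm++ SMP convention: +p counts worker threads in total, ++ppn counts worker threads per process, and each SMP process adds one communication thread. `smp_layout` is a hypothetical helper for illustration, not part of NAMD or Charm++.

```python
# Hypothetical helper (not part of NAMD or Charm++): estimates how an SMP
# launch such as "+p24 ++ppn 3" lays out processes and threads.
# Assumptions: +p is the total number of worker threads, ++ppn is worker
# threads per process, and each SMP process adds one communication thread.

def smp_layout(total_pes: int, ppn: int, nodes: int, cores_per_node: int) -> dict:
    if total_pes % ppn:
        raise ValueError("+p should be a multiple of ++ppn")
    processes = total_pes // ppn
    if processes % nodes:
        raise ValueError("processes do not divide evenly across nodes")
    procs_per_node = processes // nodes
    # Worker threads plus one communication thread for each process.
    threads_per_node = procs_per_node * (ppn + 1)
    return {
        "processes": processes,
        "processes_per_node": procs_per_node,
        "threads_per_node": threads_per_node,
        "fits_on_node": threads_per_node <= cores_per_node,
    }


if __name__ == "__main__":
    # The cluster in this thread: 8 quad-core Phenom II nodes.
    print(smp_layout(total_pes=24, ppn=3, nodes=8, cores_per_node=4))
```

Under these assumptions the run uses 8 processes, one per node, and 4 threads per node, exactly filling each quad-core machine while leaving a single process (and so a single CUDA context) per GPU. A non-SMP 32-core run, by contrast, places four separate processes on each node.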
On Sat, 7 Apr 2012, Thomas Albers wrote:
>> We have a cluster consisting of 8 AMD Phenom II x4 computers with GTX
>> 460 video cards linked with SDR InfiniBand,
> Some timing results, all with the F1ATPase benchmark:
>> NAMD 2.9b2, compiled w/ gcc 4.5.3, 32 cores: 0.065 s/step
>> NAMD 2.8 Linux-x86_64-ibverbs-CUDA, 32 cores: 0.039 s/step
>> be affected. It's only the CUDA version of NAMD 2.9 that shows this
>> odd scaling behavior. What is going on?
> What went on is that between NAMD 2.8 and 2.9 the method of assigning
> threads to GPUs has changed.
> NAMD 2.9b3-ibverbs-CUDA, 32 cores, invoked with +devices 0,0,0,0: 0.039 s/step
> NAMD 2.9b3-ibverbs-CUDA, 32 cores, invoked with +devices 0,0: 0.048 s/step
> NAMD 2.9b3-ibverbs-CUDA, 32 cores, invoked with +devices 0: 0.065 s/step
> I would be interested to hear from the developers what the reason for
> this change of default behaviour is, and on what kind of hardware it
> improves performance.