NAMD 2.9b2-cuda does not scale well compared to NAMD 2.8

From: Thomas Albers (talbers_at_binghamton.edu)
Date: Sat Mar 31 2012 - 18:00:06 CDT

Hello!

We have a cluster of eight AMD Phenom II X4 machines, each with a GTX
460 video card, linked by SDR InfiniBand. We find that the CUDA
version of NAMD 2.9b2 scales worse than NAMD 2.8 and runs slower. I
compiled the program myself, since UIUC offers only the
Linux-x86_64-ibverbs-smp-CUDA binary for download, not the
Linux-x86_64-ibverbs-CUDA version that would be suitable for us.

Some timing results, all with the F1ATPase benchmark:
NAMD 2.9b2 Linux-x86_64-ibverbs, 32 cores: 0.160 s/step
NAMD 2.8 Linux-x86_64-ibverbs, 32 cores: 0.160 s/step
NAMD 2.9b2, compiled w/ gcc 4.5.3, 32 cores: 0.160 s/step

(One node:)
NAMD 2.9b2 Linux-x86_64-multicore-CUDA, 4 cores: 0.238 s/step
NAMD 2.8 Linux-x86_64-ibverbs-CUDA, 4 cores: 0.231 s/step
NAMD 2.9b2, compiled w/ gcc 4.5.3, 4 cores: 0.251 s/step

NAMD 2.9b2, compiled w/ gcc 4.5.3, 4 cores: 0.251 s/step
NAMD 2.9b2, compiled w/ gcc 4.5.3, 8 cores: 0.173 s/step
NAMD 2.9b2, compiled w/ gcc 4.5.3, 16 cores: 0.104 s/step
NAMD 2.9b2, compiled w/ gcc 4.5.3, 32 cores: 0.065 s/step
NAMD 2.8 Linux-x86_64-ibverbs-CUDA, 32 cores: 0.039 s/step

What is interesting is that timing results on one node are comparable
between versions, and that the non-CUDA version also does not seem to
be affected. It's only the CUDA version of NAMD 2.9 that shows this
odd scaling behavior. What is going on?
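
To put numbers on the scaling difference, here is a minimal Python sketch
that turns the CUDA timings above into speedup and parallel efficiency
relative to the 4-core (single-node) run. The dictionary names and the
scaling() helper are made up for illustration. It assumes the 4-core
figure for each version is a fair baseline for that version's 32-core
run.

```python
# Timings in s/step, copied from the CUDA results in this post.
# Keys are core counts (4 cores per node on this cluster).
TIMINGS_29B2_CUDA = {4: 0.251, 8: 0.173, 16: 0.104, 32: 0.065}
TIMINGS_28_CUDA = {4: 0.231, 32: 0.039}


def scaling(timings):
    """Return {cores: (speedup, efficiency)} relative to the smallest run.

    speedup    = t(base) / t(cores)
    efficiency = speedup / (cores / base_cores)
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        result[cores] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for label, timings in (("2.9b2 CUDA", TIMINGS_29B2_CUDA),
                           ("2.8 CUDA", TIMINGS_28_CUDA)):
        for cores, (speedup, eff) in scaling(timings).items():
            print(f"{label:11s} {cores:3d} cores: "
                  f"speedup {speedup:5.2f}, efficiency {eff:6.1%}")
```

By this measure, going from 4 to 32 cores gives NAMD 2.9b2 CUDA a speedup
of about 3.9x (roughly 48% efficiency), against about 5.9x (roughly 74%
efficiency) for NAMD 2.8 CUDA.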

Thomas

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:23 CST