NAMD CUDA 2.9 Performance drop compared to 2.8

From: Norman Geist (
Date: Fri Dec 07 2012 - 05:37:20 CST

Hello experts,


Unfortunately, I observe a heavy performance drop between CUDA-accelerated
NAMD 2.8 and 2.9. I know the way NAMD assigns processes to GPUs has changed,
but I also see a performance drop of 10% for a run with 1 CPU + 1 GPU, where
this shouldn't matter, should it? The new default behavior of only one
context per GPU shows a 50% performance drop, whereas forcing the "old"
behavior through +ignoresharing or +devices 0,0,0,... still comes with a 30%
penalty. All builds, including SMP, multicore and native, show the same
performance drop. The only case where I saw a gain, of 20% compared to 2.8,
was running my nodes with 1 process per GPU. But 2.8 was dramatically
faster with 6 CPUs per GPU. So overall, with any configuration I try, NAMD
2.9 CUDA runs slower by at least 10% and up to 50%. I understand that maybe
the GPU is now used more efficiently, so oversubscribing is harmful, but
shouldn't it then be faster anyway?
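
A minimal sketch of the launch variants described above, written as a small
Python helper that only assembles the command lines. The flags +p, +devices
and +ignoresharing are the NAMD/Charm++ options mentioned in this post; the
binary name "namd2", the config file "run.namd", and the process counts are
placeholders assumed for illustration, not the exact commands that were run.

```python
def namd_command(processes, devices=None, ignore_sharing=False,
                 binary="namd2", config="run.namd"):
    """Build the argument list for one multicore NAMD CUDA run.

    binary and config are hypothetical placeholders; adjust to your setup.
    """
    args = [binary, f"+p{processes}"]
    if devices is not None:
        # Repeating a device id assigns several processes to the same GPU.
        args += ["+devices", ",".join(str(d) for d in devices)]
    if ignore_sharing:
        # Asks for the pre-2.9 style of several contexts per GPU.
        args.append("+ignoresharing")
    args.append(config)
    return args


# Process counts below are assumptions chosen only to show the pattern.
variants = {
    "1 CPU + 1 GPU": namd_command(1, devices=[0]),
    "2.9 default (one context per GPU)": namd_command(12),
    "old behavior via +ignoresharing": namd_command(12, ignore_sharing=True),
    "old behavior via repeated device ids": namd_command(3, devices=[0, 0, 0]),
}

for label, args in variants.items():
    print(f"{label}: {' '.join(args)}")
```

Running each printed command with both the 2.8 and the 2.9 binary on the same
input gives a like-for-like comparison of the configurations.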


I would like to know:


Am I missing something here?

Is there any way to reproduce the former timings?

Where does this new behavior provide better performance?

Have any improvements been made to precision?

Are there any new features that required changing the old behavior of
running the GPUs?

Do most people use the GPUs in a different configuration than I do?
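
To put a number on the difference between two versions, one rough approach is
to compare the s/step values from the benchmark lines in the NAMD logs. The
sketch below assumes log lines of the form "Info: Benchmark time: ... s/step
..."; the two sample lines contain made-up numbers purely for illustration
and are not measurements from the runs discussed here.

```python
import re

# Assumed log format, e.g.:
#   "Info: Benchmark time: 6 CPUs 0.05 s/step 0.578704 days/ns 300 MB memory"
BENCHMARK = re.compile(r"Benchmark time:.*?([0-9.eE+-]+)\s+s/step")


def seconds_per_step(log_text):
    """Average the s/step values of all benchmark lines in a log."""
    values = [float(m.group(1)) for m in BENCHMARK.finditer(log_text)]
    if not values:
        raise ValueError("no benchmark lines found")
    return sum(values) / len(values)


def throughput_drop(old_s_per_step, new_s_per_step):
    """Fractional loss in steps per second when going from old to new."""
    return 1.0 - old_s_per_step / new_s_per_step


# Made-up sample lines, NOT real timings from either NAMD version.
old_log = "Info: Benchmark time: 6 CPUs 0.05 s/step 0.578704 days/ns 300 MB memory\n"
new_log = "Info: Benchmark time: 6 CPUs 0.10 s/step 1.15741 days/ns 300 MB memory\n"

drop = throughput_drop(seconds_per_step(old_log), seconds_per_step(new_log))
print(f"throughput drop: {drop:.0%}")
```

Defining the drop in terms of throughput (steps per second) rather than time
per step is a choice made here; the two definitions give different percentages
for the same pair of timings.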


Sadly, upgrading to NAMD 2.9 is currently not a good choice for me.


Thank you very much.




PS: Each of my nodes has 2 Tesla C2050 GPUs and 2 Xeon E5649 (6-core) CPUs.
