Re: NAMD 2.7b1 Performance loss with CUDA 2.3

From: Axel Kohlmeyer
Date: Thu Nov 12 2009 - 21:07:57 CST

On Fri, 2009-11-13 at 03:07 +0100, Joakim Swedberg wrote:
> Hello everybody,

hi joakim,


> But as I said, I've tried many different configurations such as +p2 and
> +device 1,3, +p8 and +device 0,1,2,3 etc. Interestingly, +p8 and
> +device 0,1,2,3 has the least performance loss (octa threaded quad
> core?) but it's marginal and always around 20%.

> Has anybody had similar experiences? I would be grateful for any suggestions.

getting good performance with GPUs depends on a lot of factors.

in particular, the PCIe bus performance and the available
memory bandwidth matter a lot.

with the GTX 295 you have two GPUs on one PCIe slot.
they have to share the bandwidth and due to the way
NAMD uses GPUs there is data being sent back and forth
all the time.

the second issue is the PCIe bandwidth overall.
depending on your motherboard design, having a PCIe-v1
device in the wrong slot can degrade the performance
of some other PCIe slot(s). you need the full 16-lane
PCIe-v2 performance to get good throughput.
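
to put rough numbers on the slot and sharing issues above, here is
a minimal sketch of the theoretical per-direction PCIe peak rates.
it assumes the 8b/10b line encoding of PCIe v1.x and v2.0, and it
assumes that two GPUs behind one slot split the link evenly when
both transfer at once. the function name and the gpus_sharing
parameter are made up for illustration; real throughput will be
lower because of protocol overhead.

```python
# rough per-direction PCIe peak bandwidth, before protocol overhead.
# assumption: 8b/10b line encoding, as used by PCIe v1.x and v2.0.

GT_PER_S = {"v1": 2.5, "v2": 5.0}  # raw signaling rate per lane


def peak_gb_per_s(gen: str, lanes: int, gpus_sharing: int = 1) -> float:
    """theoretical peak in GB/s per direction for each GPU on the slot.

    assumption: GPUs sharing a slot split the link bandwidth evenly.
    """
    payload_gbit = GT_PER_S[gen] * 8 / 10  # 8 data bits per 10 line bits
    per_lane_gbyte = payload_gbit / 8      # bits -> bytes
    return per_lane_gbyte * lanes / gpus_sharing


if __name__ == "__main__":
    cases = [("v2", 16, 1), ("v2", 16, 2), ("v2", 8, 1), ("v1", 16, 1)]
    for gen, lanes, gpus in cases:
        bw = peak_gb_per_s(gen, lanes, gpus)
        print(f"PCIe-{gen} x{lanes}, {gpus} GPU(s): {bw:.1f} GB/s per GPU")
```

under these assumptions a 16-lane PCIe-v2 link peaks at 8 GB/s per
direction, while two GPUs sharing that slot, a single GPU on 8 lanes
of PCIe-v2, or a 16-lane PCIe-v1 link each come out at 4 GB/s.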

finally, are you running that machine in text mode?
if you have a GPU that is also servicing the X server
then this can affect performance, too.

based on what you describe, i would first check out
whether the main board is working well. there should
be a GPU bandwidth test in the CUDA SDK. i suggest
you try that one out.

to provide a point of reference: i get around 3 GB/s on
a tesla S1070 in an 8x PCIe-v2.0 slot on a Nehalem EP xeon
machine and 2.5 GB/s on a GTX285 in a 16x PCIe-v2.0 slot
on a Woodcrest xeon machine.
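
as a rough way to read these two reference numbers, the sketch
below compares them against the theoretical link peak. it assumes
about 0.5 GB/s per lane per direction for PCIe v2.0 (5 GT/s with
8b/10b encoding); the helper name link_efficiency is made up for
illustration, and the measured values are simply the approximate
figures quoted above.

```python
# fraction of the theoretical link peak reached by the two measurements.
# assumption: PCIe v2.0 carries about 0.5 GB/s per lane per direction.

PEAK_PER_LANE_V2 = 0.5  # GB/s per lane, per direction

# (measured GB/s, number of lanes), taken from the figures quoted above
measured = {
    "tesla S1070, 8x PCIe-v2.0, Nehalem EP": (3.0, 8),
    "GTX285, 16x PCIe-v2.0, Woodcrest": (2.5, 16),
}


def link_efficiency(gb_per_s: float, lanes: int) -> float:
    """measured bandwidth as a fraction of the theoretical link peak."""
    return gb_per_s / (PEAK_PER_LANE_V2 * lanes)


if __name__ == "__main__":
    for name, (bw, lanes) in measured.items():
        print(f"{name}: {link_efficiency(bw, lanes):.0%} of link peak")
```

with these inputs the 8-lane setup reaches about 75% of its link
peak and the 16-lane setup about 31%, which is in line with the
point that the host side (chipset, memory bandwidth) matters and
not only the slot width.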



