Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Benjamin Merget
Date: Thu May 24 2012 - 04:51:06 CDT

Hi @all,

We are running a cluster with four 24-core CPU-only nodes and recently
bought a GPU/CPU node with 4 Tesla cards and 8 CPU cores. All machines
are running Ubuntu 12.04 Precise Pangolin (64-bit Server), and our
queueing system is Torque 3.0.4.

Since I wanted to make use of the GPUs, I built an MPI-CUDA version of
NAMD. My problem, however, is that when I try to submit a job to all
resources, it crashes with the fatal error:

CUDA error in cudaGetDeviceCount on Pe XX (nodeXX): no CUDA-capable
device is detected

This happens for every process on every CPU-only node.
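
To illustrate what that error means, here is a minimal Python sketch
(not NAMD code) that asks the CUDA runtime how many devices a host can
see and returns 0 instead of aborting when there are none. It only
assumes that the runtime library can be loaded under the name
"libcudart.so"; the actual file name may differ per installation, and
the helper name cuda_device_count is made up for this example.

```python
import ctypes


def cuda_device_count(lib_name="libcudart.so"):
    """Return the number of CUDA devices visible on this host, or 0.

    lib_name is an assumption: the runtime may be installed under a
    versioned name, or not at all on CPU-only nodes.
    """
    try:
        cudart = ctypes.CDLL(lib_name)
    except OSError:
        # No CUDA runtime on this host, so treat it as CPU-only.
        return 0

    count = ctypes.c_int(0)
    # cudaGetDeviceCount(int*) returns a cudaError_t; 0 means success.
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if status != 0:
        # E.g. "no CUDA-capable device is detected" on a CPU-only node.
        return 0
    return count.value


if __name__ == "__main__":
    print("visible CUDA devices:", cuda_device_count())
```

Run on each node (for example via pbsdsh or ssh), this would show which
hosts in the allocation actually expose GPUs.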

Is there a way to tell NAMD not to look for CUDA devices on the CPU
nodes (since there obviously are none) and instead use only the GPUs of
our CPU/GPU node, so that I could use all CPU nodes and the GPU node
simultaneously?
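
To make the allocation concrete: Torque writes one hostname per
allocated core to the file named by $PBS_NODEFILE. The sketch below
groups those entries into CPU-only and GPU hosts. It does not answer
the NAMD question itself; it only shows how a mixed allocation could be
inspected. The hostnames (node01, node02, gpunode) and the helper name
split_slots are hypothetical.

```python
from collections import Counter


def split_slots(nodefile_text, gpu_hosts):
    """Count allocated slots per host and separate GPU hosts from the rest.

    nodefile_text: contents of a Torque node file, one hostname per slot.
    gpu_hosts: set of hostnames known to have GPUs (supplied by the user).
    """
    slots = Counter(
        line.strip() for line in nodefile_text.splitlines() if line.strip()
    )
    cpu = {host: n for host, n in slots.items() if host not in gpu_hosts}
    gpu = {host: n for host, n in slots.items() if host in gpu_hosts}
    return cpu, gpu


# Hypothetical allocation: two 24-core CPU nodes plus the 8-core GPU node.
sample = "\n".join(["node01"] * 24 + ["node02"] * 24 + ["gpunode"] * 8)
cpu, gpu = split_slots(sample, gpu_hosts={"gpunode"})
print(cpu)  # {'node01': 24, 'node02': 24}
print(gpu)  # {'gpunode': 8}
```

In a real job the text would come from open(os.environ["PBS_NODEFILE"])
rather than from the sample string.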

I recently read about remote CUDA (rCUDA). With it, each CPU-only node
could also use the 4 Tesla cards on the GPU node remotely over Ethernet
or InfiniBand. Might this be a solution, or is there a much simpler way
and I just can't see the forest for the trees?

If rCUDA is indeed the solution for all this, does anyone have
experience with rCUDA and NAMD? I have absolutely no clue how this
could be integrated into the code.

Thanks very much in advance!
