Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Axel Kohlmeyer
Date: Fri May 25 2012 - 05:39:37 CDT

On Thu, May 24, 2012 at 5:51 AM, Benjamin Merget wrote:
> Hi @all,
> We are running a cluster with four 24-core CPU-only nodes and recently bought
> a GPU/CPU node with 4 Tesla cards and 8 CPU cores. All machines are running
> the Precise Pangolin (64-bit Server) and our queueing system is Torque
> 3.0.4.
> Since I wanted to make use of the GPUs, I built an MPI-CUDA version of NAMD.
> My problem, however, is that when I try to submit a job to all resources, it crashes
> with the fatal error:
> CUDA error in cudaGetDeviceCount on Pe XX (nodeXX): no CUDA-capable device
> is detected
> And this for each process on each CPU-only node...
> Is there a way to tell NAMD not to look for CUDA devices on the CPU nodes
> (since there obviously are none), but instead only use the GPUs of our
> CPU/GPU node, so that I could use all CPU nodes and the GPU node together?

no, and it is not worth it. just run one calculation on the GPU node
and a second on the rest and enjoy efficient utilization of your hardware.
anything else is just wasting your time.
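
The split suggested above can be sketched in a few lines. This is only an illustration under stated assumptions: the host names (gpu01, cpu01 to cpu04) and the helper split_hosts are hypothetical, and the input is assumed to look like a Torque node file, with one line per allocated core slot. It groups the slots into one set for the CUDA build on the GPU node and one set for a CPU-only build on the remaining nodes, so the two calculations can be submitted as separate jobs.

```python
from collections import Counter


def split_hosts(nodefile_lines, gpu_hosts):
    """Count core slots per host, separated into GPU hosts and CPU-only hosts.

    nodefile_lines: iterable of host names, one entry per core slot
                    (assumed node-file layout).
    gpu_hosts:      set of host names that have GPUs (hypothetical names).
    """
    gpu_slots, cpu_slots = Counter(), Counter()
    for line in nodefile_lines:
        host = line.strip()
        if not host:
            continue  # skip blank lines
        target = gpu_slots if host in gpu_hosts else cpu_slots
        target[host] += 1
    return gpu_slots, cpu_slots


# Hypothetical layout matching the cluster described in the question:
# one 8-core GPU node and four 24-core CPU-only nodes.
lines = ["gpu01"] * 8 + [f"cpu{i:02d}" for i in range(1, 5) for _ in range(24)]

gpu, cpu = split_hosts(lines, {"gpu01"})
print(sum(gpu.values()), "slots for the CUDA build on", sorted(gpu))
print(sum(cpu.values()), "slots for the CPU-only build on", sorted(cpu))
```

Each group would then get its own job and its own NAMD binary, so no process on a CPU-only node ever calls into CUDA.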


> I recently read about remote CUDA (rCUDA). This way, each CPU-only node
> could utilize the 4 Tesla cards on the GPU node remotely as well through
> Ethernet or InfiniBand. Might this be a solution, or is there a much simpler
> way and I just don't see the forest for the trees?
> If rCUDA indeed is the solution for all this, are there any experiences with
> rCUDA and NAMD, because I have absolutely no clue how this could be
> implemented into the code.
> Thanks very much in advance!
> Benjamin

Dr. Axel Kohlmeyer
College of Science and Technology
Temple University, Philadelphia PA, USA.
