Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Norman Geist
Date: Fri May 25 2012 - 01:10:17 CDT

Hi Benjamin,

so if I understand correctly, you want to run a hybrid simulation across
CPU-only and GPU nodes. Phew, sounds tough.
I'm fairly sure NAMD is not deliberately designed for this, but if the
communication is the same in CPU and GPU runs, it could work;
you just have to outsmart charmrun a little. A NAMD CUDA binary _NEEDS_ a
GPU; if there is none, it fails. So you need to specify different binaries
for the CPU and GPU nodes. I think this could work with charmrun itself: you
prepare the nodelist file with the pathfix parameter, which could
look like:

group main
host cpunode1
host cpunode2
pathfix /mybin/namd-cpu/ /mybin/namd-gpu/
host gpunode1
host gpunode2

and a charmrun call like:

charmrun ++nodelist mynodelist +p48 /mybin/namd-cpu/namd2 >> my.out
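
A minimal sketch of how a nodelist like the one above could be generated from two host lists instead of being written by hand. The helper name write_nodelist is hypothetical, the host names and /mybin/ paths are the placeholders from the example, and the position of the pathfix line simply mirrors the layout shown above; it has not been checked against a particular Charm++ version. The charmrun line is only printed, not executed.

```shell
#!/bin/sh
# Sketch: write a charmrun nodelist with the CPU-only hosts first, then a
# pathfix line, then the GPU hosts (same layout as the example above).
# write_nodelist is a hypothetical helper, not part of NAMD or Charm++.
write_nodelist() {
    out=$1
    cpu_hosts=$2
    gpu_hosts=$3
    {
        echo "group main"
        for h in $cpu_hosts; do
            echo "host $h"
        done
        # Assumption: the hosts listed after this line should use the GPU
        # binary directory in place of the CPU one.
        echo "pathfix /mybin/namd-cpu/ /mybin/namd-gpu/"
        for h in $gpu_hosts; do
            echo "host $h"
        done
    } > "$out"
}

write_nodelist mynodelist "cpunode1 cpunode2" "gpunode1 gpunode2"

# Print the command instead of running it, so the sketch has no side effects.
echo "charmrun ++nodelist mynodelist +p48 /mybin/namd-cpu/namd2 >> my.out"
```

Adjust the two host lists and the +p count to match the real cluster before running the printed command.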

Let me know if it works.

Norman Geist.

> -----Original Message-----
> From: [] On Behalf Of Benjamin Merget
> Sent: Thursday, 24 May 2012 11:51
> To:
> Subject: namd-l: Simultaneous calculation on CPU-only nodes and CPU/GPU
> node (with or without rCUDA)
> Hi @all,
> We are running a cluster with four 24-core CPU-only nodes and recently
> bought a GPU/CPU node with 4 Tesla cards and 8 CPU cores. All machines
> are running Ubuntu Precise Pangolin (64-bit Server), and our queueing
> system is Torque 3.0.4.
> Since I wanted to make use of the GPUs, I built an MPI-CUDA version of
> NAMD. My problem, however, is that when I try to submit a job to all
> resources, it crashes with the fatal error:
> CUDA error in cudaGetDeviceCount on Pe XX (nodeXX): no CUDA-capable
> device is detected
> This happens for each process on each CPU-only node...
> Is there a way to tell NAMD not to look for CUDA devices on the CPU
> nodes (since there obviously are none), but instead to use only the GPUs
> of our CPU/GPU node, so that I could use all CPU nodes and the GPU node
> together?
> I recently read about remote CUDA (rCUDA). This way, each CPU-only
> node could also use the 4 Tesla cards on the GPU node remotely
> through Ethernet or InfiniBand. Might this be a solution, or is there a
> much simpler way and I just can't see the forest for the trees?
> If rCUDA is indeed the solution, does anyone have experience with
> rCUDA and NAMD? I have absolutely no clue how this could be
> implemented in the code.
> Thanks very much in advance!
> Benjamin
