Multinode NAMD CUDA GPU Selection

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Aug 18 2011 - 01:25:00 CDT

Hi experts,
yesterday I observed some, let's say, unfavorable behavior in how NAMD
spawns CUDA jobs. While testing multinode GPU runs, I found that NAMD reads
the +devices parameter from the beginning on every node, not per process.
Each node starts reading the device string from the start, which makes it
impossible to assign different devices on different nodes. Even if I have a
nodelist like:
 
host c35
host c35
host c35
host c35
host c33
host c33
host c33
host c33
 
and pass a device string like +devices 1,1,1,1,0,0,0,0, NAMD will try to use
GPU 1 for all processes. Is there any way to influence this without hacking
the NAMD source? There has to be a way to handle such cases. For example, I
might have two nodes, one with a Quadro and a Tesla and the other with only
one Tesla, and want to use only the Teslas. Or I might not want to use all
GPUs, because I want to run multiple jobs on one machine. I already have a
script that determines the right GPU ID for every node and generates such a
device string. But for that to work, the PEs must determine which GPU to
bind to by their real (global) PE ID. My device string works fine on one
node, but not across multiple nodes. Indexing by global PE ID would be
better and would be only a small change to the current function.
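
To make the difference concrete, here is a minimal Python sketch of the two
indexing schemes. It is only a model of the behavior described above, not
NAMD code: the function names are made up for illustration, and the
wrap-around when there are more PEs than device entries is an assumption.

```python
from collections import defaultdict

def parse_nodelist(text):
    """Return the host name of each PE, in nodelist order."""
    hosts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "host":
            hosts.append(parts[1])
    return hosts

def assign_by_node_local_rank(hosts, devices):
    """Observed behavior: every node indexes the device list from the start."""
    seen = defaultdict(int)  # number of PEs already placed on each host
    result = []
    for host in hosts:
        local_rank = seen[host]
        seen[host] += 1
        # Wrap-around is an assumption for lists shorter than the PE count.
        result.append((host, devices[local_rank % len(devices)]))
    return result

def assign_by_global_pe(hosts, devices):
    """Proposed behavior: PE i takes entry i of the device list."""
    return [(host, devices[pe % len(devices)]) for pe, host in enumerate(hosts)]

NODELIST = """\
host c35
host c35
host c35
host c35
host c33
host c33
host c33
host c33
"""

hosts = parse_nodelist(NODELIST)
devices = [int(d) for d in "1,1,1,1,0,0,0,0".split(",")]

observed = assign_by_node_local_rank(hosts, devices)
proposed = assign_by_global_pe(hosts, devices)

for pe, ((host, dev_obs), (_, dev_prop)) in enumerate(zip(observed, proposed)):
    print(f"PE {pe} on {host}: observed GPU {dev_obs}, proposed GPU {dev_prop}")
```

With node-local indexing, both c35 and c33 only ever see the first four
entries of the string, so every PE ends up on GPU 1. With global indexing,
the PEs on c33 would get GPU 0 as intended.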
Please tell me your view.
Thanks
Norman Geist
