Re: Automatic GPU selection in NAMD ?

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed May 18 2011 - 00:24:45 CDT

Hi,

I had the same problem while using the Sun Grid Engine, which can only
allocate CPUs. If you use such a queuing system too, you can write a
little script and call it with a single command from within the job
script. I wrote a script that reads a "Node.data" file where two values
are set: CPU=4 GPU=2. The script then creates a "Node.alloc" file that
looks like this:

---------------------------
ALLOC 0 1 // which GPUs are in use
JOBID 1234 0 // a job ID and the GPU it uses
STAMP 1234 12.04.2010 // when that job was started
JOBID 1235 1
STAMP 1235 15.04.2010
---------------------------

The GPU count is tied to the number of cores used on the node: for every
CPUs/GPUs cores (here 4/2 = 2), one GPU is allocated. This has the
advantage that a job simply stays in the queue when no GPU is free,
because in that case there are not enough free CPUs either. The script
writes the alloc file, so you always know which GPUs are in use. A second
script is called at the end of the job script to deallocate the GPUs. The
allocator also checks at runtime whether all jobs in the alloc file are
still running; if not, it frees their GPUs. That covers the case where a
job script was killed before the deallocator could be called.

So it is quite simple:

1. Data file for the nodes.
2. Alloc file for the nodes.
3. Allocator script -> allocates & cleans up stale entries (a minimal
   sketch follows below).
4. Deallocator script.
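
For illustration only (this is not Norman's actual script): a minimal
Python sketch of the allocator he describes, assuming the Node.data and
Node.alloc formats shown above, one GPU per job, and a simple /proc-based
liveness check standing in for a real qstat query:

---------------------------
#!/usr/bin/env python
# Hypothetical allocator sketch for the scheme described above.
import fcntl, os, sys, time

DATA_FILE = "Node.data"    # static node description: CPU=4, GPU=2
ALLOC_FILE = "Node.alloc"  # current allocations, created on demand

def read_gpu_count():
    # parse "GPU=2" out of the node data file
    with open(DATA_FILE) as f:
        for line in f:
            if line.startswith("GPU="):
                return int(line.strip().split("=")[1])
    return 0

def job_running(jobid):
    # stand-in liveness check; a real SGE setup would ask qstat instead
    return os.path.exists("/proc/%s" % jobid)

def allocate(jobid):
    # open (creating if needed) and lock the alloc file, so two jobs
    # starting at the same moment cannot claim the same GPU
    fd = os.open(ALLOC_FILE, os.O_RDWR | os.O_CREAT)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        used, stamp = {}, {}
        for line in f:
            parts = line.split()
            if parts and parts[0] == "JOBID":
                used[parts[1]] = int(parts[2])
            elif parts and parts[0] == "STAMP":
                stamp[parts[1]] = parts[2]
        # drop stale entries: jobs that died without deallocating
        used = dict((j, g) for j, g in used.items() if job_running(j))
        free = [g for g in range(read_gpu_count())
                if g not in used.values()]
        if not free:
            return None
        used[str(jobid)] = free[0]
        stamp[str(jobid)] = time.strftime("%d.%m.%Y")
        # rewrite the file with the surviving and new entries
        f.seek(0)
        f.truncate()
        f.write("ALLOC %s\n" %
                " ".join(str(g) for g in sorted(used.values())))
        for j, g in sorted(used.items()):
            f.write("JOBID %s %d\n" % (j, g))
            f.write("STAMP %s %s\n" % (j, stamp.get(j, "")))
        return free[0]

if __name__ == "__main__":
    # pass $JOB_ID from the job script; fall back to our PID for testing
    jobid = sys.argv[1] if len(sys.argv) > 1 else str(os.getpid())
    gpu = allocate(jobid)
    print(-1 if gpu is None else gpu)
---------------------------

The job script would then do something like GPU=$(python allocate.py
$JOB_ID) and start namd with +devices $GPU; the deallocator just removes
the job's JOBID/STAMP lines under the same lock.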

Best regards

Norman Geist.

-----Original Message-----
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Axel Kohlmeyer
Sent: Tuesday, May 17, 2011 14:20
To: David McGiven
Cc: Jesper Sørensen; namd-l
Subject: Re: namd-l: Automatic GPU selection in NAMD ?

david

2011/5/17 David McGiven <davidmcgivenn_at_gmail.com>:
> Hi Jesper,
>
> But I don't want to use all the available GPUs. I want to use only 1 or 2
> out of 4, and I don't know which of them are free.

you already know my position on how much performance
is tainted by having multiple jobs enter a machine and
compete for resources, so i won't repeat it here.

> I am wondering if there's a simple solution for that or if I will have to
> create a script that parses the nvidia-smi output and crafts a +devices
> argument with the available GPUs. Neither an easy nor an elegant solution,
> though.

this is usually handled by writing a prologue/epilogue script that
associates GPUs with a job by using lock files. it is easy to do
with programs like HOOMD that utilize only a single GPU per job.
making this more flexible requires better support from the batch
system. if i remember correctly, very recent torque versions allow
you to define the number of GPUs per node and set environment
variables according to the gpu requirements given to the nodes flag.
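
for illustration (not part of the original mail): a minimal sketch of the
per-GPU lock file idea for single-GPU jobs. the lock directory, the device
count, and the command-line interface are all assumptions here:

---------------------------
#!/usr/bin/env python
# Hypothetical prologue/epilogue helper: claim one free GPU per job
# via per-device lock files.
import os, sys

LOCK_DIR = "/var/lock/gpu"  # assumed directory, writable by the batch user
NGPUS = 4                   # assumed number of GPUs per node

def claim_gpu(jobid):
    for dev in range(NGPUS):
        path = os.path.join(LOCK_DIR, "gpu%d.lock" % dev)
        try:
            # O_CREAT|O_EXCL is atomic: it fails if the lock file
            # already exists, i.e. another job holds this device
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError:
            continue
        os.write(fd, ("%s\n" % jobid).encode())  # record the owner
        os.close(fd)
        return dev
    return None

def release_gpu(dev):
    # epilogue counterpart: remove the lock file to free the device
    os.unlink(os.path.join(LOCK_DIR, "gpu%d.lock" % dev))

if __name__ == "__main__":
    jobid = sys.argv[1] if len(sys.argv) > 1 else str(os.getpid())
    dev = claim_gpu(jobid)
    if dev is None:
        sys.exit("no free GPU on this node")
    print(dev)  # the job script can hand this to namd as "+devices N"
---------------------------

the epilogue then calls release_gpu() for the device claimed at job start;
anything beyond one GPU per job needs the batch system support mentioned
above.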

of course this requires full cooperation from the user.

axel.

>
> D.
>
>
> On 17/05/2011, at 12:41, Jesper Sørensen wrote:
>
>> Hi David,
>>
>> I meant use:
>> namd ++idlepoll [...]
>>
>> Do not add the +devices flag. I assume that if you don't add it, NAMD
>> defaults to "+devices all".
>>
>> Best,
>> Jesper
>>
>>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> On Behalf Of David McGiven
>> Sent: 17. maj 2011 12:34
>> To: Jesper Sørensen
>> Cc: 'namd-l'
>> Subject: Re: namd-l: Automatic GPU selection in NAMD ?
>>
>> Thanks Jesper,
>>
>> It's strange ...
>>
>> If I use : namd ++idlepoll [...] +devices 0
>>
>> It works.
>>
>> If I use : namd ++idlepoll [...] +devices x
>>
>> It doesn't : Fatal error on PE 2> FATAL ERROR: CUDA error on Pe 2 (alf
>> device 0): All CUDA devices are in prohibited mode, of compute capability
>> 1.0, or otherwise unusable.
>>
>> I, of course, have checked the COMPUTE mode rule for the devices, and
>> it's 0, Default. I have set the last two GPUs to 1 and 2, so NAMD has
>> GPUs with every kind of compute rule to choose from, but it still gives
>> the same error.
>>
>> # nvidia-smi -s
>> COMPUTE mode rules for GPU 0: 0
>> COMPUTE mode rules for GPU 1: 0
>> COMPUTE mode rules for GPU 2: 1
>> COMPUTE mode rules for GPU 3: 2
>>
>> Do you know what might be happening ?
>>
>> Thanks.
>>
>> D.
>>
>>
>> On 17/05/2011, at 12:05, Jesper Sørensen wrote:
>>
>>> Hi David,
>>>
>>> If you just leave out the "+devices x,x,x", it will use any available
>>> GPUs on the nodes that you have requested; at least that works for us,
>>> and we have 3 GPUs per node. I will say that we have compiled NAMD
>>> with MPI, and I don't know if that makes a difference.
>>>
>>> Best,
>>> Jesper
>>>
>>>
>>> -----Original Message-----
>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>>> Behalf Of David McGiven
>>> Sent: 17. maj 2011 11:59
>>> To: namd-l
>>> Subject: namd-l: Automatic GPU selection in NAMD ?
>>>
>>> Dear NAMD Users,
>>>
>>> Is there any way to tell NAMD to use a given number of GPUs, but not
>>> the specific device numbers ?
>>>
>>> Normally one would do :
>>>
>>> namd ++idlepoll [...] +devices 0,1 (for 2 GPUs) or
>>> namd ++idlepoll [...] +devices 0,1,2,3 (for 4 GPUs)
>>>
>>> But I would like not to state the specific device numbers, but rather
>>> just say "use 2 GPUs".
>>>
>>> This is because the cluster users do not know which GPUs are in use
>>> and which are free.
>>>
>>> I know the best setup is to have one GPU per machine, which avoids
>>> this kind of problem. But it's not like that here, and I would like
>>> to know if someone has found a simple solution, or if NAMD has a
>>> command for that (I haven't found one in the documentation).
>>>
>>> Thanks.
>>>
>>> Regards,
>>> D.
>>>
>>
>
>
>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
