Re: Automatic GPU selection in NAMD ?

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Wed May 18 2011 - 13:22:57 CDT

Hi David,

NAMD will use a single GPU per process. That GPU will be shared with
other NAMD processes on the same node if there are not enough GPUs for
each process to have its own GPU.

How do you know if a GPU is free or busy? If you're running in exclusive
mode, then the GPUs cannot be shared between processes anyway, so you can
just launch one NAMD process per GPU.
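
As a minimal sketch of that (assuming a multicore CUDA build named namd2
and two independent jobs; the config file names are placeholders), launching
one single-process run per GPU could look like:

   # one NAMD process per GPU; with the devices in exclusive mode they
   # cannot be shared, so each run ends up on its own GPU
   namd2 ++idlepoll +p1 job1.namd > job1.log &
   namd2 ++idlepoll +p1 job2.namd > job2.log &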

You really need the scheduler to assign GPUs to jobs if you want to have
multiple jobs on the same multi-GPU node. I'm planning to do this in Grid
Engine by having single-slot gpu0 and gpu1 queues on each dual-GPU node and
parsing the QUEUE and PE_HOSTFILE environment variables within the job
script to determine which queues were assigned, and hence which devices to
use.
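
A sketch of that job-script logic (the queue names gpu0 and gpu1 and the
single-node, single-slot case are assumptions here; the exact PE_HOSTFILE
format depends on the Grid Engine version):

   # map the queue this job was assigned to onto a CUDA device index
   case "$QUEUE" in
     gpu0) DEV=0 ;;
     gpu1) DEV=1 ;;
     *)    echo "unexpected queue: $QUEUE" >&2; exit 1 ;;
   esac
   namd2 ++idlepoll +devices $DEV myjob.namd

For multi-slot jobs the same mapping would be applied to each queue name
listed in $PE_HOSTFILE rather than to $QUEUE.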

-Jim

On Tue, 17 May 2011, David McGiven wrote:

> Hi Jesper,
>
> But I don't want to use all the available GPUs. I want to use only 1 or 2
> out of 4, without knowing which of them are free.
>
> I am wondering if there's a simple solution for that, or if I will have to
> create a script that parses the nvidia-smi output and crafts a +devices
> argument with the available GPUs. Not an easy or elegant solution, though.
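
A rough sketch of what such a wrapper could look like (untested; it assumes
a newer nvidia-smi that supports the --query-* options, a binary named
namd2, and a placeholder config myjob.namd, and it is racy between jobs,
which is part of why it is not elegant):

   # list GPUs that currently have compute processes, then take up to two free ones
   busy=$(nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader | sort -u)
   free=""
   while IFS=', ' read -r idx uuid; do
       echo "$busy" | grep -q "$uuid" || free="$free,$idx"
   done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader)
   free=${free#,}
   devices=$(echo "$free" | cut -d, -f1-2)   # first two free device indices
   namd2 ++idlepoll +devices "$devices" myjob.namd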
>
> D.
>
>
> On 17/05/2011, at 12:41, Jesper Sørensen wrote:
>
>> Hi David,
>>
>> I meant use:
>> namd ++idlepoll [...]
>>
>> Do not add the +devices flag. I assume that the way it works is that if you
>> don't add it, NAMD will default to "+devices all".
>>
>> Best,
>> Jesper
>>
>>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
>> Of David McGiven
>> Sent: 17 May 2011 12:34
>> To: Jesper Sørensen
>> Cc: 'namd-l'
>> Subject: Re: namd-l: Automatic GPU selection in NAMD ?
>>
>> Thanks Jesper,
>>
>> It's strange ...
>>
>> If I use: namd ++idlepoll [...] +devices 0
>>
>> It works.
>>
>> If I use: namd ++idlepoll [...] +devices x
>>
>> It doesn't:
>>
>> Fatal error on PE 2> FATAL ERROR: CUDA error on Pe 2 (alf device 0): All
>> CUDA devices are in prohibited mode, of compute capability 1.0, or
>> otherwise unusable.
>>
>> I have, of course, checked the COMPUTE mode rules for the devices, and they
>> were all set to 0 (Default). I have since set the last two GPUs to 1 and 2,
>> so NAMD has GPUs with every kind of compute mode rule to choose from, but it
>> still gives the same error.
>>
>> # nvidia-smi -s
>> COMPUTE mode rules for GPU 0: 0
>> COMPUTE mode rules for GPU 1: 0
>> COMPUTE mode rules for GPU 2: 1
>> COMPUTE mode rules for GPU 3: 2
>>
>> Do you know what might be happening?
>>
>> Thanks.
>>
>> D.
>>
>>
>> On 17/05/2011, at 12:05, Jesper Sørensen wrote:
>>
>>> Hi David,
>>>
>>> If you just leave out the "+devices x,x,x", it will use any available
>>> GPUs on the nodes that you have requested; at least, that works for us,
>>> and we have 3 GPUs per node. I will say that we have compiled NAMD with
>>> MPI, and I don't know if that makes a difference.
>>>
>>> Best,
>>> Jesper
>>>
>>>
>>> -----Original Message-----
>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>>> Behalf Of David McGiven
>>> Sent: 17 May 2011 11:59
>>> To: namd-l
>>> Subject: namd-l: Automatic GPU selection in NAMD ?
>>>
>>> Dear NAMD Users,
>>>
>>> Is there any way to tell NAMD to use a certain number of GPUs, but not
>>> the specific device numbers?
>>>
>>> Normally one would do:
>>>
>>> namd ++idlepoll [...] +devices 0,1 (for 2 GPUs)
>>>
>>> or:
>>>
>>> namd ++idlepoll [...] +devices 0,1,2,3 (for 4 GPUs)
>>>
>>> But I would like not to state the specific device numbers, but rather
>>> just say "use 2 GPUs".
>>>
>>> This is because the cluster users do not know which GPUs are in use
>>> and which are not.
>>>
>>> I know the best setup is to have one GPU per machine, which avoids this
>>> kind of problem. But it's not like that here, and I would like to know if
>>> someone has found a simple solution, or if NAMD has an option for that (I
>>> haven't found one in the documentation).
>>>
>>> Thanks.
>>>
>>> Regards,
>>> D.
>>>
>>
>
