Re: Automatic GPU selection in NAMD ?

From: David McGiven (davidmcgivenn_at_gmail.com)
Date: Tue May 17 2011 - 09:35:57 CDT

Hey Axel!

On 17/05/2011, at 14:19, Axel Kohlmeyer wrote:

> david
>
> 2011/5/17 David McGiven <davidmcgivenn_at_gmail.com>:
>> Hi Jesper,
>>
>> But I don't want to use all the available GPUs. I want to use only
>> 1 or 2 out of 4, without knowing in advance which of them are free.
>
> you already know my position on how much performance
> suffers when multiple jobs enter a machine and
> compete for resources, so i won't reiterate it here.
>

Yes, but you know right now we can't do much about it. Users are
desperately craving their GPU slice, and until we add more
resources to use the Tesla properly, we have to share it that way.

>> I am wondering if there's a simple solution for that, or if I will
>> have to create a script that parses the nvidia-smi output and
>> crafts a +devices argument with the available GPUs. Not an easy or
>> elegant solution, though.
>
> this is usually handled by writing a prologue/epilogue script that
> associates GPUs with a job by using lock files. it is easy to do
> with programs like HOOMD that utilize only a single GPU per job.
> making this more flexible requires better support from the batch
> system. if i remember correctly, very recent torque versions allow
> you to define the number of GPUs per node and set environment
> variables according to the gpu requirements given via the nodes flag.
>
> of course this requires full cooperation from the user.
>
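[A rough sketch of the lock-file scheme Axel describes: a prologue claims free GPUs by atomically creating per-device lock files, and the epilogue removes them. The lock directory, GPU count, and function names here are hypothetical; a real Torque prologue would run node-locally under the batch system.]

```python
import os

# Hypothetical node-local lock directory (assumption; a production
# prologue might use something like /var/run/gpu-locks instead).
LOCK_DIR = "/tmp/gpu-locks"
NUM_GPUS = 4

def acquire_gpus(n):
    """Claim n free GPUs by atomically creating per-GPU lock files."""
    os.makedirs(LOCK_DIR, exist_ok=True)
    claimed = []
    for dev in range(NUM_GPUS):
        try:
            # O_CREAT|O_EXCL fails if another job already holds this lock,
            # so two jobs can never claim the same device.
            fd = os.open(os.path.join(LOCK_DIR, "gpu%d.lock" % dev),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            claimed.append(dev)
            if len(claimed) == n:
                return claimed
        except FileExistsError:
            continue  # device is busy, try the next one
    # not enough free GPUs: release what we took and give up
    release_gpus(claimed)
    return None

def release_gpus(devs):
    """Epilogue counterpart: remove the lock files for these devices."""
    for dev in devs:
        os.unlink(os.path.join(LOCK_DIR, "gpu%d.lock" % dev))

devs = acquire_gpus(2)
if devs is not None:
    print("+devices " + ",".join(str(d) for d in devs))
    release_gpus(devs)
```

The claimed device ids would then be handed to NAMD via +devices (or an environment variable exported by the prologue).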

Thanks man. I was just hoping there was an easier solution.
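[For the record, a minimal sketch of the parse-nvidia-smi-and-craft-+devices approach mentioned above. The sample output format below is invented for illustration — real nvidia-smi output differs between driver generations, so the regex would have to be adapted to whatever the installed version actually prints.]

```python
import re

# Hypothetical captured nvidia-smi process listing (assumption: the
# real tool's format varies by driver version and must be checked).
SAMPLE = """\
GPU 0: process 12345 (namd2)
GPU 1: no running processes
GPU 2: process 23456 (namd2)
GPU 3: no running processes
"""

def free_gpus(smi_output):
    """Return device ids that report no running compute processes."""
    free = []
    for line in smi_output.splitlines():
        m = re.match(r"GPU (\d+): (.*)", line)
        if m and m.group(2) == "no running processes":
            free.append(int(m.group(1)))
    return free

def devices_arg(smi_output, wanted):
    """Craft a +devices argument from the first `wanted` free GPUs."""
    free = free_gpus(smi_output)
    if len(free) < wanted:
        raise RuntimeError("not enough free GPUs")
    return "+devices " + ",".join(str(d) for d in free[:wanted])

print(devices_arg(SAMPLE, 2))  # -> +devices 1,3
```

Note the race window: two jobs parsing nvidia-smi at the same moment can pick the same device, which is exactly why the lock-file prologue is the more robust route.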

Best Regards,
D.

> axel.
>
>>
>> D.
>>
>>
>> On 17/05/2011, at 12:41, Jesper Sørensen wrote:
>>
>>> Hi David,
>>>
>>> I meant use:
>>> namd ++idlepoll [...]
>>>
>>> Do not add the +devices flag. I assume that if you don't add it,
>>> NAMD defaults to "+devices all".
>>>
>>> Best,
>>> Jesper
>>>
>>>
>>> -----Original Message-----
>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>>> On Behalf
>>> Of David McGiven
>>> Sent: 17. maj 2011 12:34
>>> To: Jesper Sørensen
>>> Cc: 'namd-l'
>>> Subject: Re: namd-l: Automatic GPU selection in NAMD ?
>>>
>>> Thanks Jesper,
>>>
>>> It's strange ...
>>>
>>> If I use: namd ++idlepoll [...] +devices 0
>>>
>>> It works.
>>>
>>> If I use: namd ++idlepoll [...] +devices x
>>>
>>> It doesn't:
>>>
>>> Fatal error on PE 2> FATAL ERROR: CUDA error on Pe 2 (alf device 0):
>>> All CUDA devices are in prohibited mode, of compute capability 1.0,
>>> or otherwise unusable.
>>>
>>> I have, of course, checked the COMPUTE mode ruleset for the
>>> devices, and it's 0 (Default). I have set the last two GPUs to 1
>>> and 2, so NAMD has GPUs with every kind of compute ruleset to
>>> choose from, but it still gives the same error.
>>>
>>> # nvidia-smi -s
>>> COMPUTE mode rules for GPU 0: 0
>>> COMPUTE mode rules for GPU 1: 0
>>> COMPUTE mode rules for GPU 2: 1
>>> COMPUTE mode rules for GPU 3: 2
>>>
>>> Do you know what might be happening?
>>>
>>> Thanks.
>>>
>>> D.
>>>
>>>
>>> On 17/05/2011, at 12:05, Jesper Sørensen wrote:
>>>
>>>> Hi David,
>>>>
>>>> If you just leave out the "+devices x,x,x", it will use any
>>>> available GPUs on the nodes that you have requested; at least
>>>> that works for us, and we have 3 GPUs per node. I will say that
>>>> we have compiled NAMD with MPI, and I don't know if that makes a
>>>> difference.
>>>>
>>>> Best,
>>>> Jesper
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>>>> Behalf Of David McGiven
>>>> Sent: 17. maj 2011 11:59
>>>> To: namd-l
>>>> Subject: namd-l: Automatic GPU selection in NAMD ?
>>>>
>>>> Dear NAMD Users,
>>>>
>>>> Is there any way to tell NAMD to use a given number of GPUs, but
>>>> not the specific device numbers?
>>>>
>>>> Normally one would do:
>>>>
>>>> namd ++idlepoll [...] +devices 0,1 (for 2 GPUs)
>>>> namd ++idlepoll [...] +devices 0,1,2,3 (for 4 GPUs)
>>>>
>>>> But I would rather not state the specific device numbers; I just
>>>> want to say "use 2 GPUs".
>>>>
>>>> This is because the cluster users do not know which GPUs are in
>>>> use or not.
>>>>
>>>> I know the best setup is to have one GPU per machine, which
>>>> avoids this kind of problem. But it's not like that here, and I
>>>> would like to know if someone has found a simple solution, or if
>>>> NAMD has a command for that (I haven't found it in the
>>>> documentation).
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> D.
>>>>
>>>
>>
>>
>>
>
>
>
> --
> Dr. Axel Kohlmeyer
> akohlmey_at_gmail.com http://goo.gl/1wk0
>
> Institute for Computational Molecular Science
> Temple University, Philadelphia PA, USA.

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:20:16 CST