Re: Automatic GPU selection in NAMD ?

From: David McGiven (davidmcgivenn_at_gmail.com)
Date: Thu May 19 2011 - 04:42:39 CDT

Hi Jim,

You are right. This is the problem I'm having: setting the GPUs to
compute-exclusive mode is useless, because NAMD needs a minimum of 2-4
CPU cores per GPU to work properly. At least with my setup and apoa1.namd.

I hope to solve it by writing a script that checks for busy GPUs with
nvidia-smi and the like. As Axel noted before, it will need some
cooperation from the users.
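
Something along these lines is what I have in mind. It is only a rough
sketch: the "is this GPU free" test is a placeholder (the exact
nvidia-smi output format depends on the driver version), and it assumes
four GPUs numbered 0-3.

#!/bin/bash
# Rough sketch: pick the first N GPUs that look idle and pass them
# to NAMD via +devices.
NGPUS_WANTED=${1:-2}

free=""
for id in 0 1 2 3; do
    # Placeholder test: treat a GPU as free if nvidia-smi reports no
    # compute process on it. Adjust the grep to your nvidia-smi output.
    if ! nvidia-smi -a | grep -A20 "GPU ${id}" | grep -q "Process"; then
        free="${free:+$free,}$id"
    fi
done

# Keep only the first NGPUS_WANTED device ids.
devices=$(echo "$free" | cut -d, -f1-"$NGPUS_WANTED")

if [ -z "$devices" ]; then
    echo "No free GPUs found" >&2
    exit 1
fi

echo "Using +devices $devices"
namd ++idlepoll +devices "$devices" apoa1.namd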

I think the MOAB scheduler works with GPUs, but the free version,
MAUI, doesn't. Too bad; I don't think it would be too difficult to
add. It's also a pity that NAMD always requires you to specify the
GPU IDs you want to use.

Regards
D.

On 18/05/2011, at 20:22, Jim Phillips wrote:

> Hi David,
>
> NAMD will use a single GPU per process. That GPU will be shared
> with other NAMD processes on the same node if there are not enough
> GPUs for each process to have its own GPU.
>
> How do you know if a GPU is free or busy? If you're running in
> exclusive mode then the GPUs cannot be shared between processes
> anyway, so you can just launch one NAMD process per GPU.
>
> You really need the scheduler to assign GPUs to jobs if you want to
> have multiple jobs on the same multi-GPU node. I'm planning to do
> this in Grid Engine by having single-slot gpu0 and gpu1 queues on
> each dual-GPU node and parsing the QUEUE and PE_HOSTFILE environment
> variables within the job script to determine which queues were
> assigned and hence which devices to use, roughly as sketched below.
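>
> An untested sketch of that job script, assuming queues literally
> named gpu0 and gpu1 and a single-node job (a multi-node job would
> parse PE_HOSTFILE in the same spirit; the config file name is just a
> placeholder):
>
> #!/bin/bash
> # Grid Engine sets $QUEUE to the queue this job landed in, e.g. "gpu0".
> case "$QUEUE" in
>   gpu0) dev=0 ;;
>   gpu1) dev=1 ;;
>   *) echo "unexpected queue: $QUEUE" >&2; exit 1 ;;
> esac
> namd ++idlepoll +devices "$dev" your_config.namd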
>
> -Jim
>
>
> On Tue, 17 May 2011, David McGiven wrote:
>
>> Hi Jesper,
>>
>> But I don't want to use all the available GPUs. I want to use only
>> 1 or 2 out of 4, without knowing which of them are free.
>>
>> I am wondering if there's a simple solution for that, or if I will
>> have to create a script that parses the nvidia-smi output and
>> crafts a +devices argument with the available GPUs. Not an easy or
>> elegant solution, though.
>>
>> D.
>>
>>
>> On 17/05/2011, at 12:41, Jesper Sørensen wrote:
>>
>>> Hi David,
>>> I meant use:
>>> namd ++idlepoll [...]
>>> Do not add the +devices flag. I assume that the way it works is
>>> that if you don't add it, NAMD defaults to "+devices all".
>>> Best,
>>> Jesper
>>> -----Original Message-----
>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>>> On Behalf
>>> Of David McGiven
>>> Sent: 17 May 2011 12:34
>>> To: Jesper Sørensen
>>> Cc: 'namd-l'
>>> Subject: Re: namd-l: Automatic GPU selection in NAMD ?
>>> Thanks Jesper,
>>> It's strange ...
>>> If I use: namd ++idlepoll [...] +devices 0
>>> It works.
>>> If I use: namd ++idlepoll [...] +devices x
>>> It doesn't:
>>> Fatal error on PE 2> FATAL ERROR: CUDA error on Pe 2 (alf device 0):
>>> All CUDA devices are in prohibited mode, of compute capability 1.0,
>>> or otherwise unusable.
>>> I have, of course, checked the COMPUTE mode rules for the devices,
>>> and they are set to 0, Default. I have set the last two GPUs to 1
>>> and 2, so NAMD has GPUs with every kind of compute rule to choose
>>> from, but it still gives the same error.
>>> # nvidia-smi -s
>>> COMPUTE mode rules for GPU 0: 0
>>> COMPUTE mode rules for GPU 1: 0
>>> COMPUTE mode rules for GPU 2: 1
>>> COMPUTE mode rules for GPU 3: 2
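>>> (For reference, I set those modes with something like "nvidia-smi
>>> -g 2 -c 1" and "nvidia-smi -g 3 -c 2"; the exact flags may differ
>>> between nvidia-smi versions, so check nvidia-smi -h.)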
>>> Do you know what might be happening?
>>> Thanks.
>>> D.
>>> On 17/05/2011, at 12:05, Jesper Sørensen wrote:
>>>> Hi David,
>>>> If you just leave out the "+devices x,x,x", it will use any
>>>> available GPUs on the nodes that you have requested; at least
>>>> that works for us, and we have 3 GPUs per node. I will say that
>>>> we have compiled NAMD with MPI, and I don't know if that makes a
>>>> difference.
>>>> Best,
>>>> Jesper
>>>> -----Original Message-----
>>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>>>> Behalf Of David McGiven
>>>> Sent: 17 May 2011 11:59
>>>> To: namd-l
>>>> Subject: namd-l: Automatic GPU selection in NAMD ?
>>>> Dear NAMD Users,
>>>> Is there any way to tell NAMD to use a certain number of GPUs,
>>>> without giving the specific device numbers?
>>>> Normally one would do:
>>>> namd ++idlepoll [...] +devices 0,1 (for 2 GPUs)
>>>> namd ++idlepoll [...] +devices 0,1,2,3 (for 4 GPUs)
>>>> But I would like not to state the specific device numbers, but
>>>> rather just say "use 2 GPUs".
>>>> This is because the cluster users do not know which GPUs are in
>>>> use and which are not.
>>>> I know the best setup is to have one GPU per machine, and then
>>>> one avoids this kind of problem. But it's not like that here, and
>>>> I would like to know if someone has found a simple solution, or
>>>> if NAMD has an option for that (I haven't found one in the
>>>> documentation).
>>>> Thanks.
>>>> Regards,
>>>> D.
>>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:20:17 CST