Re: AW: Multinode NAMD CUDA GPU Selection

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Aug 19 2011 - 06:33:28 CDT

On Aug 19, 2011, at 1:53 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

> Axel,
>
> What I mean (sorry for the double reply) is that I don't understand
> why it can't be implemented like this:
>
> Let's say, and it's really likely, NAMD already generates an array in which
> PEs and GPUs are assigned to each other. Why not generate this array like this:
>
> - repeat the +devices list until the number of entries >= the number of PEs
> - then use the PE-ID as the index into the +devices list
>
> This would be an absolutely clean solution that works exactly like the
> current implementation but adds the possibility of configuring special
> cases. Don't get me wrong, I don't want to criticize the NAMD developers; I
> just would have implemented it this way, because: why not? It's clean, and it
> works for all possible cases, whether the environment is homogeneous or not.
>
> What do you think about that?
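
A minimal sketch of the mapping proposed above, written in Python for brevity and not taken from the NAMD sources. The names `parse_devices` and `assign_devices` are hypothetical; the idea is only to repeat the +devices list until it covers every PE and then index it by the global PE-ID.

```python
from itertools import cycle, islice


def parse_devices(arg):
    """Parse a comma-separated +devices value such as "1,1,1,1,0,0,0,0"."""
    return [int(tok) for tok in arg.split(",")]


def assign_devices(devices, num_pes):
    """Return a table where entry i is the GPU id for global PE-ID i.

    Hypothetical helper illustrating the proposal; not NAMD code.
    devices: parsed +devices list
    num_pes: total number of PEs across all nodes
    """
    if not devices:
        raise ValueError("device list must not be empty")
    # Repeat the list cyclically and keep exactly num_pes entries.
    # For lookups by PE-ID this is equivalent to repeating the list
    # until it has at least num_pes entries.
    return list(islice(cycle(devices), num_pes))


if __name__ == "__main__":
    table = assign_devices(parse_devices("1,1,1,1,0,0,0,0"), 8)
    for pe, gpu in enumerate(table):
        print(f"PE {pe} -> GPU {gpu}")
```

With a list shorter than the PE count the entries simply wrap around, so a homogeneous cluster would still get by with a short list, while a full-length list could describe each PE individually.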

The NAMD sources are freely available. Implement your suggestion and
submit it to Jim. It won't make a difference to most people.
Personally, I think you are wasting your time, trying to make up in
software for a bad hardware layout, but that is just me and should not
stop you if you are convinced that your way is worth it. This is the
power of open source software. A lot of good things have come out of
this kind of discussion.

Cheers,

     Axel

>
> Norman Geist.
>
>
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
> Of Axel Kohlmeyer
> Sent: Thursday, 18 August 2011 17:39
> To: Norman Geist
> Cc: Namd Mailing List
> Subject: Re: namd-l: Multinode NAMD CUDA GPU Selection
>
> norman,
>
> On Thu, Aug 18, 2011 at 2:25 AM, Norman Geist
> <norman.geist_at_uni-greifswald.de> wrote:
>> Hi experts,
>>
>> Yesterday I observed some, let's say, unfavorable behavior in NAMD CUDA
>> job spawning. I was testing multinode GPU runs when I found out that NAMD
>
> the fact that nodes on a cluster are identical is a common and
> very valid assumption made by many parallel applications. support
> for inhomogeneous machines would make things _much_ more
> complicated for very little gain.
>
>> reads the +devices parameter from the beginning on every node. Not per
>> process, just per node: NAMD starts reading the devices string from the
>> start, which makes it impossible to work with differing nodes. Even if I
>> have a nodelist like:
>
> you can try using nvidia-smi to set the GPU that you don't want to use
> for namd to "compute disable" mode. never tried it with namd myself
> (or needed to do it).
>
> axel.
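
A rough sketch of the nvidia-smi suggestion above, untested with NAMD just like the suggestion itself. It assumes a recent nvidia-smi in which `-i` selects the GPU and `-c 2` sets the compute mode that the tool calls "Prohibited"; older releases used different option names, so check `nvidia-smi -h` first. The helper names are made up, and changing the mode normally requires root.

```python
import subprocess


def prohibit_compute_cmd(gpu_index):
    """Build the nvidia-smi command that blocks compute jobs on one GPU.

    Assumes the modern option names: -i <gpu index>, -c <compute mode>,
    where mode 2 means "Prohibited".
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    return ["nvidia-smi", "-i", str(gpu_index), "-c", "2"]


def prohibit_compute(gpu_index, dry_run=True):
    """Print the command (dry run) or actually execute it."""
    cmd = prohibit_compute_cmd(gpu_index)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if nvidia-smi reports a failure.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `prohibit_compute(0)` only prints the command; pass `dry_run=False` on the node itself to apply it.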
>
>>
>> host c35
>> host c35
>> host c35
>> host c35
>> host c33
>> host c33
>> host c33
>> host c33
>>
>> and type a device string like +devices 1,1,1,1,0,0,0,0, it will try to use
>> GPU 1 for all processes. Is there any way to influence this without
>> hacking the NAMD source? There has to be a way to handle such cases:
>> maybe I have two nodes, one with a Quadro and a Tesla and the other
>> with only one Tesla, and I want to use only the Teslas. Or I just don't want
>> all GPUs, because I want to run multiple jobs on one machine. I already have
>> a script that finds the right GPU ID for every node and generates such a
>> device string. But for that to work, the PEs must determine which GPU to
>> bind to by their real PE-ID, which works fine on one node but not across
>> multiple nodes. That would be better, and it is no big change to the current
>> function.
>>
>> Please tell me your view.
>>
>> Thanks
>>
>> Norman Geist
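
A small sketch of the two pieces described in the message above: building a +devices string from a nodelist, and how that string is consumed if every node starts reading it from the beginning, as reported. The host-to-GPU table and both function names are illustrative assumptions, and the per-node model is a reading of this report, not of the NAMD sources.

```python
from collections import defaultdict


def build_device_string(hosts, gpu_for_host):
    """Emit one +devices entry per nodelist line, in PE order."""
    return ",".join(str(gpu_for_host[h]) for h in hosts)


def per_node_reading(hosts, device_string):
    """Model the reported behavior: each node indexes the list from 0.

    Returns the GPU id each PE would end up with.
    """
    devices = [int(tok) for tok in device_string.split(",")]
    seen = defaultdict(int)  # number of PEs already placed on each host
    result = []
    for host in hosts:
        local_rank = seen[host]  # rank within the node, not the global PE-ID
        result.append(devices[local_rank % len(devices)])
        seen[host] += 1
    return result


hosts = ["c35"] * 4 + ["c33"] * 4
gpu_for_host = {"c35": 1, "c33": 0}  # assumed choice matching the example
device_string = build_device_string(hosts, gpu_for_host)
print(device_string)                           # 1,1,1,1,0,0,0,0
print(per_node_reading(hosts, device_string))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Under this model the trailing zeros are never reached, which matches the observation that GPU 1 is chosen on both hosts.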
>
>
>
> --
> Dr. Axel Kohlmeyer
> akohlmey_at_gmail.com http://goo.gl/1wk0
>
> Institute for Computational Molecular Science
> Temple University, Philadelphia PA, USA.
>
