Re: not getting NAMD multicopy simulation started

From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Wed Nov 25 2020 - 07:45:33 CST

Update:

     I am ONLY able to run both NAMD2.13 and NAMD3alpha7
netlrts-smp-CUDA versions with

         +p2 +replicas 2, i.e. 1 core per replica.

*    But as soon as I use more than 1 core per replica, it fails.*

Anyone ever experienced that?

Any hints are appreciated!

Kind regards

René

On 11/23/2020 2:22 PM, René Hafner TUK wrote:
> Dear all,
>
>
>  I am trying to get an (e)ABF simulation running with the multi-copy
> algorithm on a multi-GPU node.
>
> I tried as described in
> http://www.ks.uiuc.edu/Research/namd/2.13/notes.html :
>
>         charmrun ++local  namd2 myconf_file.conf +p16 +replicas 2
> +stdout logfile%d.log
>
>
> I am using the precompiled binaries from the Download page: NAMD 2.13
> Linux-x86_64-netlrts-smp-CUDA (Multi-copy algorithms, single process
> per copy)
>
> And for both NAMD2.13 and NAMD2.14 I get the error:
>
> FATAL ERROR: Number of devices (2) is not a multiple of number of
> processes (8).  Sharing devices between processes is inefficient.
> Specify +ignoresharing (each process uses all visible devices) if not
> all devices are visible to each process, otherwise adjust number of
> processes to evenly divide number of devices, specify subset of
> devices with +devices argument (e.g., +devices 0,2), or multiply list
> shared devices (e.g., +devices 0,1,2,0).
>
> But even when using +devices 0,1 I obtain the same error. Why should
> the number of devices be a multiple of the number of processes at all?
>
> Shouldn't it be the other way around?  8 cores + 1 GPU PER replica for
> my example.
>
> Can anyone give me some support here?
>
>
> Kind regards
>
> René Hafner
>
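
To make the numbers in the error message easier to follow, here is a minimal Python sketch of the arithmetic involved. It is not NAMD code. The two helper names are hypothetical, and it assumes (a) that the +p count is split evenly across +replicas and (b) that the check is literally what the message text says: the number of devices must be a multiple of the number of processes. Under those assumptions, the 8 in the message equals 16 / 2 from the failing command line, which may or may not be how NAMD actually derives it.

```python
def cores_per_replica(p: int, replicas: int) -> int:
    """Split the +p count evenly across +replicas (assumed interpretation)."""
    if replicas <= 0 or p <= 0 or p % replicas != 0:
        raise ValueError("+p must be a positive multiple of +replicas")
    return p // replicas


def devices_divide_evenly(num_devices: int, num_processes: int) -> bool:
    """Literal reading of the error text: devices must be a multiple of processes."""
    return num_processes > 0 and num_devices % num_processes == 0


# Failing case from this thread: +p16 +replicas 2 on a node with 2 GPUs.
# The error message reports 8 processes.
print(cores_per_replica(16, 2))     # 8
print(devices_divide_evenly(2, 8))  # False

# Working case from this thread: +p2 +replicas 2, i.e. 1 core per replica.
# Whether the relevant process count is then 1 or 2 is not stated in the
# thread; the literal check passes for either value.
print(cores_per_replica(2, 2))      # 1
print(devices_divide_evenly(2, 1))  # True
print(devices_divide_evenly(2, 2))  # True
```

Read this way, any value above 2 compared against 2 devices fails the check, which is consistent with the failures seen for more than 1 core per replica. It does not explain why +devices 0,1 makes no difference.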

-- 
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany
