Re: NAMD hangs with replica option

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Mon Jan 10 2022 - 10:22:13 CST

Jing, if you are using multipleReplicas on (i.e. multiple walkers), you are
probably already using a different value of outputName for each replica, but
please confirm that this is indeed your setup.

Note also that because communication between replicas is file-based, the
replicas don't need to be launched with a single command, but can also be run
as independent jobs:
https://colvars.github.io/colvars-refman-namd/colvars-refman-namd.html#sec:colvarbias_meta_mr
In that scheme, the main advantage of +replicas is that the value of
replicaID is filled in automatically, so that your Colvars config file can be
identical for all replicas.
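
To illustrate, a minimal multiple-walkers metadynamics block in the Colvars
config could look like the sketch below; this is only a sketch, and the bias
name, colvar name, hill parameters and file names are placeholders rather
than anything taken from your input:

  metadynamics {
    name                   meta1                  # placeholder bias name
    colvars                dist1                  # placeholder colvar name
    hillWeight             0.1
    newHillFrequency       500
    multipleReplicas       on
    replicasRegistry       replicas.registry.txt  # shared file on a common filesystem
    replicaUpdateFrequency 1000
    # replicaID is filled in automatically when running under +replicas;
    # for independent jobs, set it explicitly, e.g.  replicaID walker1
  }

With independent jobs, each replica's NAMD config would also need its own
outputName, along the lines of what Josh describes below.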

If you experience file I/O issues even when launching the replicas
independently (i.e. not within a single NAMD run with +replicas), could you
find out what kind of filesystem the compute nodes use?
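
For example, on a GNU/Linux compute node something like

  stat -f -c %T .     # prints the filesystem type of the working directory
  df -Th .            # same information plus mount point and usage

(run from the directory where the job writes its output) should tell you
whether it is NFS, Lustre, GPFS, a local disk, etc.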

Thanks
Giacomo

On Mon, Jan 10, 2022 at 9:37 AM Josh Vermaas <vermaasj_at_msu.edu> wrote:

> There is definitely a bug in the 2.14 MPI version. One of my students
> has noticed that anything that calls NAMD's die routine doesn't take down
> all the replicas, so the jobs continue to burn resources until they reach
> their wallclock limit.
>
> However, the key is figuring out *why* you are getting an error. I'm
> less familiar with metadynamics, but at least for umbrella sampling, it
> is pretty typical for each replica to write out its own set of files.
> This is usually done with something like:
>
> outputname somename.[myReplica]
>
> where [myReplica] is a Tcl command that evaluates to the replica ID of
> each semi-independent simulation. For debugging, it can be very helpful
> for each replica to write its own log file, which is done with the
> +stdout option on the command line:
>
> mpirun -np 28 namd2 +replicas 2 namd_metadynamics.inp +stdout
> outputlog.%d.log
>
> -Josh
>
> On 1/9/22 2:34 PM, jing liang wrote:
> > Hi,
> >
> > I am running a metadynamics simulation with the NAMD 2.14 MPI version.
> > SLURM is used for job scheduling, and I run 2 replicas on a 14-core node
> > as follows:
> >
> > mpirun -np 28 namd2 +replicas 2 namd_metadynamics.inp
> >
> > In fact, I have tried up to 8 replicas, and the resulting PMF looks very
> > similar to what I obtain with other methods such as ABF. The problem is
> > that with the replicas option the simulation hangs right at the end.
> > Looking at the output files, it seems that at the very end NAMD tries to
> > access some files (for example, *.xsc, *hills*, ...) that already exist,
> > and throws an error.
> >
> > My guess is that this is either a misunderstanding on my side about
> > running NAMD with replicas, or a bug in the MPI version.
> >
> > Have you observed this issue before? Any comment is welcome. Thanks
> >
>
> --
> Josh Vermaas
>
> vermaasj_at_msu.edu
> Assistant Professor, Plant Research Laboratory and Biochemistry and
> Molecular Biology
> Michigan State University
>
> https://prl.natsci.msu.edu/people/faculty/josh-vermaas/
>
>
>
