Re: Problem running REMD simulation

From: Peter Freddolino (petefred_at_umich.edu)
Date: Wed Aug 26 2015 - 08:28:57 CDT

Hi Nicholus,
The error message suggests something wrong with your charmrun binary. Did you try running the test programs with it?
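
A quick sanity check on the binary, before involving mpirun at all, could look like the sketch below. An "Exec format error" generally means the system did not recognize the file as something it can execute directly (for example, a binary built for another platform, or a script without a #! line). The helper name check_binary is hypothetical and not part of NAMD or Charm++; the "file" utility is assumed to be installed and is skipped if it is not.

```shell
#!/bin/sh
# check_binary is a hypothetical helper name, not part of NAMD or Charm++.
# It reports whether PATH exists, is a regular file, and is executable,
# then asks 'file' (if available) what kind of file it is.
check_binary() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 1
    fi
    if [ ! -f "$path" ]; then
        echo "not a regular file: $path"
        return 1
    fi
    if [ ! -x "$path" ]; then
        echo "not executable: $path"
        return 1
    fi
    echo "ok: $path"
    # 'file' distinguishes ELF binaries, scripts, and other formats;
    # its exact wording varies between systems.
    if command -v file >/dev/null 2>&1; then
        file "$path"
    fi
    return 0
}
```

For example, running "check_binary /usr/local/NAMD_2.9_Linux-x86_64-multicore/charmrun" would show whether that charmrun is a binary for this machine or something else.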

Also, you should be using an lrts-enabled NAMD 2.10 build; I’m not sure why you’re still on NAMD 2.9.

Also, you did not precisely copy the command line that I suggested. The number of plus signs and the spacing matter: I suggested "++local ++p 6", whereas you ran "++local +p8" and then "+p8" on its own.
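
As a sketch only, the launch line for 8 replicas, following the form of the earlier suggestion, could be assembled as below. NAMD_DIR, NREP, build_remd_cmd and make_replica_dirs are hypothetical names introduced for illustration; the default path assumes a netlrts build unpacked under $HOME/softwares and that it ships its own charmrun. The script prints the command instead of running it, so the plus signs and spaces can be checked first.

```shell
#!/bin/sh
# Assumption: NAMD_DIR points at a netlrts (multi-copy capable) NAMD build.
# Adjust the path to match your own installation.
NAMD_DIR="${NAMD_DIR:-$HOME/softwares/NAMD_2.10_Linux-x86_64-netlrts}"
# Number of replicas; also used as the number of processes (++p).
NREP="${NREP:-8}"

# Create output/0 ... output/(NREP-1), one directory per replica.
make_replica_dirs() {
    i=0
    while [ "$i" -lt "$NREP" ]; do
        mkdir -p "output/$i"
        i=$((i + 1))
    done
}

# Print the launch line. Note "++local" and "++p N" (two plus signs,
# a space before the number) versus "+replicas N" (one plus sign).
build_remd_cmd() {
    printf '%s\n' "mpirun $NAMD_DIR/charmrun $NAMD_DIR/namd2 ++local ++p $NREP +replicas $NREP job0.conf +stdout output/%d/job0.%d.log"
}
```

Calling make_replica_dirs and then build_remd_cmd from the example directory prints a line that can be compared against the suggested command before it is pasted into the terminal.
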
Best,
Peter

> On Aug 26, 2015, at 8:00 AM, Nicholus Bhattacharjee <nicholusbhattacharjee_at_gmail.com> wrote:
>
> Hello Jeff and Peter,
>
> Thank you for the suggestions.
>
> Jeff,
>
> I compiled NAMD as you suggested. After compiling, I tried a test run as follows and got this result:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> NAMD_CVS-2015-08-25_Source/Linux-x86_64-g++$ ./namd2 src/alanin
>
> [crim1:08977] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 357
> [crim1:08977] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 230
> [crim1:08977] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../orte/runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value A system-required executable either could not be found or was not executable by this user (-127) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "A system-required executable either could not be found or was not executable by this user" (-127) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [crim1:8977] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> Is that fine, or do I need to do something else? I am attaching the output from the compilation.
>
> Peter,
>
> I did as you suggested and got the following error (regular NAMD runs fine on my computer):
> nicholus_at_crim1:~/replica/example$ mpirun /usr/local/NAMD_2.9_Linux-x86_64-multicore/charmrun /usr/local/NAMD_2.9_Linux-x86_64-multicore/namd2 ++local +p8 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
> --------------------------------------------------------------------------
> Could not execute the executable "/usr/local/NAMD_2.9_Linux-x86_64-multicore/charmrun": Exec format error
>
> This could mean that your PATH or executable name is wrong, or that you do not
> have the necessary permissions. Please ensure that the executable is able to be
> found and executed.
>
> --------------------------------------------------------------------------
> nicholus_at_crim1:~/replica/example$ mpirun /usr/local/NAMD_2.9_Linux-x86_64-multicore/charmrun /usr/local/NAMD_2.9_Linux-x86_64-multicore/namd2 +p8 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
> --------------------------------------------------------------------------
> Could not execute the executable "/usr/local/NAMD_2.9_Linux-x86_64-multicore/charmrun": Exec format error
>
> This could mean that your PATH or executable name is wrong, or that you do not
> have the necessary permissions. Please ensure that the executable is able to be
> found and executed.
>
> --------------------------------------------------------------------------
>
>
>
> On Wed, Aug 26, 2015 at 6:25 AM, Peter Freddolino <petefred_at_umich.edu> wrote:
> Hi Nicholus,
> You ought to be running it through charmrun with this build of namd. For a local run on my laptop I can do the following:
>
> mpirun ~/src/NAMD_2.10_MacOSX-x86_64-netlrts/charmrun ~/src/NAMD_2.10_MacOSX-x86_64-netlrts/namd2 ++local ++p 6 +replicas 6 aslov2_ja_metaRE_run01.namd +stdout output/rep_%d/rep%d_run01.log
>
> (you can change the paths accordingly)
>
> Omitting the charmrun portion yields the same error that you saw.
>
> Best,
> Peter
>
> > On Aug 25, 2015, at 5:59 AM, Nicholus Bhattacharjee <nicholusbhattacharjee_at_gmail.com> wrote:
> >
> > Hello Douglas and Norman,
> >
> > Thanks for the reply. Douglas, I have downloaded "NAMD_2.10_Linux-x86_64-netlrts" and tried running the same job with the following command:
> >
> > $ mpirun /home/nicholus/softwares/NAMD_2.10_Linux-x86_64-netlrts/namd2 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
> >
> > The resultant error is
> >
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > Charm++: standalone mode (not using charmrun)
> > Charm++> Running in non-SMP mode: numPes 1
> > namd2: machine-common-core.c:863: create_partition_map: Assertion `(_Cmi_numnodes_global % _partitionInfo.numPartitions) == 0' failed.
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 0 with PID 3679 on node crim1 exited on signal 6 (Aborted).
> > --------------------------------------------------------------------------
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > Could you please tell me what I am doing wrong? Thanks once again.
> >
> >
> > On Tue, Aug 25, 2015 at 11:16 AM, Douglas Houston <DouglasR.Houston_at_ed.ac.uk> wrote:
> > Hi Nicholus,
> >
> > Are you using a NAMD build based on a patched MPI build of Charm++? I had to compile NAMD from source (by following the instructions in the notes.txt file).
> >
> > Alternatively there is a version on the NAMD download page called "Linux-x86_64-netlrts (Multi-copy algorithms)" which might be what you want (although it's hard to tell).
> >
> > cheers,
> > Doug
> >
> >
> >
> > Quoting Nicholus Bhattacharjee <nicholusbhattacharjee_at_gmail.com> on Tue, 25 Aug 2015 10:01:36 +0200:
> >
> > Hello,
> >
> > I am reposting this query as I have not received any reply.
> >
> > I am trying to run a REMD simulation using NAMD. I am following the
> > instructions given at
> >
> > http://www.ks.uiuc.edu/Research/namd/2.9/ug/node66.html
> >
> > I did the following:
> >
> > example$ mkdir output
> > example$ cd output
> > output$ mkdir 0 1 2 3 4 5 6 7
> > output$ cd ..
> > example$ mpirun /usr/local/NAMD_2.9_Linux-x86_64-multicore/namd2 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
> >
> > I got the following error message:
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%
> > FATAL ERROR: Unknown command-line option +replicas
> > ------------- Processor 0 Exiting: Called CmiAbort ------------
> > Reason: FATAL ERROR: Unknown command-line option +replicas
> >
> > Charm++ fatal error:
> > FATAL ERROR: Unknown command-line option +replicas
> > %%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > The second part of my question is about running REMD with an AMBER topology
> > file.
> > For a normal simulation, we replace the line "structure XXX.psf" in the
> > configuration file with
> > amber yes
> > parmfile XXX.top
> >
> > In the REMD configuration file, what should we change in place of the
> > following line?
> >
> > set psf_file "XXX.psf"
> >
> >
> >
> >
> >
> > _____________________________________________________
> > Dr. Douglas R. Houston
> > Lecturer
> > Institute of Structural and Molecular Biology
> > Room 3.23, Michael Swann Building
> > King's Buildings
> > University of Edinburgh
> > Edinburgh, EH9 3JR, UK
> > Tel. 0131 650 7358
> > http://tinyurl.com/douglasrhouston
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
> >
> >
>
>
> <error>
