Re: problems running NAMD on parallel computers

From: xiaojing gong (gongxiaojing1981_at_yahoo.com.cn)
Date: Thu Jun 29 2006 - 23:32:46 CDT

Thank you for your attention.
   I used mpirun with the -machinefile option, but I get the error messages below. With charmrun there is no error message like this. Is there anything in the config file that needs attention when switching between charmrun and mpirun?
 The output (if any) follows:
  ------------------------------------------------
FATAL ERROR: ERROR(S) IN THE CONFIGURATION FILE
[0] MPI Abort by user Aborting program !
[0] Aborting program!
/lsf/HPC/exec/rsh: line 46: 17495 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
/lsf/HPC/exec/rsh: line 46: 17506 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
/lsf/HPC/exec/rsh: line 46: 17507 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
/lsf/HPC/exec/rsh: line 46: 17498 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
/lsf/HPC/exec/rsh: line 46: 17538 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
/lsf/HPC/exec/rsh: line 46: 17522 Killed $LSF_BINDIR/lsrun -m $hostname -P sh -c "$commandfile $TMP_CMD"
Dong Luo <us917_at_yahoo.com> wrote:
  OK, I noticed your NAMD is an MPI build, which means
there is no need to use charmrun to start the job.
Instead, just use the launcher provided by your MPI
system. You may also not need the script that finds
available nodes, as the system may do that
automatically. You can check this link:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/3344.html
In my case, poe is used.
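
A minimal sketch of the advice above, assuming an MPICH-style mpirun that accepts -np and -machinefile (the thread mentions -machinefile; -np is an assumption about the local launcher). The function names are hypothetical, and LSB_HOSTS is assumed to hold one host name per slot, as in the job script quoted in this thread. The launch command is only printed, not executed, so it can be checked before use.

```shell
#!/bin/sh
# Sketch only: start an MPI build of NAMD with the MPI launcher, not charmrun.
# Assumptions: MPICH-style mpirun (-np, -machinefile); LSB_HOSTS lists hosts.

# Write one host per line into the file named by $1.
write_machinefile() {
    : > "$1"
    for host in $LSB_HOSTS; do
        echo "$host" >> "$1"
    done
}

# Print the launch command. The config file is the only argument to namd2.
# $1: process count, $2: machinefile, $3: namd2 binary, $4: config file
build_launch_cmd() {
    echo "mpirun -np $1 -machinefile $2 $3 $4"
}
```

For example, with LSB_HOSTS set by LSF, `write_machinefile ./machines` followed by `build_launch_cmd 8 ./machines ./namd2 md1.config` prints the command to run. No charmrun options such as ++nodelist appear on the namd2 command line.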

--- xiaojing gong wrote:

> Thank you for your attention. The following is the
> LSF file; I submit it with bsub to run my program.
>
> Dong Luo wrote:
> Since you are using bsub, I think there is no need to
> use a nodelist. In your case, NAMD took the first
> option as the input configuration file, which is not
> what you want.
> Could you show me your submitted job file?
>
> --- xiaojing gong wrote:
>
> > Dear all,
> > When I run NAMD on the cluster, after some time it
> > ends up running on one CPU, even though I submitted
> > the job with bsub on 8 CPUs. The error messages
> > follow.
> > Can you give me some suggestions?
> > Info: NAMD 2.5 for Linux-amd64-MPI
> > Info:
> > Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> > Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
> > Info:
> > Info: Please cite Kale et al., J. Comp. Phys. 151:283-312 (1999)
> > Info: in all publications reporting results obtained with NAMD.
> > Info:
> > Info: Based on Charm++/Converse 0143160 for mpi-linux-gm2-opteron
> > Info: Built Mon Jan 17 21:19:25 CST 2005 by fgf on gbnode002
> > Info: Sending usage information to NAMD developers via UDP. Sent data is:
> > Info: 1 NAMD 2.5 Linux-amd64-MPI 8 ganode002 tzhang
> > Info: Running on 8 processors.
> > Info: 2100 kB of memory in use.
> > Measuring processor speeds... Done.
> > Info: Found 3 config files.
> > Info: Configuration file is ++nodelist
> > FATAL ERROR: Simulation config file is not accessible.
> > FATAL ERROR: Simulation config file is not accessible.
> > [0] MPI Abort by user Aborting program !
> > [0] Aborting program!
> >
> >
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam
> protection around
> http://mail.yahoo.com
>
>
>
> APP_NAME=fourcpus
> NP=20
> NP_PER_NODE=4
> RUN="RAW"
>
> rm -f $PWD/namd2.nodelist
>
> # start creating .nodelist
> echo 'group main' > $PWD/namd2.nodelist
>
> for i in $LSB_HOSTS
> do
>   echo "host $i" >> $PWD/namd2.nodelist
> done
>
> echo "------------------------------------------------"
> /home/user/siap/Gong_NAMD_2.6b1_Linux-amd64/charmrun \
>   /home/user/siap/Gong_NAMD_2.6b1_Linux-amd64/namd2 \
>   +p20 ++nodelist $PWD/namd2.nodelist \
>   /home/user/siap/Gong_NAMD_2.6b1_Linux-amd64/13_APRIL/compare_sys_HSE/md1.config \
>   > /home/user/siap/Gong_NAMD_2.6b1_Linux-amd64/13_APRIL/compare_sys_HSE/log1.log
>
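
The earlier log in this thread shows "Configuration file is ++nodelist", i.e. the MPI-built namd2 read a charmrun option as a config file name. The following is a hedged sketch of a pre-launch check for that situation. The function name check_namd_args is hypothetical, and it assumes the config file is the last argument passed to namd2, as in the script above.

```shell
#!/bin/sh
# Sketch only: check the arguments that would follow an MPI-built namd2.
# Assumption: arguments starting with "++" are charmrun options, which the
# MPI build would otherwise read as config file names.

check_namd_args() {
    last=""
    for arg in "$@"; do
        case "$arg" in
            ++*)
                echo "charmrun-only option passed to namd2: $arg" >&2
                return 1 ;;
        esac
        last=$arg
    done
    # The last argument is taken to be the config file; it must be readable.
    if [ ! -r "$last" ]; then
        echo "config file not readable: $last" >&2
        return 2
    fi
    return 0
}
```

Calling `check_namd_args md1.config` before the launch line returns 0 only when no "++" option is present and the config file can be read. It returns 1 for a stray charmrun option and 2 for an unreadable config file.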


#pathway
cwd /home/user/siap/Gong_NAMD_2.6b1_Linux-amd64/13_APRIL/compare_sys_HSE/

#forcefield
paratypecharmm on
parameters par_all27_prot_lipid.inp
                                                                 
#molecules
structure ions_added.psf
coordinates ions_added.pdb
bincoordinates release_eq.coor
binvelocities release_eq.vel
extendedSystem release_eq.xsc

set temperature 310

#temp & pressure coupling
langevin on
langevinDamping 5 ;# damping coefficient (gamma) of 5/ps
langevinTemp $temperature
langevinHydrogen no ;# don't couple langevin bath to hydrogens

useGroupPressure yes
useFlexibleCell no
LangevinPiston on
LangevinPistonTarget 1.01325
LangevinPistonPeriod 100
LangevinPistonDecay 50
LangevinPistonTemp $temperature
                                                                 
#output
outputname md1
outputEnergies 100
restartfreq 500
DCDfreq 500
binaryoutput yes
binaryrestart yes
                                      
wrapAll on
wrapWater on
                                                             
#integrator
timestep 2
nonbondedFreq 1
fullElectFrequency 2
stepspercycle 10
                                                             
#approximations
rigidBonds all
rigidTolerance 0.00000001
cutoff 12
switching on
switchdist 10
pairlistdist 13.5
margin 3.0
exclude scaled1-4
1-4scaling 1.0 ;# 1.0 for Charmm, 0.833333 for Amber
                                       

PME on

PMEGridSizeX 128
PMEGridSizeY 128
PMEGridSizeZ 128
                                                                   

run 5000000
                                                             
                                                             

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:42:17 CST