Re: NAMD QM/MM on Oak Ridge ANDES cluster

From: Marcelo C. R. Melo (melomcr_at_gmail.com)
Date: Thu Dec 10 2020 - 11:06:02 CST

Hi Chunli,

The first thing to make sure of when using ORCA on multiple cores (which
necessarily uses MPI) is to use NAMD's "multicore" build without CUDA. In
this case, you cannot run NAMD across multiple nodes, and you will not call
NAMD using "mpirun" or similar. In terms of performance, this should not be
an issue, since a DFT calculation that needs that much compute power will
likely be far more time consuming than anything else NAMD has to do. From
your script, your "srun" command is asking for 320 cores, which likely
means you are trying to launch NAMD over many nodes. That is probably what
is creating problems for you.
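
As a rough illustration, a minimal sketch of a single-node launch (the
binary path and core count below are only placeholders; adjust them to
your own install and node size):

    # launch the multicore (non-MPI) NAMD binary on one node with, e.g., 32 threads,
    # instead of "srun -n 320" across nodes
    /path/to/NAMD_2.14_multicore/namd2 +p32 decarboxylase.1.conf > output.smd1.log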

The next step is to make sure your cluster is making other nodes available
to ORCA. By default, ORCA will only detect the cores of the node where
NAMD is launched (which is where NAMD will launch ORCA). If you want to use
cores from other nodes, you will need to get that information from
your cluster's resource-allocation system (for example, SLURM). I have done
this in the past by getting the "nodes list" file from SLURM and writing it
to the directory where ORCA is launched (which would be, in your case, the
subdirectories within "/gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft").
However, every cluster has its own peculiarities and I am not sure how
ANDES works, so you should probably check with the sysadmins there.
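
As a sketch only (the exact file name ORCA expects for its host list
depends on the ORCA/MPI setup, so treat the ".nodes" name and the
subdirectory below as assumptions to check against the ORCA manual and
your cluster's documentation):

    # inside the SLURM job script: write the allocated hostnames into the
    # directory where ORCA runs, so its internal mpirun can spread workers
    # over the other nodes of the allocation
    QMDIR=/gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft/0          # illustrative subdirectory
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "$QMDIR/qmmm_0.input.nodes"   # assumed file name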

Let me know how that works out.
Best,
Marcelo

On Sun, 6 Dec 2020 at 20:15, Chunli Yan <utchunliyan_at_gmail.com> wrote:

> I ran into problems running NAMD QM/MM.
>
> The job always died with the following information:
>
> ORCA finished by error termination in GTOInt
> Calling Command: mpirun -np 320
> /ccs/home/chunli/program/orca421/orca_gtoint_mpi
> /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft/0/qmmm_0.input.int.tmp
> /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft/0/qmmm_0.input
> [file orca_tools/qcmsg.cpp, line 458]:
> .... aborting the run
>
> It seems that I cannot run ORCA in parallel.
>
> If I remove "PAL8" so that ORCA does not run in parallel, the job runs
> without problems.
>
> Please let me know why I cannot call mpirun. I really need your help.
>
> The following is my job script for submission:
>
> #!/bin/bash
> #SBATCH -A bip174
> #SBATCH -J test
> #SBATCH -N 10
> #SBATCH -t 48:00:00
>
> #module load cuda/9.2.148
> module load fftw
> module unload openmpi
> #module load spectrum-mpi
> #module list
>
> #export ORCA_DIR="/ccs/home/chunli/program/orca421"
> #export PATH="$PATH:$ORCA_DIR"
>
> export PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/bin/:$PATH"
> export LD_LIBRARY_PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/lib/:$LD_LIBRARY_PATH"
>
> cd /gpfs/alpine/scratch/chunli/bip174/eABF/run.smd.dft
> srun -n 320 /ccs/home/chunli/namd-andes/NAMD_2.14_Source/Linux-x86_64-g++/namd2 decarboxylase.1.conf > output.smd1.log
>
>
>
> #NAMD CONFIGURATION FILE FOR SMD 1
>
> #initial config
> coordinates decarboxylase.0.pdb
> extendedsystem decarboxylase.0r.xsc
> bincoordinates decarboxylase.0r.coor
> temperature 300
> seed 12345
>
> # Reaction
> qmcSMD on
> qmcSMDfile reaction_1.rct
>
> #output params
> binaryoutput no
> outputname decarboxylase.1
> outputenergies 1
> outputtiming 1
> outputpressure 1
> binaryrestart yes
> dcdfile decarboxylase.1.dcd
> dcdfreq 1
> XSTFreq 1
> restartname decarboxylase.1r
> restartfreq 1
>
> #pme parameters
> PME on
> PMETolerance 10e-6
> PMEInterpOrder 4
> PMEGridspacing 1
>
> #temperature control and equilibration
> langevin on
> langevintemp 300
> langevindamping 200
>
> #pressure control
> usegrouppressure yes
> useflexiblecell no
> useConstantArea no
> langevinpiston off
> langevinpistontarget 1
> langevinpistonperiod 200
> langevinpistondecay 100
> langevinpistontemp 300
> surfacetensiontarget 0.0
> strainrate 0. 0. 0.
>
> #brnch_root_list_opt
> splitpatch hydrogen
> hgroupcutoff 2.8
>
> #integrator params
> timestep 0.5
> firstTimestep 0
> fullElectFrequency 1
> nonbondedfreq 1
>
> #force field params
> structure decarboxylase.0.psf
> paratypecharmm on
> parameters ../toppar/par_all36_prot.prm
> parameters ../toppar/par_all36_cgenff.prm
> parameters ../toppar/par_all36_na.prm
> parameters ../toppar/par_all36_carb.prm
> parameters ../toppar/par_all36_lipid.prm
> parameters ../toppar/toppar_water_ions_namd.str
> parameters ../toppar/ligand.str
> exclude scaled1-4
> 1-4scaling 1.0
> rigidbonds all
> rigidtolerance 0.00001
> rigiditerations 400
> cutoff 14.0
> pairlistdist 15.0
> stepspercycle 1
> switching on
> switchdist 12.0
>
> wrapAll on
>
> ############################################
> ################################ QM STUFF ##
> ############################################
> qmForces on
> qmColumn beta
> QMSimsPerNode 1
> qmBondColumn occ
> QMBondScheme CS
> QMSwitching on
> QMSwitchingType shift
> QMPointChargeScheme none
> qmBaseDir "/gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft"
> QMVdWParams off
> QMNoPntChrg off
> QMPCStride 1
> QMLiveSolventSel off
> qmConfigLine "! B3LYP 6-31G Grid4 EnGrad TightSCF PAL8"
> #qmConfigLine "%%pal nprocs 320 end"
> qmConfigLine "%%output PrintLevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
> QMChargeFromPSF on
> qmMult "1 1"
> qmSoftware orca
> qmExecPath "/ccs/home/chunli/program/orca421/orca"
> QMOutStride 1
> QMPositionOutStride 1
> ############################################
> ######################### END OF QM STUFF ##
> ############################################
>
> #script
> minimize 500
> reinitvels 300
> run 10000
>
> Best,
>
>
> *Chunli*
>
>
>

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2020 - 23:17:15 CST