NAMD QM/MM multi nodes performance bad

From: Chunli Yan (utchunliyan_at_gmail.com)
Date: Sun Dec 13 2020 - 20:47:15 CST

Hello,
I am running NAMD QM/MM in parallel across multiple nodes. I wrote a
nodelist file into the directory where ORCA runs. Below is the job
submission script:

#!/bin/bash
#SBATCH -A bip174
#SBATCH -J test
#SBATCH -N 4
##SBATCH --tasks-per-node=32
##SBATCH --cpus-per-task=1
##SBATCH --mem=0
#SBATCH -t 48:00:00

#module load openmpi/3.1.4
export PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/bin/:$PATH"
export LD_LIBRARY_PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/lib/:$LD_LIBRARY_PATH"

# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is directory job was submitted from
cd $SLURM_SUBMIT_DIR

# Generate ORCA nodelist
for n in `echo $SLURM_NODELIST | scontrol show hostnames`; do
  echo "$n slots=20 max-slots=32" >> /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes
done
sed -i '1d' /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes

cd /gpfs/alpine/scratch/chunli/bip174/eABF/run.smd.dft5
/ccs/home/chunli/NAMD_2.14_Source/Linux-x86_64-g++/namd2 +p30 +isomalloc_sync decarboxylase.1.conf > output.smd1.log

I also excluded the first node, where NAMD launches, to avoid competition
between NAMD and ORCA.
The resulting nodelist is:

andes4 slots=20 max-slots=32
andes6 slots=20 max-slots=32
andes7 slots=20 max-slots=32
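
The generation of this nodelist can also be sketched in Python. This is only a minimal sketch, not the script actually used: the function name `make_nodelist` and the hostname `andes3` for the first node are hypothetical, and the slot counts are simply copied from the nodelist above.

```python
def make_nodelist(hostnames, slots=20, max_slots=32, skip_first=True):
    """Return Open MPI hostfile text, one line per host.

    skip_first drops the first host, assumed here to be the node
    on which NAMD itself is launched.
    """
    hosts = list(hostnames)[1:] if skip_first else list(hostnames)
    return "".join(f"{h} slots={slots} max-slots={max_slots}\n" for h in hosts)


# "andes3" is a hypothetical name for the first (NAMD) node.
print(make_nodelist(["andes3", "andes4", "andes6", "andes7"]), end="")
```

Building the whole text first and writing it in one go (rather than appending line by line) would also avoid stale entries left over from an earlier job.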

To make mpirun use this host file, I edited runORCA.py:

cmdline += orcaInFileName + " \"--hostfile /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes --bind-to core -nooversubscribe \" " + " > " + orcaOutFileName
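
The same edit can be sketched with the host file path factored out into a parameter. The helper `orca_cmdline` and its parameter names are hypothetical and not part of runORCA.py as distributed, and the file names in the usage line are placeholders. The sketch assumes, as the edited line does, that ORCA forwards the quoted second argument to mpirun.

```python
def orca_cmdline(orca_exe, in_file, out_file, hostfile):
    """Build the shell command string that runs ORCA with extra mpirun options."""
    # Options copied from the edited line; handed to ORCA as one quoted argument.
    mpi_args = f"--hostfile {hostfile} --bind-to core -nooversubscribe"
    return f'{orca_exe} {in_file} "{mpi_args}" > {out_file}'


# Placeholder names, for illustration only.
print(orca_cmdline("orca", "job.inp", "job.out", "nodes.txt"))
```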

QM methods: B3LYP def2-SVP Grid4 EnGrad SlowConv TightSCF RIJCOSX D3BJ
def2/J

I requested 4 nodes in total, with 60 cores for ORCA and 20 for NAMD, but
the performance is very poor for a system of 48968 atoms in total, 32 of
them QM atoms. The timings are:

Info: Initial time: 30 CPUs 75.0565 s/step 1737.42 days/ns 2285.66 MB memory
Info: Initial time: 30 CPUs 81.1294 s/step 1877.99 days/ns 2286 MB memory
Info: Initial time: 30 CPUs 87.776 s/step 2031.85 days/ns 2286 MB memory
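
For reference, the days/ns figures follow from the s/step figures by simple arithmetic. The sketch below reproduces them assuming a 0.5 fs timestep, which is not stated anywhere above but is the value consistent with the reported numbers; `days_per_ns` is a hypothetical helper name.

```python
def days_per_ns(seconds_per_step, timestep_fs):
    """Wall-clock days needed to simulate 1 ns at the given cost per step."""
    steps_per_ns = 1.0e6 / timestep_fs  # 1 ns = 1e6 fs
    return seconds_per_step * steps_per_ns / 86400.0  # 86400 s per day


# Timestep of 0.5 fs is an assumption inferred from the log, not a stated value.
for s in (75.0565, 81.1294, 87.776):
    print(f"{s} s/step -> {days_per_ns(s, 0.5):.2f} days/ns")
```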

Can someone help me find out whether I did something wrong, or whether
NAMD QM/MM can scale well across nodes at all? I checked the ORCA MPI
processes on each node and found CPU usage of only 50-70%.

NAMD was compiled with SMP support using icc:
./build charm++ verbs-linux-x86_64 icc smp -with-production
./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-icc-smp

Thanks.

Best,

Chunli Yan
