From: Chunli Yan (utchunliyan_at_gmail.com)
Date: Sun Dec 13 2020 - 20:47:15 CST
Hello,
NAMD QM/MM parallel runs across multiple nodes:
I write a nodelist file into the directory where ORCA runs. Below is the job submission script:
#!/bin/bash
#SBATCH -A bip174
#SBATCH -J test
#SBATCH -N 4
##SBATCH --tasks-per-node=32
##SBATCH --cpus-per-task=1
##SBATCH --mem=0
#SBATCH -t 48:00:00

#module load openmpi/3.1.4
export PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/bin/:$PATH"
export LD_LIBRARY_PATH="/ccs/home/chunli/namd-andes/openmpi-3.1.4/lib/:$LD_LIBRARY_PATH"

# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is directory job was submitted from
cd $SLURM_SUBMIT_DIR

# Generate ORCA nodelist
for n in `echo $SLURM_NODELIST | scontrol show hostnames`; do
  echo "$n slots=20 max-slots=32" >> /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes
done
sed -i '1d' /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes

cd /gpfs/alpine/scratch/chunli/bip174/eABF/run.smd.dft5
/ccs/home/chunli/NAMD_2.14_Source/Linux-x86_64-g++/namd2 +p30 +isomalloc_sync decarboxylase.1.conf > output.smd1.log
I also exclude the first node, where NAMD launches, to avoid competition between NAMD and ORCA (the sed -i '1d' line above removes it from the hostfile).
The resulting nodelist is:

andes4 slots=20 max-slots=32
andes6 slots=20 max-slots=32
andes7 slots=20 max-slots=32
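A quick check like the following (just a sketch that could go in the batch script; it reuses the same hostfile path as above) confirms that the node NAMD starts on really is excluded:

# Sanity check (sketch): the node this batch script runs on is where namd2
# starts; warn if it is still listed in the ORCA hostfile.
hostfile=/gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes
if grep -q "^$(hostname -s) " "$hostfile"; then
  echo "WARNING: $(hostname -s) is still in the ORCA hostfile" >&2
fi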
To make mpirun use this hostfile, I edited runORCA.py:

cmdline += orcaInFileName + " \"--hostfile /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes --bind-to core -nooversubscribe \" " + " > " + orcaOutFileName
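For clarity, the relevant section of runORCA.py after my edit looks roughly like this; only the cmdline += line is verbatim, everything else (the ORCA binary path, the file-name variables, and how the command is launched) is reconstructed here and may differ from the actual script:

# Rough reconstruction of the edited part of runORCA.py (not the actual script).
import subprocess

orcaBinary      = "/path/to/orca"               # hypothetical ORCA install path
orcaInFileName  = "qmmm_0.input"                # input file NAMD writes each step (assumed name)
orcaOutFileName = orcaInFileName + ".TmpOut"    # output file read back after the run (assumed name)

# ORCA forwards a quoted trailing argument to mpirun, so the hostfile and
# binding flags below end up on the mpirun command line of the parallel run.
cmdline = orcaBinary + " "
cmdline += orcaInFileName + " \"--hostfile /gpfs/alpine/scratch/chunli/bip174/eABF/smd.qm.dft5/0/qmmm_0.nodes --bind-to core -nooversubscribe \" " + " > " + orcaOutFileName

subprocess.call(cmdline, shell=True)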
QM method keywords: B3LYP def2-SVP Grid4 EnGrad SlowConv TightSCF RIJCOSX D3BJ def2/J
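In the NAMD configuration these keywords go in through qmConfigLine, together with the ORCA core count; paraphrasing from memory, so the escaping of % and the exact nprocs line are approximate:

qmConfigLine  "! B3LYP def2-SVP Grid4 EnGrad SlowConv TightSCF RIJCOSX D3BJ def2/J"
qmConfigLine  "%%pal nprocs 60 end"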
In total I request 4 nodes: 60 cores for ORCA and 20 for NAMD. The system has 48,968 total atoms and 32 QM atoms, but the performance is really bad:
Info: Initial time: 30 CPUs 75.0565 s/step 1737.42 days/ns 2285.66 MB memory
Info: Initial time: 30 CPUs 81.1294 s/step 1877.99 days/ns 2286 MB memory
Info: Initial time: 30 CPUs 87.776 s/step 2031.85 days/ns 2286 MB memory
Can someone help me find out whether I did something wrong, or whether NAMD QM/MM can scale well across nodes? I checked the ORCA MPI processes on each node and found the CPU usage is only 50-70%.
NAMD was compiled with the icc smp Charm++ build:
./build charm++ verbs-linux-x86_64 icc smp -with-production
./config Linux-x86_64-g++ --charm-arch verbs-linux-x86_64-icc-smp
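Note that I start namd2 directly with +p30 as shown above, not through charmrun. For reference, my understanding is that a verbs-smp binary would normally be launched across nodes roughly like this (node names, counts, and paths here are placeholders, not what I actually ran):

cat > namd.nodelist <<EOF
group main
host andes4
host andes5
EOF
./charmrun ++nodelist namd.nodelist +p30 ./namd2 +isomalloc_sync decarboxylase.1.conf > output.smd1.log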
Thanks.
Best,
Chunli Yan