Re: question to run namd2 with multiple CPUs

From: Victor Kwan (vkwan8_at_uwo.ca)
Date: Thu Mar 05 2020 - 18:33:23 CST

Hi Yu,

First of all, if you are trying to run the ibverbs build without MPI (InfiniBand plus shared memory, no MPI needed), you should not be launching it with mpirun or mpiexec.

You can find the recommended runscript for the ibverbs-smp build at https://docs.scinet.utoronto.ca/index.php/NAMD , provided by SciNet / Compute Canada.

Compute Canada also provides runscripts for the MPI, non-SMP ibverbs, and UCX builds.

https://docs.computecanada.ca/wiki/NAMD#Verbs_jobs

Obviously some modification will be needed: slurm_hl2hl.py is an in-house script that generates a nodelist for the ibverbs build, which does not use MPI.
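
For illustration only, here is a minimal sketch of what such a runscript could look like on a generic Slurm cluster. The partition, the namd2/charmrun install path, the node and core counts, and the scontrol-based nodelist generation are my assumptions (the scontrol lines just stand in for the site-specific slurm_hl2hl.py), so adjust everything to your own system:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00

cd $SLURM_SUBMIT_DIR

# Build a Charm++ nodelist: "group main" followed by one "host <name>" line per node.
# (slurm_hl2hl.py produces this on Compute Canada systems; scontrol is a generic substitute.)
NODEFILE=nodelist.$SLURM_JOB_ID
echo "group main" > $NODEFILE
scontrol show hostnames $SLURM_JOB_NODELIST | sed 's/^/host /' >> $NODEFILE

# ibverbs-smp: launch with charmrun, not mpirun/srun. One process per node,
# 15 worker threads each plus one communication thread = 16 cores per node.
# +p is the total worker-thread count (2 nodes x 15), ++ppn is threads per process.
NAMD_DIR=/share/apps/namd/namd2.13   # illustrative path
$NAMD_DIR/charmrun ++nodelist $NODEFILE +p30 ++ppn 15 \
    $NAMD_DIR/namd2 +setcpuaffinity pc.conf > pc.log

If the whole job fits on one node, it may also be possible to skip charmrun and start namd2 directly with +p16, but check the release notes for your particular build.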

Kind regards,

Victor Kwan





On Fri, Feb 28, 2020 at 5:12 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:
Hi,

Our university just installed the Linux-x86_64-ibverbs-smp build (https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1546) (InfiniBand plus shared memory, no MPI needed) on our local cluster. It runs fine on a single core, but when I started testing with 16 cores, the output indicated it was still using only 1 CPU.

I tried submitting the job both with and without mpirun, but the result is the same: only 1 CPU was used.

Please give me some help here.

The script I was using is attached below:

#!/bin/bash
#####Number of nodes
#SBATCH --nodes=1
#SBATCH --partition=talon
#SBATCH --ntasks-per-node=16
#SBATCH --workdir=.
#####SBATCH -o slurm_run_%j_output.txt
#####SBATCH -e slurm_run_%j_error.txt
#SBATCH -s
#SBATCH --time=00:10:00

cd $SLURM_SUBMIT_DIR
srun -n $SLURM_NTASKS hostname | sort -u > $SLURM_JOB_ID.hosts

module load intel/mpi/64

 mpirun /share/apps/namd/namd2.13/namd2 pc.conf > pc.log
**************************************************************************

The output excerpt where it indicates only 1 CPU is being used:

ENERGY: 0 482.9967 1279.0637 1052.3094 78.2784 -71938.7727 7732.6846 0.0000 0.0000 11869.2504 -49444.1894 298.3828 -61313.4398 -49399.0775 298.3828 79.3838 81.2849 193787.4555 79.3838 81.2849

OPENING EXTENDED SYSTEM TRAJECTORY FILE
LDB: ============= START OF LOAD BALANCING ============== 2.06838
LDB: ============== END OF LOAD BALANCING =============== 2.06856
LDB: =============== DONE WITH MIGRATION ================ 2.06971
LDB: ============= START OF LOAD BALANCING ============== 8.05131
LDB: ============== END OF LOAD BALANCING =============== 8.05246
LDB: =============== DONE WITH MIGRATION ================ 8.05289
Info: Initial time: 1 CPUs 0.150798 s/step 0.872672 days/ns 236.934 MB memory
LDB: ============= START OF LOAD BALANCING ============== 9.55101
LDB: ============== END OF LOAD BALANCING =============== 9.55108
LDB: =============== DONE WITH MIGRATION ================ 9.5515
LDB: ============= START OF LOAD BALANCING ============== 15.3206
LDB: ============== END OF LOAD BALANCING =============== 15.3217
LDB: =============== DONE WITH MIGRATION ================ 15.3221
Info: Initial time: 1 CPUs 0.144713 s/step 0.837461 days/ns 237.516 MB memory
LDB: ============= START OF LOAD BALANCING ============== 22.4754
LDB: ============== END OF LOAD BALANCING =============== 22.4766
LDB: =============== DONE WITH MIGRATION ================ 22.477
Info: Initial time: 1 CPUs 0.143573 s/step 0.830862 days/ns 237.516 MB memory
