Running NAMD - 2.14 - SMP - Verbs - CUDA

From: Bassam Haddad (bhaddad_at_pdx.edu)
Date: Wed Aug 19 2020 - 20:56:19 CDT

*Hello NAMDers,*

*I have been trying to run NAMD-2.14-SMP-Verbs-CUDA on my Uni's HPC, which
uses an InfiniBand network and a SLURM manager. In my best attempts I
could only get it running on a single node (despite requesting more). I
started with the following script, found in a previous namd-l thread. *

#!/bin/bash
#SBATCH --job-name TEST
#SBATCH --nodes 2
#SBATCH --ntasks-per-node=28
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:p100:2
#SBATCH --error /home/users/haddad/TEST_Parallel/test.err
#SBATCH --output /home/users/haddad/TEST_Parallel/test.txt
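
# The directives above request two nodes, 28 tasks per node, and two P100
# GPUs per node (four GPUs in total across the job).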

# generate NAMD nodelist
for n in `scontrol show hostnames $SLURM_NODELIST`; do
 echo "host $n ++cpus 28" >> nodelist.$SLURM_JOBID
done
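
# The loop above writes one "host" line per allocated node, so for two
# (hypothetical) nodes named gpu01 and gpu02 the generated
# nodelist.$SLURM_JOBID would contain:
#   host gpu01 ++cpus 28
#   host gpu02 ++cpus 28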

# calculate total processes (P) and procs per node (PPN)
PPN=`expr $SLURM_NTASKS_PER_NODE - 1`
P="$(($PPN * $SLURM_NNODES))"

module use /home/exacloud/software/modules/
module load openmpi
#module load namd/2.14b2

# Personal binaries

CHARM="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/charmrun"
NAMD="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/namd2"

# Launch NAMD (personal binaries above)

$CHARM $NAMD ++p $P ++ppn $PPN ++nodelist nodelist.$SLURM_JOBID ++mpiexec \
  +setcpuaffinity +isomalloc_sync \
  /home/users/haddad/TEST_Parallel/TEST_Parallel.namd
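
# ++p sets the total worker-thread (PE) count and ++ppn the worker threads
# per process; ++nodelist points charmrun at the host file generated above,
# ++mpiexec asks charmrun to launch the node programs through the cluster's
# mpiexec rather than ssh, and +setcpuaffinity / +isomalloc_sync are runtime
# options passed through to namd2.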

*When I run the script it appears to work (two nodes are allocated); however,
the log file indicates that I am only using a single node. *

Charmrun> scalable start enabled.
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 0.613 seconds.
Charm++> Running in SMP mode: 2 processes, 27 worker threads (PEs) + 1 comm
threads per process, 54 PEs total
Charm++> The comm. thread only receives messages, while work threads send
messages
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID:
v6.10.2-0-g7bf00fa-namd-charm-6.10.2-build-2020-Aug-05-556
Charm++> Synchronizing isomalloc memory region...
Charm++> Consolidated Isomalloc memory region: 0x440000000 - 0x7f4400000000
(133430272 MB).
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
Charm++> Running on 1 hosts (2 sockets x 14 cores x 1 PUs = 28-way SMP)
Charm++> cpu topology info is gathered in 0.021 seconds.

*Furthermore, only two GPUs are being used, though I would like to use two
GPUs per node (four in total). I am hoping there is a simple fix, but I have
been struggling with this for a few days. *

*Thank you!*
________________________
*Bassam Haddad*
Graduate Research Assistant
Portland State University
Portland, OR
