Re: Running NAMD - 2.14 - SMP - Verbs - CUDA

From: Bassam Haddad (bhaddad_at_pdx.edu)
Date: Thu Aug 20 2020 - 16:56:36 CDT

After working on this more today, I think I may have found the reason,
though I do not yet understand how to fix it. When I add the flag
`++numHosts 2`, I get the following error:
Charmrun> Error: ++numHosts exceeds available host pool.

I cannot see what is wrong with my nodelist format.

/home/users/haddad/TEST_Parallel/nodelist.13643631:
group main
host exanode-7-22
host exanode-7-23
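
As a sanity check, one can count the host entries charmrun should see and
confirm that each listed name resolves (hypothetical commands of my own, not
part of the job script):

# count "host" lines in the pool file (expect 2 here)
grep -c '^host' /home/users/haddad/TEST_Parallel/nodelist.13643631
# confirm each listed host resolves from the launch node
while read -r keyword name _; do
    [ "$keyword" = host ] && getent hosts "$name"
done < /home/users/haddad/TEST_Parallel/nodelist.13643631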

/home/users/haddad/TEST_Parallel/sub_TEST_Parallel_2.0.sh:
#!/bin/bash
#SBATCH --job-name TEST
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:p100:2
#SBATCH --error /home/users/haddad/TEST_Parallel/test.err
#SBATCH --output /home/users/haddad/TEST_Parallel/test.txt

# generate NAMD nodelist
echo "group main" >> nodelist.$SLURM_JOBID
for n in $(scontrol show hostnames $SLURM_NODELIST); do
    echo "host $n" >> nodelist.$SLURM_JOBID
done
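# Note: `scontrol show hostnames` expands SLURM's compressed hostlist
# (e.g. exanode-7-[22-23]) into one name per line, so nodelist.$SLURM_JOBID
# ends up with one "host" line per allocated node.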

# calculate total processes (P) and procs per node (PPN)
PPN=$(($SLURM_NTASKS_PER_NODE - 1))
P="$(($PPN * $SLURM_NNODES))"

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64
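# (Presumably so the verbs build can locate system InfiniBand libraries
# such as libibverbs.)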
module use /home/exacloud/software/modules/
module load openmpi
#module load namd/2.14b2

# Personal binaries

CHARM="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/charmrun"
NAMD="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/namd2"

CHARMRUN_ARGS="++p $P ++ppn $PPN ++nodegroup main \
    ++nodelist /home/users/haddad/TEST_Parallel/nodelist.$SLURM_JOBID \
    ++numHosts 2 ++mpiexec ++verbose"
NAMD_ARGS="+setcpuaffinity +isomalloc_sync"
$CHARM $NAMD $CHARMRUN_ARGS $NAMD_ARGS \
    /home/users/haddad/TEST_Parallel/TEST_Parallel.namd
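
For reference, here is the same launch written out with charmrun's options
placed before the binary, as in charmrun's usual usage. This is only a
sketch: it assumes charmrun's plain ssh-based startup (i.e. no ++mpiexec)
and passwordless ssh between the compute nodes.

# Sketch only: dropping ++mpiexec means charmrun itself starts the node
# programs over ssh using the nodelist (assumes passwordless ssh works).
$CHARM ++p $P ++ppn $PPN \
    ++nodelist /home/users/haddad/TEST_Parallel/nodelist.$SLURM_JOBID \
    ++verbose \
    $NAMD +setcpuaffinity +isomalloc_sync \
    /home/users/haddad/TEST_Parallel/TEST_Parallel.namd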

On Wed, Aug 19, 2020 at 7:04 PM Bassam Haddad <bhaddad_at_pdx.edu> wrote:

> Hello NAMDers,
>
> I have been trying to run NAMD-2.14-SMP-Verbs-CUDA on my Uni's HPC, which
> uses an InfiniBand network and a SLURM manager. At best I could only get it
> running on a single node (despite requesting more). I started with the
> following script, found in a previous namd-l thread.
>
>
>
> #!/bin/bash
> #SBATCH --job-name TEST
> #SBATCH --nodes 2
> #SBATCH --ntasks-per-node=28
> #SBATCH --time=00:10:00
> #SBATCH --partition=gpu
> #SBATCH --gres=gpu:p100:2
> #SBATCH --error /home/users/haddad/TEST_Parallel/test.err
> #SBATCH --output /home/users/haddad/TEST_Parallel/test.txt
>
> # generate NAMD nodelist
> for n in $(scontrol show hostnames $SLURM_NODELIST); do
>     echo "host $n ++cpus 28" >> nodelist.$SLURM_JOBID
> done
>
> # calculate total processes (P) and procs per node (PPN)
> PPN=$(($SLURM_NTASKS_PER_NODE - 1))
> P="$(($PPN * $SLURM_NNODES))"
>
> module use /home/exacloud/software/modules/
> module load openmpi
> #module load namd/2.14b2
>
> # Personal binaries
>
> CHARM="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/charmrun"
> NAMD="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/namd2"
>
> # System NAMD
>
> $CHARM $NAMD ++p $P ++ppn $PPN ++nodelist nodelist.$SLURM_JOBID ++mpiexec \
>     +setcpuaffinity +isomalloc_sync \
>     /home/users/haddad/TEST_Parallel/TEST_Parallel.namd
>
> When I run the script, it appears to work (2 nodes are allocated); however,
> the log file indicates that only a single node is being used.
>
> Charmrun> scalable start enabled.
> Charmrun> IBVERBS version of charmrun
> Charmrun> started all node programs in 0.613 seconds.
> Charm++> Running in SMP mode: 2 processes, 27 worker threads (PEs) + 1 comm threads per process, 54 PEs total
> Charm++> The comm. thread only receives messages, while work threads send messages
> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
> Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fa-namd-charm-6.10.2-build-2020-Aug-05-556
> Charm++> Synchronizing isomalloc memory region...
> Charm++> Consolidated Isomalloc memory region: 0x440000000 - 0x7f4400000000 (133430272 MB).
> Charm++> scheduler running in netpoll mode.
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> cpu affinity enabled.
> Charm++> Running on 1 hosts (2 sockets x 14 cores x 1 PUs = 28-way SMP)
> Charm++> cpu topology info is gathered in 0.021 seconds.
>
> Furthermore, only 2 GPUs are used, though I would like to use two GPUs
> per node. I am hoping there is a simple fix, but I have been struggling
> with this for a few days.
>
> Thank you!
> ________________________
> Bassam Haddad
> Graduate Research Assistant
> Portland State University
> Portland, OR
>
