Re: Running NAMD - 2.14 - SMP - Verbs - CUDA

From: Victor Kwan (vkwan8_at_uwo.ca)
Date: Sun Aug 23 2020 - 10:42:05 CDT

Your first point of contact should always be your HPC administrators.

Without knowing which interconnect your HPC system uses or how its MPI
libraries were set up, here is my educated guess:

You should not load openmpi or pass ++mpiexec when launching any
verbs/ibverbs build.
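
To illustrate that guess, here is a minimal sketch of a launch that leaves
out both `module load openmpi` and `++mpiexec`. It only builds the nodelist
and prints the command line; it does not run NAMD. The function names
write_nodelist and build_launch_cmd are made up for this example, and the
sketch assumes charmrun is able to start the processes on the listed hosts
by itself (for example over ssh), which depends on how your cluster is
configured. Please check with your administrators before relying on it.

```shell
#!/bin/bash
# Sketch only. Function names are made up for illustration.

# Write a charmrun nodelist: "group main", then one "host" line per node.
# ">" truncates the file, so a rerun cannot append duplicate entries.
write_nodelist() {
    local file=$1
    shift
    {
        echo "group main"
        local h
        for h in "$@"; do
            echo "host $h"
        done
    } > "$file"
}

# Print (not run) a launch line with one process per node and
# (cores - 1) worker threads each, leaving a core for the comm thread.
build_launch_cmd() {
    local nnodes=$1 cores=$2 nodelist=$3 config=$4
    local ppn=$(( cores - 1 ))
    local p=$(( ppn * nnodes ))
    echo "charmrun ++p $p ++ppn $ppn ++nodelist $nodelist namd2 +setcpuaffinity +isomalloc_sync $config"
}

# Example with the two hosts from the nodelist quoted below. In a real job
# the names would come from: scontrol show hostnames "$SLURM_JOB_NODELIST"
nodelist=$(mktemp)
write_nodelist "$nodelist" exanode-7-22 exanode-7-23
build_launch_cmd 2 28 "$nodelist" TEST_Parallel.namd
```

With 2 nodes and 28 cores per node this gives ++ppn 27 and ++p 54, the same
numbers your script computes. The only intended difference is that charmrun
is asked to start the processes from the nodelist itself rather than through
mpiexec.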

On Thu, Aug 20, 2020 at 5:57 PM Bassam Haddad <bhaddad_at_pdx.edu> wrote:

> After working on this more today, I think I may have found the cause,
> though I do not understand how to fix it.
> When I add the flag `++numHosts 2`, I get the following error:
> Charmrun> Error: ++numHosts exceeds available host pool.
>
> I cannot see what is wrong with my nodelist format.
>
>
> */home/users/haddad/TEST_Parallel/nodelist.13643631:*
> group main
> host exanode-7-22
> host exanode-7-23
>
>
> */home/users/haddad/TEST_Parallel/sub_TEST_Parallel_2.0.sh:*
> #!/bin/bash
> #SBATCH --job-name TEST
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=28
> #SBATCH --time=00:10:00
> #SBATCH --partition=gpu
> #SBATCH --gres=gpu:p100:2
> #SBATCH --error /home/users/haddad/TEST_Parallel/test.err
> #SBATCH --output /home/users/haddad/TEST_Parallel/test.txt
>
> # generate NAMD nodelist
> echo "group main" >> nodelist.$SLURM_JOBID
> for n in `echo $SLURM_NODELIST | scontrol show hostnames`; do
> echo "host $n" >> nodelist.$SLURM_JOBID
> done
>
> # calculate total processes (P) and procs per node (PPN)
> PPN=`expr $SLURM_NTASKS_PER_NODE - 1`
> P="$(($PPN * $SLURM_NNODES))"
>
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64
> module use /home/exacloud/software/modules/
> module load openmpi
> #module load namd/2.14b2
>
> # Personal binaries
>
> CHARM="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/charmrun"
> NAMD="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/namd2"
>
> CHARMRUN_ARGS="++p $P ++ppn $PPN ++nodegroup main ++nodelist
> /home/users/haddad/TEST_Parallel/nodelist.$SLURM_JOBID ++numHosts 2
> ++mpiexec ++verbose"
> NAMD_ARGS="+setcpuaffinity +isomalloc_sync"
> $CHARM $NAMD $CHARMRUN_ARGS $NAMD_ARGS \
>   /home/users/haddad/TEST_Parallel/TEST_Parallel.namd
>
> On Wed, Aug 19, 2020 at 7:04 PM Bassam Haddad <bhaddad_at_pdx.edu> wrote:
>
>> *Hello NAMDers,*
>>
>> *I have been trying to run NAMD-2.14-SMP-Verbs-CUDA on my university's HPC
>> system, which uses an InfiniBand network and a SLURM manager. In my best
>> attempts I could only get it running on a single node (despite requesting
>> more). I started with the following script found in a previous namd-l thread. *
>>
>>
>>
>> #!/bin/bash
>> #SBATCH --job-name TEST
>> #SBATCH --nodes 2
>> #SBATCH --ntasks-per-node=28
>> #SBATCH --time=00:10:00
>> #SBATCH --partition=gpu
>> #SBATCH --gres=gpu:p100:2
>> #SBATCH --error /home/users/haddad/TEST_Parallel/test.err
>> #SBATCH --output /home/users/haddad/TEST_Parallel/test.txt
>>
>> # generate NAMD nodelist
>> for n in `echo $SLURM_NODELIST | scontrol show hostnames`; do
>> echo "host $n ++cpus 28" >> nodelist.$SLURM_JOBID
>> done
>>
>> # calculate total processes (P) and procs per node (PPN)
>> PPN=`expr $SLURM_NTASKS_PER_NODE - 1`
>> P="$(($PPN * $SLURM_NNODES))"
>>
>> module use /home/exacloud/software/modules/
>> module load openmpi
>> #module load namd/2.14b2
>>
>> # Personal binaries
>>
>> CHARM="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/charmrun"
>> NAMD="/home/users/haddad/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA/namd2"
>>
>> # System NAMD
>>
>> $CHARM $NAMD ++p $P ++ppn $PPN ++nodelist nodelist.$SLURM_JOBID ++mpiexec \
>>   +setcpuaffinity +isomalloc_sync \
>>   /home/users/haddad/TEST_Parallel/TEST_Parallel.namd
>>
>> *When I run the script it appears to work (2 nodes are allocated); however,
>> the log file seems to indicate that I am only using a single node. *
>>
>> Charmrun> scalable start enabled.
>> Charmrun> IBVERBS version of charmrun
>> Charmrun> started all node programs in 0.613 seconds.
>> Charm++> Running in SMP mode: 2 processes, 27 worker threads (PEs) + 1 comm threads per process, 54 PEs total
>> Charm++> The comm. thread only receives messages, while work threads send messages
>> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
>> Converse/Charm++ Commit ID: v6.10.2-0-g7bf00fa-namd-charm-6.10.2-build-2020-Aug-05-556
>> Charm++> Synchronizing isomalloc memory region...
>> Charm++> Consolidated Isomalloc memory region: 0x440000000 - 0x7f4400000000 (133430272 MB).
>> Charm++> scheduler running in netpoll mode.
>> CharmLB> Load balancer assumes all CPUs are same.
>> Charm++> cpu affinity enabled.
>> Charm++> Running on 1 hosts (2 sockets x 14 cores x 1 PUs = 28-way SMP)
>> Charm++> cpu topology info is gathered in 0.021 seconds.
>>
>> *Furthermore, only 2 GPUs are used, though I would like to use two GPUs
>> per node. I am hoping there is a simple fix, but I have been struggling
>> with this for a few days. *
>>
>> *Thank you!*
>> ________________________
>> *Bassam Haddad*
>> Graduate Research Assistant
>> Portland State University
>> Portland, OR
>>
>
