RFC: shell scripts and support functions for NAMD on Slurm (including GPU and Infiniband)

From: Renfro, Michael (Renfro_at_tntech.edu)
Date: Mon Sep 04 2017 - 13:57:54 CDT

Hey, folks. Trying to document NAMD on our new Slurm HPC setup, and spent a lot of time piecing together materials from several sources. Would appreciate a sanity check on the following scripts to simplify NAMD runs for my end users, and if a reviewed and approved version of this ends up on the Wiki or somewhere more official, so be it.

The core shell script for the user:

  INPUT=bench.in
  OUTPUT=bench.out
  source namd_functions # from /cm/shared/apps/namd
  namd_setup # loads modules and sets up nodelists as needed
  namd_run # runs charmrun or namd2 as needed using ${INPUT} and ${OUTPUT}
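
For reference, a complete job script built around that core might look like the following (the job name and walltime are placeholders; the functions file is sourced from the /cm/shared/apps/namd path mentioned above):

  #!/bin/bash
  #SBATCH --job-name=namd-bench     # placeholder job name
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=28
  #SBATCH --time=01:00:00           # placeholder walltime

  INPUT=bench.in
  OUTPUT=bench.out
  source /cm/shared/apps/namd/namd_functions
  namd_setup   # loads modules and sets up nodelists as needed
  namd_run     # runs charmrun or namd2 as needed using ${INPUT} and ${OUTPUT}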

The user can adjust their requested resources with any of:

  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=28

for a single-node multicore run,

  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=28
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:2

for a single-node GPU run,

  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=28

for a multi-node non-GPU run over Infiniband, or

  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=28
  #SBATCH --partition=gpu
  #SBATCH --gres=gpu:2

for a multi-node GPU run over Infiniband.
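
Whichever variant is chosen, the job is submitted and monitored with the usual Slurm commands (the script name below is just an example):

  sbatch namd_job.sh    # submit the job; Slurm prints the assigned job ID
  squeue -u $USER       # check whether the job is pending or running
  tail -f bench.out     # follow the NAMD log once the job has started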

Contents of namd_functions:

  function namd_make_nodelist() {
      # truncate (or create) the per-job nodelist file
      > nodelist.${SLURM_JOBID}
      # expand Slurm's compressed node list (e.g. node[001-002]) into hostnames
      for n in `scontrol show hostnames ${SLURM_NODELIST}`; do
          LINE="host ${n}.SOME.DOMAIN ++cpus ${SLURM_CPUS_ON_NODE}"
          echo "${LINE}" >> nodelist.${SLURM_JOBID}
      done
      CHARMRUN_ARGS="++p ${SLURM_NTASKS} ++ppn ${SLURM_CPUS_ON_NODE}"
      CHARMRUN_ARGS="${CHARMRUN_ARGS} ++nodelist nodelist.${SLURM_JOBID}"
  }
  
  function namd_setup() {
      CHARMRUN_ARGS=""
      NAMD_ARGS=""
      if [ "${GPU_DEVICE_ORDINAL}" == "NoDevFiles" ]; then
          # No GPUs reserved
          if [ "${SLURM_NNODES}" -gt 1 ]; then
              # multiple nodes without GPUs
              module load namd/ibverbs-smp
              namd_make_nodelist
              NAMD_ARGS="+setcpuaffinity"
          else
              # single node without GPUs
              module load namd/multicore
              NAMD_ARGS="+p${SLURM_NTASKS}"
          fi
      else
          # GPUs reserved
          module load cuda80/toolkit
          NAMD_ARGS="+setcpuaffinity +devices ${GPU_DEVICE_ORDINAL}"
          if [ "${SLURM_NNODES}" -gt 1 ]; then
              # multiple nodes with GPUs
              module load namd/ibverbs-smp-cuda
              namd_make_nodelist
          else
              # single node with GPUs
              module load namd/cuda
              NAMD_ARGS="${NAMD_ARGS} +p${SLURM_NTASKS}"
          fi
      fi
  }
  
  function namd_run() {
      if [ "${SLURM_NNODES}" -gt 1 ]; then
          charmrun `which namd2` ${CHARMRUN_ARGS} ${NAMD_ARGS} \
              ${INPUT} >& ${OUTPUT}
      else
          namd2 ${NAMD_ARGS} ${INPUT} >& ${OUTPUT}
      fi
  }
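
As a concrete example, a 2-node, 28-task-per-node, non-GPU job with a hypothetical job ID of 123456 should write a nodelist file like

  host node001.SOME.DOMAIN ++cpus 28
  host node002.SOME.DOMAIN ++cpus 28

(hostnames made up), and namd_run should end up running roughly

  charmrun `which namd2` ++p 56 ++ppn 28 ++nodelist nodelist.123456 \
      +setcpuaffinity bench.in >& bench.out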

-- 
Mike Renfro  / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Technological University
