Re: Parallel Jobs

From: Bennion, Brian (bennion1_at_llnl.gov)
Date: Thu Dec 06 2018 - 15:01:48 CST

You have not told Slurm which nodes are dedicated to which NAMD job. My guess is that all ten jobs are running on the first allocated node, since the batch script itself only executes there.
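One way to check this guess is to print where the batch script runs next to the list of allocated nodes. The lines below are only a diagnostic sketch: the function name report_placement is made up for illustration, and the check assumes scontrol is on the PATH of the compute node.

```shell
#!/bin/bash
# Diagnostic sketch (report_placement is a hypothetical helper name).
# Prints the host that executes the batch script and, when run inside a
# Slurm allocation, the full list of nodes Slurm assigned to the job.
report_placement() {
    echo "Batch script is running on: $(hostname)"
    if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
        echo "Nodes in this allocation:"
        # Expands a compact node list such as m9g-2-[1-10] to one name per line.
        scontrol show hostnames "$SLURM_JOB_NODELIST"
    else
        echo "Not inside a Slurm allocation; nothing to compare."
    fi
}

report_placement
```

If the first line always names the same node while the allocation lists ten, every command started directly from the script is sharing that one node and its GPUs.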

In my experience, each job needs to be launched through srun with the -r (--relative) argument.
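A minimal sketch of that idea, replacing the "Run NAMD" part of the submit script, could look like the following. It is untested on your cluster. The per-step resources (-N1 -n1 -c24 --gres=gpu:4) are assumptions taken from the #SBATCH lines, and NAMD_BIN, NJOBS and step_args are hypothetical names used only for illustration. Outside a Slurm allocation the loop only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: one srun job step per window, each pinned to a different node.
# Assumptions: 10 nodes allocated, 24 cores and 4 GPUs per node, and the
# log/ directory already exists (as in the original script).
namd=${NAMD_BIN:-namd2}   # hypothetical variable; set to the full namd2 path
njobs=${NJOBS:-10}        # hypothetical variable; number of windows

# Print the srun options for window $1 (1-based). -r/--relative takes a
# 0-based node offset inside the allocation, so window i uses node i-1.
step_args() {
    echo "-N1 -n1 -c24 --gres=gpu:4 -r $(( $1 - 1 ))"
}

for i in $(seq 1 "$njobs"); do
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        # Unquoted on purpose so the options split into separate words.
        srun $(step_args "$i") "$namd" "Win$i/Minimization.conf" \
            > "log/Minimization$i.log" &
    else
        # Dry run: show the command instead of launching it.
        echo "srun $(step_args "$i") $namd Win$i/Minimization.conf"
    fi
done
wait   # block until every background job step has finished
```

Each step then gets its own node and its own four GPUs instead of competing on the first node. Depending on the Slurm version and site configuration, extra options may be needed so that steps do not wait on each other.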

Brian



On December 6, 2018 at 12:54:59 PM PST, McGuire, Kelly <mcg05004_at_byui.edu> wrote:

With the script below, the 10 jobs take about 2 minutes to finish, whereas a single job submitted on its own takes 15-20 seconds:

#!/bin/bash

#SBATCH -C rhel7
#SBATCH --time=3-00:00:00 # walltime
#SBATCH --ntasks-per-node=24 # number of processor cores (i.e. tasks)
#SBATCH --nodes=10 # number of nodes
#SBATCH --gres=gpu:4
#SBATCH --mem=64G # memory per node
#SBATCH --nodelist=m9g-2-1

# Compatibility variables for PBS. Delete if not needed.
export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
export PBS_JOBID=$SLURM_JOB_ID
export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"
export PBS_QUEUE=batch

export dir=/panfs/pan.fsl.byu.edu/scr/grp/busathlab/software/namd/exec/NAMD_2.13b2_Linux-x86_64-multicore-CUDA/namd2


# Set the max number of threads to use for programs using OpenMP. Should be <= ppn. Does nothing if the program doesn't use OpenMP.
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

# Run NAMD
$dir Win1/Minimization.conf > log/Minimization1.log &
$dir Win2/Minimization.conf > log/Minimization2.log &
$dir Win3/Minimization.conf > log/Minimization3.log &
$dir Win4/Minimization.conf > log/Minimization4.log &
$dir Win5/Minimization.conf > log/Minimization5.log &
$dir Win6/Minimization.conf > log/Minimization6.log &
$dir Win7/Minimization.conf > log/Minimization7.log &
$dir Win8/Minimization.conf > log/Minimization8.log &
$dir Win9/Minimization.conf > log/Minimization9.log &
$dir Win10/Minimization.conf > log/Minimization10.log &
wait



Kelly L. McGuire
PhD Candidate
Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602


________________________________
From: Bennion, Brian <bennion1_at_llnl.gov>
Sent: Thursday, December 6, 2018 1:51:38 PM
To: McGuire, Kelly; namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Parallel Jobs

NAMD has no resource scheduler for clusters. Please share your submit script so that we can see what is being attempted.
Thanks
Brian



On December 6, 2018 at 12:41:55 PM PST, McGuire, Kelly <mcg05004_at_byui.edu> wrote:

I just tried, for the first time, submitting 10 minimization jobs from the same bash submit script. Usually, using the CUDA NAMD version with 4 GPUs, an individual minimization job finishes in 15-20 seconds. If I submit 10 minimization jobs on 10 nodes, with 24 CPUs, 64 GB of memory, and 4 GPUs per node, the jobs take about 2 minutes to finish. It seems that each job is not getting its own node and set of GPUs. How does NAMD handle parallel jobs like this?


