Re: Parallel Jobs

From: McGuire, Kelly (mcg05004_at_byui.edu)
Date: Thu Dec 06 2018 - 14:54:30 CST

With the script below, the 10 jobs take about 2 minutes to finish, whereas a single job submitted on its own takes 15-20 seconds:

#!/bin/bash

#SBATCH -C rhel7
#SBATCH --time=3-00:00:00 # walltime
#SBATCH --ntasks-per-node=24 # tasks (processor cores) per node
#SBATCH --nodes=10 # number of nodes
#SBATCH --gres=gpu:4
#SBATCH --mem=64G # memory per node
#SBATCH --nodelist=m9g-2-1

# Compatibility variables for PBS. Delete if not needed.
export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
export PBS_JOBID=$SLURM_JOB_ID
export PBS_O_WORKDIR="$SLURM_SUBMIT_DIR"
export PBS_QUEUE=batch

export dir=/panfs/pan.fsl.byu.edu/scr/grp/busathlab/software/namd/exec/NAMD_2.13b2_Linux-x86_64-multicore-CUDA/namd2

# Set the max number of threads to use for programs using OpenMP. Should be <= ppn. Does nothing if the program doesn't use OpenMP.
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE

# Run NAMD
$dir Win1/Minimization.conf > log/Minimization1.log &
$dir Win2/Minimization.conf > log/Minimization2.log &
$dir Win3/Minimization.conf > log/Minimization3.log &
$dir Win4/Minimization.conf > log/Minimization4.log &
$dir Win5/Minimization.conf > log/Minimization5.log &
$dir Win6/Minimization.conf > log/Minimization6.log &
$dir Win7/Minimization.conf > log/Minimization7.log &
$dir Win8/Minimization.conf > log/Minimization8.log &
$dir Win9/Minimization.conf > log/Minimization9.log &
$dir Win10/Minimization.conf > log/Minimization10.log &
wait
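
A SLURM batch script runs only on the first node of the allocation, so the ten backgrounded namd2 processes above most likely all land on that one node and compete for its 24 cores and 4 GPUs. The following is a minimal sketch, not a tested fix, of launching each window as its own job step with srun so that SLURM can place one run per node. The function name launch_windows is hypothetical, the core count of 24 is taken from the request above, and the exact srun flags (in particular whether --exclusive, --exact, or an explicit --gres=gpu:4 is needed per step) are assumptions that depend on the cluster's SLURM version and configuration.

```shell
#!/bin/bash
# Sketch: launch one NAMD run per node as a separate SLURM job step.
# Assumes it runs inside an allocation like the one requested above
# (10 nodes, 24 cores and 4 GPUs per node) and that $dir points to namd2.

launch_windows() {
    local n_windows=$1
    local i
    for i in $(seq 1 "$n_windows"); do
        # --nodes=1 --ntasks=1: one job step on one node, one namd2 process.
        # --cpus-per-task=24 with +p24: give the multicore binary all 24 cores.
        # --exclusive: ask SLURM not to share those CPUs with other steps
        # (assumption: newer SLURM releases may want --exact instead).
        srun --nodes=1 --ntasks=1 --cpus-per-task=24 --exclusive \
            "$dir" +p24 "Win${i}/Minimization.conf" > "log/Minimization${i}.log" &
    done
    wait   # block until every background job step has finished
}
```

In the batch script, the ten explicit namd2 lines and the final wait would be replaced by a single call such as `launch_windows 10`. Checking the "Pe" and device lines at the top of each NAMD log should show whether every run actually received its own cores and GPUs.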

Kelly L. McGuire
PhD Candidate
Biophysics
Department of Physiology and Developmental Biology
Brigham Young University
LSB 3050
Provo, UT 84602

________________________________
From: Bennion, Brian <bennion1_at_llnl.gov>
Sent: Thursday, December 6, 2018 1:51:38 PM
To: McGuire, Kelly; namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Parallel Jobs

NAMD has no resource scheduler for clusters. Please share your submit script so that we can see what is being attempted.
Thanks
Brian

On December 6, 2018 at 12:41:55 PM PST, McGuire, Kelly <mcg05004_at_byui.edu> wrote:
I just tried, for the first time, submitting 10 minimization jobs from the same bash submit script. Usually, using the CUDA NAMD version with 4 GPUs, an individual minimization job finishes in 15-20 seconds. If I try submitting 10 minimization jobs on 10 nodes, with 24 CPUs per node, 64 GB per node, and 4 GPUs per node, it takes about 2 minutes to finish the minimization jobs. It seems that each job is not getting its own node and set of GPUs. How does NAMD handle parallel jobs like this?

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:20:23 CST