RE: Parallel Jobs

From: Vermaas, Joshua
Date: Thu Dec 06 2018 - 15:21:18 CST

Yeah. The multicore builds are blissfully ignorant of additional nodes that Slurm provides. If you compiled your own MPI-enabled build of NAMD, mpirun is usually clever enough to pick unallocated slots when running another process. Mike's approach is the one I personally use, with the addition that I have Python or Tcl scripts that write the Slurm input decks.
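
As a rough illustration of scripted input decks, here is a minimal sketch in bash (rather than Python or Tcl) that writes one single-node Slurm script per window. The resource flags are copied from the submit script quoted further down; NAMD_CMD and the jobs/ directory are hypothetical names, since the thread elides the real NAMD path and arguments.

```shell
#!/bin/bash
# Sketch: write one single-node Slurm script per window (Win1..Win10, as in the thread).
# NAMD_CMD is a placeholder; substitute the real NAMD command line for your site.
NAMD_CMD=${NAMD_CMD:-namd2}

# jobs/ holds the generated scripts (hypothetical name); log/ matches the quoted script.
mkdir -p jobs log

for win in $(seq 1 10); do
  # Unquoted EOF so that ${NAMD_CMD} and ${win} are expanded while writing each file.
  cat > "jobs/min${win}.sh" <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --gres=gpu:4
#SBATCH --mem=64G
${NAMD_CMD} Win${win}/Minimization.conf > log/Minimization${win}.log
EOF
done
```

Each generated file can then be submitted on its own, for example with `for f in jobs/min*.sh; do sbatch "$f"; done`, so that every minimization gets a separate one-node allocation.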


On 2018-12-06 14:09:05-07:00 wrote:

Everything in that script is likely running on the first host in your reservation. So you're running 10 jobs simultaneously on 1 node, instead of 1 each on 10 nodes. So 120 seconds for 10 jobs isn't unreasonable, since each job is only getting 12 seconds of compute time in that period.

You could reduce your reservation down to 1 node and remove the & characters, make 10 scripts with each script using 1 node and running one minimization, or do something more sophisticated with job arrays in Slurm.
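
A minimal sketch of the job-array option, assuming the same per-node resources as the quoted script. Slurm runs the script once per array index and sets SLURM_ARRAY_TASK_ID for each run. NAMD_CMD is a hypothetical variable standing in for the NAMD command line that the quoted script elides; as written it defaults to a dry run that only echoes the command into the log.

```shell
#!/bin/bash
#SBATCH --array=1-10            # one array task per window
#SBATCH --nodes=1               # each array task gets its own node
#SBATCH --ntasks-per-node=24
#SBATCH --gres=gpu:4
#SBATCH --mem=64G

# Placeholder for the real NAMD command line (assumption: set it for your site).
# The default just echoes the command so the script can be dry-run outside Slurm.
NAMD_CMD=${NAMD_CMD:-echo namd2}

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; fall back to 1 outside Slurm.
win=${SLURM_ARRAY_TASK_ID:-1}

mkdir -p log
# NAMD_CMD is intentionally unquoted so that a command with arguments splits into words.
$NAMD_CMD "Win${win}/Minimization.conf" > "log/Minimization${win}.log"
```

Submitted once with sbatch, this queues ten independent one-node tasks, and no & or wait is needed because each task runs a single minimization.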

Mike Renfro  / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
> On Dec 6, 2018, at 2:54 PM, McGuire, Kelly wrote:
> If I use this, it takes 2 minutes to finish the 10 jobs, whereas if only one job is submitted at a time, then it takes 15-20 seconds:
> #!/bin/bash
> #SBATCH -C rhel7
> #SBATCH --time=3-00:00:00   # walltime
> #SBATCH --ntasks-per-node=24   # number of processor cores (i.e. tasks)
> #SBATCH --nodes=10   # number of nodes
> #SBATCH --gres=gpu:4
> #SBATCH --mem=64G   # memory per CPU core
> #SBATCH -w, --nodelist=m9g-2-1
> # Compatibility variables for PBS. Delete if not needed.
> export PBS_NODEFILE=`/fslapps/fslutils/generate_pbs_nodefile`
> export PBS_QUEUE=batch
> export dir=/panfs/
> # Set the max number of threads to use for programs using OpenMP. Should be <= ppn. Does nothing if the program doesn't use OpenMP.
> # Run NAMD
> $dir Win1/Minimization.conf > log/Minimization1.log &
> $dir Win2/Minimization.conf > log/Minimization2.log &
> $dir Win3/Minimization.conf > log/Minimization3.log &
> $dir Win4/Minimization.conf > log/Minimization4.log &
> $dir Win5/Minimization.conf > log/Minimization5.log &
> $dir Win6/Minimization.conf > log/Minimization6.log &
> $dir Win7/Minimization.conf > log/Minimization7.log &
> $dir Win8/Minimization.conf > log/Minimization8.log &
> $dir Win9/Minimization.conf > log/Minimization9.log &
> $dir Win10/Minimization.conf > log/Minimization10.log &
> wait
> Kelly L. McGuire
> PhD Candidate
> Biophysics
> Department of Physiology and Developmental Biology
> Brigham Young University
> LSB 3050
> Provo, UT 84602
> From: Bennion, Brian
> Sent: Thursday, December 6, 2018 1:51:38 PM
> To: McGuire, Kelly;
> Subject: Re: namd-l: Parallel Jobs
> NAMD has no resource scheduler for clusters. Please share your submit script so that we can see what is being attempted.
> Thanks
> Brian
> ---
> Sent from Workspace ONE Boxer
> On December 6, 2018 at 12:41:55 PM PST, McGuire, Kelly wrote:
>> I just tried, for the first time, submitting 10 minimization jobs from the same bash submit script. Usually, using the CUDA NAMD version with 4 GPUs, an individual minimization job finishes in 15-20 seconds. If I try submitting 10 minimization jobs on 10 nodes, with 24 CPUs per node, 64 GB per node, and 4 GPUs per node, it takes about 2 minutes to finish the minimization jobs. It seems that each job is not getting its own node and set of GPUs. How does NAMD handle parallel jobs like this?
>> Kelly L. McGuire
>> PhD Candidate
>> Biophysics
>> Department of Physiology and Developmental Biology
>> Brigham Young University
>> LSB 3050
>> Provo, UT 84602
