Re: Running Parallel Jobs Simultaneously with MPIRUN

From: Rajan Vatassery (rajan_at_umn.edu)
Date: Thu Jul 19 2012 - 01:59:55 CDT

Hugh,
        Thanks for your reply. Your suggestion does indeed seem to be the
way to get this to work. I put the line:

cat $PBS_NODEFILE > ~/hostlist

in my PBS script to generate the hostfile, then parsed that list with a
Python program I wrote (included at the end of this email), which the
PBS script calls with this line:

python hostparse.py

and finally, I was able to launch multiple simultaneous NAMD instances
using these lines:

mpirun -np 8 --hostfile ~/hostlist1 namd2 job1.conf > job1.log &
mpirun -np 8 --hostfile ~/hostlist2 namd2 job2.conf > job2.log &
wait
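
Putting those pieces together, the working part of my PBS script is
roughly the following (directives abridged; the node request and the
cd line are specific to my setup, so adjust them to match yours):

#PBS -l nodes=2:ppn=8
cd $PBS_O_WORKDIR                # run from the submission directory
cat $PBS_NODEFILE > ~/hostlist   # dump the full node list
python hostparse.py              # split into ~/hostlist1 and ~/hostlist2
mpirun -np 8 --hostfile ~/hostlist1 namd2 job1.conf > job1.log &
mpirun -np 8 --hostfile ~/hostlist2 namd2 job2.conf > job2.log &
wait                             # block until both runs finish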

Thanks again for taking the time to answer my question. I think I owe
you a Surly when you're back in MN.

Full stop,

Rajan

hostparse.py:
---------------------------------------------------------
# split the PBS-generated host list into two halves,
# one hostfile per simultaneous NAMD run

# read the full host list written out by the PBS script
filein = open('/home/xe1/vatasser/hostlist', 'r')
lines = filein.readlines()
filein.close()

# integer division, so this also runs under Python 3
half = len(lines) // 2

# write the first half of the hosts to hostlist1
file1 = open('/home/xe1/vatasser/hostlist1', 'w')
for line in lines[:half]:
    file1.write(line)
file1.close()

# write the second half of the hosts to hostlist2
file2 = open('/home/xe1/vatasser/hostlist2', 'w')
for line in lines[half:]:
    file2.write(line)
file2.close()
--------------------------------------------------------
NOTE: this script is very bare-bones: the paths are hard-coded and it
assumes the host list splits evenly in two, so you will likely have to
adapt it to your own setup.

On Wed, 2012-07-18 at 16:29 -0500, Hugh Heldenbrand wrote:
> Hi Rajan-
>
> I believe I have done something similar to what you are trying to do.
> In order to get around restrictions on the number of submitted jobs to
> the queues that I use and to keep things more organized in general, I
> often tie several MPI-based NAMD simulations together into a single PBS
> script. The key is to generate hostlists that are passed to the
> individual MPI calls.
>
> On the systems that I use, the list of hosts for a particular job can be
> accessed via the environment variable $PBS_NODEFILE. So you could add
> the line:
>
> cat $PBS_NODEFILE > ~/hostlist
>
> to print the list of hosts to a file in your home directory and see what
> it says.
>
> The contents of the file might look something like this, if I had
> requested two 8-processor nodes for my job:
>
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
>
> I call a simple Perl script at the beginning of my PBS file (which I did
> not write, so I would prefer not to post it for all internetdom) to
> parse the host list and divide the hosts into new files, which I keep in
> my home directory. So I might be creating two files that look like this:
>
> ~/hostlist1:
>
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
> node0047
>
> ~/hostlist2:
>
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
> node0048
>
> All that remains is to tell MPI where to find the hostlists I want to
> use instead of the default $PBS_NODEFILE. So my calls to MPI look
> something like this:
>
> mpirun -np 8 -hostfile ~/hostlist1 namd2 myjob1.conf > myjob1.log &
>
> mpirun -np 8 -hostfile ~/hostlist2 namd2 myjob2.conf > myjob2.log &
>
> wait
>
> That's how I run multiple MPI NAMD jobs in one PBS script. It should be
> relatively easy to implement your own hostlist parsing program in
> whatever language floats your boat.
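>
> (If you'd rather not write one at all, something as simple as the
> standard split utility should do it for the two-node case above, e.g.:
>
> split -l 8 $PBS_NODEFILE ~/hostlist
>
> which puts the first 8 lines in ~/hostlistaa and the rest in
> ~/hostlistab -- the -l count being the processors per node. Just a
> sketch; the output names differ from the hostlist1/hostlist2 naming
> used earlier, so adjust your mpirun lines accordingly.)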
>
> If that doesn't work, you could consider running non-MPI NAMD on
> individual nodes using the program pbsdsh
> (http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml). I
> do that using the multicore version of NAMD on a GPU machine that we have.
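>
> (A rough, untested sketch of that, for the same two-node, 8-core
> example -- all paths here are placeholders you'd have to fill in:
>
> pbsdsh -n 0 /bin/sh -c '/path/to/namd2 +p8 /path/to/job1.conf > /path/to/job1.log' &
> pbsdsh -n 8 /bin/sh -c '/path/to/namd2 +p8 /path/to/job2.conf > /path/to/job2.log' &
> wait
>
> As I read the pbsdsh docs, -n selects the entry index in $PBS_NODEFILE
> to run on, so with ppn=8 index 0 is the first node and index 8 the
> second; +p8 is the multicore NAMD thread count. Absolute paths and the
> in-shell redirection matter because pbsdsh tasks don't start in your
> working directory.)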
>
> -Hugh Heldenbrand
> Grad student,
> U of MN
>
> On 07/18/2012 03:47 PM, Axel Kohlmeyer wrote:
> > On Wed, Jul 18, 2012 at 9:28 PM, Rajan Vatassery <rajan_at_umn.edu> wrote:
> >> This is not the question I asked. Please read more carefully. Twice in
> >> the first paragraph I mentioned that I need to use the same PBS script
> >> to submit the first two jobs. I am aware that I will need to use the
> >> wait command to allow the first two jobs to complete. That isn't the
> >> problem. The problem is that I cannot send even one job to background
> >> without getting a failed calculation.
> > ...and as i said in my first statement: "this cannot work".
> >
> > that is it. full stop. it cannot work. forget it.
> > do what i suggested as alternative.
> >
> > axel.
> >
> >> thanks,
> >>
> >> Rajan
> >>
> >> On Wed, 2012-07-18 at 18:33 +0200, Axel Kohlmeyer wrote:
> >>> On Wed, Jul 18, 2012 at 5:36 PM, Rajan Vatassery <rajan_at_umn.edu> wrote:
> >>>> Dear List,
> >>>> I have several simulations that require mpirun and that I need to run
> >>>> at the same time, from the same PBS script. As an example, I have to
> >>>> run two jobs in parallel with each other (to save time) that will
> >>>> provide some output which a third simulation will use as input:
> >>>>
> >>>> simulation 1 --
> >>>>                 \
> >>>>                  -----> simulation 3
> >>>>                 /
> >>>> simulation 2 --
> >>>>
> >>>> Up to this point, however, I'm not able to get two simulations to run at
> >>>> the same time from the same PBS script. There is no communication
> >>>> between simulation 1 and 2, and simulation 3 requires some data from
> >>>> both of 1 and 2.
> >>>> When I try to run something like this:
> >>>>
> >>>> mpirun -np 8 namd2 myjob1.conf > myjob1.log &
> >>>> mpirun -np 8 namd2 myjob2.conf > myjob2.log &
> >>> this cannot work. first of all, if you background both
> >>> calculations without adding a "wait" command, the
> >>> script will just progress and finish immediately and
> >>> thus the simulation will be killed. or worse.
> >>>
> >>>> the jobs do not produce any NAMD-related output, and instead have this
> >>>> line in the log files:
> >>>>
> >>>> 8 total processes killed (some possibly by mpirun during cleanup)
> >>>>
> >>>> The error file has this entry for each process I'm trying to run:
> >>>>
> >>>> <start error file>
> >>>> mpirun: killing job...
> >>>>
> >>>> --------------------------------------------------------------------------
> >>>> mpirun noticed that process rank 0 with PID 13028 on node cl1n106 exited
> >>>> on signal 0 (Unknown signal 0).
> >>>> --------------------------------------------------------------------------
> >>>> mpirun: clean termination accomplished
> >>>>
> >>>> <end error file>
> >>>>
> >>>> If I submit the two jobs individually to the cluster, they will run
> >>>> without problems.
> >>>> I figured that this might be due to mpirun killing the processes that
> >>>> it thinks are orphaned or zombie processes, so I tried to add a "nohup"
> >>>> before the command. This allowed the two processes to run at the same
> >>> no. this is nonsense. you would defeat the restrictions
> >>> of the batch system resource management, and any
> >>> reasonably skilled sysadmin will be able to squash that.
> >>>
> >>>> time, but they were only using one processor out of the 8 I had
> >>>> allocated (as evidenced by the extremely slow computation).
> >>>> I am using the following modules: intel/12.1, ompi/intel,
> >>>> intel/11.1.072, namd/2.7-ompi. I noticed there is a conflict between
> >>>> intel/12.1 and intel/11.1.072, but that conflict would presumably also
> >>>> affect the two jobs when I submit them individually, and those run
> >>>> without incident.
> >>>> I have already asked the system admins on my cluster (MSI), but I
> >>>> believe that this is a NAMD-related issue. Any help is appreciated.
> >>> no it isn't a NAMD issue.
> >>>
> >>> this can be easily managed using PBS/Torque job dependencies.
> >>> have a close look at the "qsub" manpage. look for the documentation
> >>> of the -W flag. there should be a section about "depend=dependency_list".
> >>> this is the feature you need. you just submit job 1 and job 2 and take
> >>> note of the batch job ids. then you submit job 3, but in addition use
> >>> -W depend=afterok:<jobid1>,afterok:<jobid2>
> >>>
> >>> that will make sure that your job 3 will only launch after job1
> >>> and job2 have successfully completed. bingo!
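> >>>
> >>> in shell terms that is something like this (the script names here
> >>> are just placeholders):
> >>>
> >>> jobid1=$(qsub job1.pbs)
> >>> jobid2=$(qsub job2.pbs)
> >>> qsub -W depend=afterok:$jobid1,afterok:$jobid2 job3.pbs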
> >>>
> >>> cheers,
> >>> axel.
> >>>
> >>>> Thanks,
> >>>>
> >>>> Rajan
