Re: Running Parallel Jobs Simultaneously with MPIRUN

From: Hugh Heldenbrand (helde010_at_umn.edu)
Date: Wed Jul 18 2012 - 16:29:27 CDT

Hi Rajan-

I believe I have done something similar to what you are trying to do.
To get around limits on the number of jobs I can submit to the queues I
use, and to keep things more organized in general, I often tie several
MPI-based NAMD simulations together into a single PBS script. The key is
to generate host lists that are passed to the individual MPI calls.

On the systems that I use, the list of hosts for a particular job is
stored in a file whose path is given by the environment variable
$PBS_NODEFILE. So, you could add the line:

cat $PBS_NODEFILE > ~/hostlist

to print the list of hosts to a file in your home directory and see what
it says.

The contents of the file might look something like this, if I had
requested two 8-processor nodes for my job:

node0047
node0047
node0047
node0047
node0047
node0047
node0047
node0047
node0048
node0048
node0048
node0048
node0048
node0048
node0048
node0048

At the beginning of my PBS script I call a simple Perl script (which I
did not write, so I would prefer not to post it for all internetdom)
that parses the host list and divides the hosts into new files, which I
keep in my home directory. So I might be creating two files that look
like this:

~/hostlist1:

node0047
node0047
node0047
node0047
node0047
node0047
node0047
node0047

~/hostlist2:

node0048
node0048
node0048
node0048
node0048
node0048
node0048
node0048
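
Since the Perl script is not posted, here is a minimal bash sketch of the splitting step. It is not that script: it cuts the node file into consecutive chunks of a fixed number of lines instead of grouping by host name. With a chunk size of 8 it produces the two files above from the example node file, assuming each host's entries are listed contiguously. The function name split_hostfile and the PREFIX1, PREFIX2 naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the Perl script mentioned in the post): split a
# PBS node file into consecutive chunks of SLOTS lines, one chunk per
# independent mpirun. Assumes each host's slots appear contiguously,
# as in the $PBS_NODEFILE example above.
#
# Usage: split_hostfile NODEFILE SLOTS PREFIX
# Writes PREFIX1, PREFIX2, ... and prints the number of chunk files.
# The function name and file-naming scheme are hypothetical.
split_hostfile() {
    local nodefile=$1 slots=$2 prefix=$3
    local line chunk n=0

    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            continue                        # ignore blank lines
        fi
        chunk=$(( n / slots + 1 ))          # 1-based chunk index
        if (( n % slots == 0 )); then
            : > "${prefix}${chunk}"         # start (truncate) a fresh chunk file
        fi
        printf '%s\n' "$line" >> "${prefix}${chunk}"
        n=$(( n + 1 ))
    done < "$nodefile"

    echo $(( (n + slots - 1) / slots ))     # number of chunk files written
}

# In a PBS script, e.g. two jobs of 8 slots each:
#   split_hostfile "$PBS_NODEFILE" 8 "$HOME/hostlist"
```

Chunking by line count also lets one job span several nodes or use only part of a node; the trade-off is that it trusts the order of entries in $PBS_NODEFILE.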

All that remains is to tell MPI where to find the hostlists I want to
use instead of the default $PBS_NODEFILE. So my calls to MPI look
something like this:

mpirun -np 8 -hostfile ~/hostlist1 namd2 myjob1.conf > myjob1.log &

mpirun -np 8 -hostfile ~/hostlist2 namd2 myjob2.conf > myjob2.log &

wait

That's how I run multiple MPI NAMD jobs in one PBS script. It should be
relatively easy to implement your own host-list parsing program in
whatever language floats your boat.
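
Rajan's pipeline also needs a third simulation once the first two finish. A minimal bash sketch extending the mpirun example above: it keeps each background PID so that `wait` can report each job's exit status, and it starts simulation 3 only if both succeeded. It assumes ~/hostlist1 and ~/hostlist2 already exist and that your mpirun accepts -hostfile as in the example; the function name, myjob3.conf and the 16-process third run are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run simulations 1 and 2 side by side in one PBS script, then
# run simulation 3 only if both exited successfully.
# Assumes ~/hostlist1 and ~/hostlist2 already exist; myjob3.conf and
# the 16-process third run are hypothetical.
run_pair_then_third() {
    local pids=() pid status=0

    mpirun -np 8 -hostfile "$HOME/hostlist1" namd2 myjob1.conf > myjob1.log &
    pids+=("$!")
    mpirun -np 8 -hostfile "$HOME/hostlist2" namd2 myjob2.conf > myjob2.log &
    pids+=("$!")

    # Block until both finish. Without a wait the script would end at
    # once and the batch system would clean up (kill) both mpiruns.
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1     # "wait PID" returns that job's exit status
    done

    if [ "$status" -ne 0 ]; then
        echo "simulation 1 or 2 failed; skipping simulation 3" >&2
        return 1
    fi

    # Default host list ($PBS_NODEFILE) again, now using all 16 slots.
    mpirun -np 16 namd2 myjob3.conf > myjob3.log
}

# In the PBS script body:
#   run_pair_then_third
```

A plain `wait` with no arguments would also block until both jobs end, but it does not tell you whether either one failed, which matters when simulation 3 needs both outputs.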

If that doesn't work, you could consider running non-MPI NAMD on
individual nodes using the program pbsdsh
(http://www.clusterresources.com/torquedocs21/commands/pbsdsh.shtml). I
do that using the multicore version of NAMD on a GPU machine that we have.

-Hugh Heldenbrand
Grad student,
U of MN

On 07/18/2012 03:47 PM, Axel Kohlmeyer wrote:
> On Wed, Jul 18, 2012 at 9:28 PM, Rajan Vatassery <rajan_at_umn.edu> wrote:
>> This is not the question I asked. Please read more carefully. Twice in
>> the first paragraph I mentioned that I need to use the same PBS script
>> to submit the first two jobs. I am aware that I will need to use the
>> wait command to allow the first two jobs to complete. That isn't the
>> problem. The problem is that I cannot send even one job to background
>> without getting a failed calculation.
> ..and i said in my first statement. "this cannot work".
>
> that is it. full stop. it cannot work. forget it.
> do what i suggested as alternative.
>
> axel.
>
>> thanks,
>>
>> Rajan
>>
>> On Wed, 2012-07-18 at 18:33 +0200, Axel Kohlmeyer wrote:
>>> On Wed, Jul 18, 2012 at 5:36 PM, Rajan Vatassery <rajan_at_umn.edu> wrote:
>>>> Dear List,
>>>> I have several simulations that require mpirun which I need to run at
>>>> the same time, using the same PBS script. As an example, I have to run
>>>> two jobs in parallel with each other (to save time), that will provide
>>>> some output which a third simulation will use as input:
>>>>
>>>> simulation 1 --
>>>>                 \
>>>>                  -----> simulation 3
>>>>                 /
>>>> simulation 2 --
>>>>
>>>> Up to this point, however, I'm not able to get two simulations to run at
>>>> the same time from the same PBS script. There is no communication
>>>> between simulation 1 and 2, and simulation 3 requires some data from
>>>> both of 1 and 2.
>>>> When I try to run something like this:
>>>>
>>>> mpirun -np 8 namd2 myjob1.conf > myjob1.log &
>>>> mpirun -np 8 namd2 myjob2.conf > myjob2.log &
>>> this cannot work. first of all, if you background both
>>> calculations without adding a "wait" command, the
>>> script will just progress and finish immediately and
>>> thus the simulation will be killed. or worse.
>>>
>>>> the jobs do not produce any NAMD-related output, and instead have this
>>>> line in the log files:
>>>>
>>>> 8 total processes killed (some possibly by mpirun during cleanup)
>>>>
>>>> The error file has this entry for each process I'm trying to run:
>>>>
>>>> <start error file>
>>>> mpirun: killing job...
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 0 with PID 13028 on node cl1n106 exited
>>>> on signal 0 (Unknown signal 0).
>>>> --------------------------------------------------------------------------
>>>> mpirun: clean termination accomplished
>>>>
>>>> <end error file>
>>>>
>>>> If I submit the two jobs individually to the cluster, they will run
>>>> without problems.
>>>> I figured that this might be due to mpirun killing the processes that
>>>> it thinks are orphaned or zombie processes, so I tried to add a "nohup"
>>>> before the command. This allowed the two processes to run at the same
>>>> time, but they were only using one processor out of the 8 I had
>>>> allocated (as evidenced by the extremely slow computation).
>>> no. this is nonsense. you would defeat the restrictions
>>> of the batch system resource management, and any
>>> reasonably skilled sysadmin will be able to squash that.
>>>
>>>> I am using the following libraries: intel/12.1, ompi/intel,
>>>> intel/11.1.072, namd/2.7-ompi. I noticed there is a conflict between
>>>> intel/12.1 and intel/11.1.072 but presumably those conflicts should also
>>>> exist when I submit the two jobs individually without incident.
>>>> I have already asked the system admins on my cluster (MSI), but I
>>>> believe that this is a NAMD-related issue. Any help is appreciated.
>>> no it isn't a NAMD issue.
>>>
>>> this can be easily managed using PBS/Torque job dependencies.
>>> have a close look at the "qsub" manpage. look for the documentation
>>> of the -W flag. there should be a section about "depend=dependency_list".
>>> this is the feature you need. you just submit job 1 and job 2 and take
>>> note of the batch job ids. then you submit job 3, but in addition use
>>> -W depend=afterok:<jobid1>,afterok:<jobid2>
>>>
>>> that will make sure that your job 3 will only launch after job1
>>> and job2 have successfully completed. bingo!
>>>
>>> cheers,
>>> axel.
>>>
>>>> Thanks,
>>>>
>>>> Rajan
>>>>
>
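
The job-dependency route described in the quoted reply can be scripted so the job ids never have to be copied by hand. A minimal bash sketch, assuming qsub prints the new job id on standard output (as Torque's qsub does); job1.pbs, job2.pbs, job3.pbs and the function name submit_chain are hypothetical. It lists both ids after a single afterok, separated by colons (afterok:jobid[:jobid...]).

```shell
#!/usr/bin/env bash
# Sketch: submit simulations 1 and 2 as separate batch jobs, then
# submit simulation 3 held until both have completed successfully.
# Assumes qsub prints the new job id on stdout;
# job1.pbs, job2.pbs and job3.pbs are hypothetical script names.
submit_chain() {
    local id1 id2 id3

    id1=$(qsub job1.pbs) || return 1
    id2=$(qsub job2.pbs) || return 1

    # afterok: run only after every listed job exits with status 0.
    id3=$(qsub -W depend=afterok:"$id1":"$id2" job3.pbs) || return 1

    echo "$id3"
}

# From a login node:
#   submit_chain
```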

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:49 CST