Re: any experiences with executing parallel NAMD in a NAMD simulation

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Tue Feb 10 2009 - 09:26:02 CST

Hi Yinglong,
In that case I'm not sure why things are running slower. I can tell you
that there's nothing inherent about running namd inside of namd that would
degrade performance, as I have used namd as a top-level interpreter for
spawning replica exchange jobs in the past. You may want to experiment
with, for example, doing the same thing where both systems are smaller,
and try running with ldbUnloadZero yes. By the way, you told me as a
fraction of runtime how much the minimization takes, but how long is it
in wallclock time? Let us know if your problem persists.
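
One way to answer the wallclock question is to time the embedded command directly. The following is a minimal sketch, not part of the original exchange: the helper name timed_run is hypothetical, and the command shown is a harmless stand-in that would be replaced by the actual mpirun/namd2 invocation.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return (exit code, wallclock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command (assumption); substitute the real embedded run here.
    code, seconds = timed_run([sys.executable, "-c", "pass"])
    print(f"exit code {code}, wallclock {seconds:.2f} s")
```

Timing the standalone minimization and the embedded one the same way gives two numbers that can be compared directly, independent of the per-step benchmark figures.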

Peter

Yinglong Miao wrote:
> Dear Peter,
>
> I compared the embedded run to a standalone energy minimization of the
> same structure with the same number of processors. According to the
> benchmark timings in the NAMD output, the embedded run is more than 10
> times slower. I did notice that energy minimization is slower than MD,
> normally 1/3 of the speed for my system.
>
> Thanks,
> Yinglong
>
> Peter Freddolino wrote:
>> Hi Yinglong,
>> when you say that you have a certain performance expectation for the
>> embedded part, what are you basing that expectation on? What kind of
>> benchmarking have you done? Are you taking into account the fact
>> that a minimization step will be slower than a normal MD step?
>>
>> Peter
>>
>> Yinglong Miao wrote:
>>
>>> Dear Peter,
>>>
>>> Thanks for your reply!
>>>
>>> Peter Freddolino wrote:
>>>
>>>> Hi Yinglong,
>>>> how long is your embedded run, relative to your overall simulation? Is
>>>> it on the same system? Do you end up running on the same nodes? Have you
>>>> verified that you're running on as many processors as you think you are?
>>>>
>>>>
>>>>
>>> I have been running my overall simulation in cycles. In each cycle,
>>> the embedded run is basically an energy minimization of the overall
>>> system excluding water and ions. At the expected speed, it would
>>> account for about 1/5 of the computation time. I have double-checked
>>> the output: the same nodes are used for the embedded run and the
>>> overall simulation.
>>>
>>>> What you're doing is quite unusual, and depending on the exact
>>>> relationship between your two systems there is probably a better way to
>>>> handle it. Especially if the embedded runs are small relative to the
>>>> full run, please keep in mind that namd has a high startup cost. If your
>>>> embedded run is on a system whose topology doesn't change (and only
>>>> needs coordinates once in a while) you will probably be better off
>>>> starting two separate namd jobs on disjoint sets of nodes and having
>>>> them communicate with sockets (as is done in the replica exchange
>>>> scripts distributed with namd). This gets around both the startup costs
>>>> and other potential problems with having two copies of namd running (if
>>>> your system is large, for example, you're putting a lot of stress on the
>>>> head node in terms of memory).
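
The pattern described above (two long-lived jobs that exchange messages over a socket, instead of one job repeatedly launching the other) can be illustrated with a small sketch. This is not the replica exchange script distributed with namd; it is a generic Python illustration in which every name (serve_requests, start_worker, drive_cycles, the "minimize"/"quit" messages) is hypothetical, and how a request would actually reach a running namd process is not shown.

```python
import socket
import threading


def serve_requests(server_sock, handle):
    """Accept one connection and answer newline-terminated requests until 'quit'."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r") as reader, conn.makefile("w") as writer:
        for line in reader:
            request = line.strip()
            if request == "quit":
                break
            writer.write(handle(request) + "\n")
            writer.flush()


def start_worker(handle):
    """Start the long-lived worker on an OS-chosen local port; return (thread, port)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    def run():
        with server_sock:
            serve_requests(server_sock, handle)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, port


def drive_cycles(port, n_cycles):
    """Driver side: one request per cycle over a single persistent connection."""
    replies = []
    with socket.create_connection(("127.0.0.1", port)) as sock, \
            sock.makefile("r") as reader, sock.makefile("w") as writer:
        for cycle in range(n_cycles):
            writer.write(f"minimize {cycle}\n")
            writer.flush()
            replies.append(reader.readline().strip())
        writer.write("quit\n")
        writer.flush()
    return replies
```

The point of the design is that the worker is started once and stays up, so the startup cost is paid a single time rather than once per cycle.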
>>>>
>>>>
>>>>
>>> As mentioned above, the system input to the embedded run is basically
>>> that of the overall simulation excluding the host medium, so the two
>>> systems are different. The systems are pretty large: about half a
>>> million atoms for the core structure and many more for the entire
>>> system. I have been thinking of writing batch scripts to invoke NAMD
>>> sequentially, but it would take a lot of effort to rewrite my NAMD
>>> configuration scripts to set up the running cycles. It would be great
>>> if the technical issues could be solved so that the embedded run
>>> executes at a reasonable speed.
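
For reference, the sequential batch approach mentioned above could look roughly like the following sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the per-cycle configuration file names would have to come from the actual setup, and the only command taken from this thread is the mpirun line quoted in the original post.

```python
import subprocess
import sys


def namd_command(nprocs, machinefile, conf):
    """Build the mpirun line from the original post as an argument list."""
    return ["mpirun", "-np", str(nprocs), "-machinefile", machinefile, "namd2", conf]


def run_sequential_cycles(n_cycles, main_cmd, embedded_cmd, runner=subprocess.run):
    """Alternate the two commands for n_cycles, stopping at the first failure.

    main_cmd and embedded_cmd are callables mapping a cycle index to an
    argument list, so each cycle can use its own configuration file.
    """
    for cycle in range(n_cycles):
        for label, make_cmd in (("main", main_cmd), ("embedded", embedded_cmd)):
            result = runner(make_cmd(cycle), check=False)
            if result.returncode != 0:
                raise RuntimeError(f"{label} step failed in cycle {cycle}")
    return n_cycles


if __name__ == "__main__":
    # Stand-in commands (assumption) so the sketch runs without NAMD installed.
    noop = lambda cycle: [sys.executable, "-c", "pass"]
    print(run_sequential_cycles(2, noop, noop), "cycles completed")
```

In a real setup the two callables would return something like namd_command(nprocs, machinefile, conf) with a per-cycle configuration file. Each step still pays the full NAMD startup cost, which is the trade-off raised earlier in the thread.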
>>>
>>> Thanks,
>>> Yinglong
>>>
>>>> Best,
>>>> Peter
>>>>
>>>> Yinglong Miao wrote:
>>>>
>>>>
>>>>> Dear NAMD developers/users,
>>>>>
>>>>> I have been trying to embed a parallel NAMD run in a bigger NAMD
>>>>> simulation by calling "exec mpirun -np $nprocessors
>>>>> -machinefile $machinelist namd2 ${str}.conf" in a NAMD configuration
>>>>> script, in order to run a different structure and generate output for
>>>>> the big simulation. The program runs fine, except that the embedded
>>>>> part executes much more slowly than expected (~1/5 of the speed) and
>>>>> error messages such as "Max retransmit retries reached (829) for
>>>>> message" sometimes pop up. I thought it was because the message
>>>>> passing is slowed down and the MPI environment is messed up when NAMD
>>>>> exits from the embedded run. Does anybody have experience with this?
>>>>> How can I run the embedded part at normal speed? Any suggestions will
>>>>> be greatly appreciated.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Yinglong Miao
>>>>> Ph.D. Candidate
>>>>> Center for Cell and Virus Theory
>>>>> Chemistry Department, Indiana University
>>>>> 800 E Kirkwood Ave Room C203A, Bloomington, IN 47405
>>>>> Tel: 1-812-856-0981
>>>>>
>>>>>
>>>>
>>>>
>>> --
>>> Yinglong Miao
>>> Ph.D. Candidate
>>> Center for Cell and Virus Theory
>>> Department of Chemistry, Indiana University
>>> Address: 800 E Kirkwood Ave Room C203A, Bloomington, IN 47405
>>> Phone: 1-812-856-0981 (office) 1-812-272-8196 (cell)
>>>
>>
>>
>
> --
> Yinglong Miao
> Ph.D. Candidate
> Center for Cell and Virus Theory
> Department of Chemistry, Indiana University
> Address: 800 E Kirkwood Ave Room C203A, Bloomington, IN 47405
> Phone: 1-812-856-0981 (office) 1-812-272-8196 (cell)
