Re: About the performance of RAMD

From: vlad.cojocaru_at_mpi-muenster.mpg.de
Date: Fri Nov 10 2017 - 13:38:17 CST

We have been running RAMD on systems of about 120000 atoms and saw almost no difference in performance compared to plain MD (at least nothing noticeable) on up to 128 cores (NAMD versions earlier than 2.11). So we have not optimized anything since the code was written, and we did not consider using tclBC either.

We are now planning to pick it up again for larger systems (> 300k atoms). I may be able to say more later, once we have some runs, have evaluated the code, and know whether switching to tclBC is needed and possible.
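
As a rough illustration of the point quoted below (tclForces handling all atoms serially on the first core), here is a minimal Amdahl's-law sketch in Python. The serial fractions are hypothetical values chosen for illustration, not measurements from RAMD or NAMD, and the model ignores communication cost, so it cannot by itself show four nodes running slower than two.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `serial_fraction` of the
    single-core runtime cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical serial fractions (assumptions, not measured values):
# 0.0 stands for plain MD with no serial Tcl step, the others for a run
# in which a serial per-step script takes 1% or 5% of single-core time.
for serial_fraction in (0.0, 0.01, 0.05):
    for cores in (6, 72, 144):
        speedup = amdahl_speedup(serial_fraction, cores)
        print(f"serial={serial_fraction:.2f} cores={cores:3d} "
              f"speedup={speedup:6.1f}")
```

Under these assumptions even a small serial share caps the achievable speedup at 1/serial_fraction, which is consistent with the loss growing with core count, and with atom count if the serial part scales with the number of atoms processed.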

Vlad

On November 10, 2017 8:14:40 PM GMT+01:00, Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:
>Using /dev/shm wouldn't help: Tcl operations are not slow because of disk
>I/O, but because of (1) the intrinsic interpreter overhead, and (2) the use
>of tclForces, which processes all atoms serially on the first core. The use
>of the Tcl interpreter also prevents using thread parallelism.
>
>You most likely had good performance on a hybrid CPU-GPU setup because most
>of the computation is on the GPU, leaving the CPU with more cycles to spare
>for the Tcl interpreter.
>
>I don't know if the formalism allows it, but running RAMD through tclBC
>(executed by all tasks) instead of tclForces (executed by the first task)
>may help. Copying Vlad here for information.
>
>Giacomo
>
>
>
>On Fri, Nov 10, 2017 at 11:55 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>
>> Hello:
>>
>> I have been using RAMD quite a lot with small proteins, particularly on a
>> small Linux box with 6 cores and two GPUs. The loss of performance from MD
>> to RAMD was very limited, with NAMD 2.12 or previous versions.
>>
>> With larger systems (ca. 300000 atoms) the corresponding loss of
>> performance is a factor of six on the above hardware.
>>
>> With CPU-based clusters, such as a NeXtScale, the loss of performance
>> with the above system of 300000 atoms is so high as to prevent using small
>> accelerations. With four nodes (144 cores) the loss of performance is a
>> factor of 36. Four nodes become somewhat slower than two nodes.
>>
>> Has any remedy been found for the RAMD Tcl script? Could RAMD be run (with
>> acceptable performance) on a single node by loading everything into /dev/shm?
>>
>> Thanks for advice
>>
>> francesco pietra
>>
>
>
>
>--
>Giacomo Fiorin
>Associate Professor of Research, Temple University, Philadelphia, PA
>Contractor, National Institutes of Health, Bethesda, MD
>http://goo.gl/Q3TBQU
>https://github.com/giacomofiorin

