Re: About the performance of RAMD

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Nov 11 2017 - 00:46:49 CST

Hi Giacomo:

> You most likely had good performance on a hybrid CPU-GPU setup
>

This is also my experience, as I reported. However, our new NeXtScale
cluster has no GPUs, nor is there any plan to add them.

Hi Vlad:

> We have been running RAMD with systems of about 120000 atoms and saw
> almost no difference compared to MD (at least none noticeable) on up
> to 128 cores (NAMD earlier than 2.11)
>

My 300000 atoms are not so drastically different from your 120000, so if
your cluster had CPUs only, there must be something wrong in my input files.
Or the complexity of my system causes extra overhead: my protein is made
of 24 subunits, which differ in the active site and thus in the ligands they
require. Could you have a look at my input files (namd.conf and namd.job)?
I could send them directly to you so as not to overload the NAMD forum.

thanks

francesco

On Fri, Nov 10, 2017 at 8:38 PM, <vlad.cojocaru_at_mpi-muenster.mpg.de> wrote:

> We have been running RAMD with systems of about 120000 atoms and saw
> almost no difference compared to MD (at least none noticeable) on up
> to 128 cores (NAMD earlier than 2.11) ... So, we did not optimize anything
> since the time at which it was written .. we also did not consider using
> tclBC...
>
> Now, we are planning to pick it up again for larger systems (> 300k) and I
> might be able to say more later, when we have some runs and have evaluated
> the code and whether it is needed and possible to switch to tclBC ...
>
> Vlad
>
> On November 10, 2017 8:14:40 PM GMT+01:00, Giacomo Fiorin <
> giacomo.fiorin_at_gmail.com> wrote:
>>
>> Using /dev/shm wouldn't help: Tcl operations are not slow because of disk
>> I/O, but because of (1) the intrinsic interpreter overhead, and (2) the use
>> of tclForces, which processes all atoms serially on the first core. The
>> use of the Tcl interpreter also prevents using thread parallelism.
>>
>> You most likely had good performance on a hybrid CPU-GPU setup because
>> most of the computation is on the GPU, leaving the CPU with more cycles to
>> spare for the Tcl interpreter.
>>
>> I don't know if the formalism allows it, but running RAMD through tclBC
>> (executed by all tasks) instead of tclForces (executed by the first task)
>> may help. Copying Vlad here for information.
>>
>> Giacomo
>>
>>
>>
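For orientation, the two interfaces discussed above can be sketched as NAMD
configuration fragments. This is only a minimal sketch, not the RAMD script:
the atom IDs 101-103 and the force value are hypothetical placeholders, and
the two variants are alternatives rather than settings to combine. Variant A
runs its calcforces procedure in a single interpreter on the first task;
variant B runs calcforces on every task, each of which sees only its local
atoms.

```tcl
# Variant A: tclForces (script executed by the first task only).
tclForces on
tclForcesScript {
  set ligand {101 102 103}            ;# hypothetical ligand atom IDs
  set f {0.0 0.0 1.0}                 ;# hypothetical force, kcal/(mol*A)
  foreach id $ligand { addatom $id }  ;# request coordinates of these atoms
  proc calcforces {} {
    global ligand f
    loadcoords coords                 ;# coordinates gathered on one task
    foreach id $ligand { addforce $id $f }
  }
}

# Variant B: tclBC (script executed by all tasks, on local atoms only).
tclBC on
tclBCScript {
  proc calcforces {step unique fz} {
    while {[nextatom]} {
      set id [getid]
      if { $id >= 101 && $id <= 103 } {
        addforce [list 0.0 0.0 $fz]   ;# force applied to the current atom
      } else {
        dropatom                      ;# do not visit this atom again
      }
    }
  }
}
tclBCArgs {1.0}                       ;# hypothetical value passed as fz
```

Because each task in variant B sees only its own atoms, a quantity that
depends on the whole ligand (such as its center of mass) is not directly
available there, which is presumably the caveat behind "if the formalism
allows it".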
>> On Fri, Nov 10, 2017 at 11:55 AM, Francesco Pietra <chiendarret_at_gmail.com
>> > wrote:
>>
>>> Hello:
>>>
>>> I have been using RAMD quite a lot with small proteins, particularly on
>>> a small Linux box with 6 cores and two GPUs. The loss of performance from
>>> MD to RAMD was very limited, with NAMD 2.12 or previous versions.
>>>
>>> With larger systems (ca. 300000 atoms) the corresponding loss of
>>> performance is a factor of six on the above hardware.
>>>
>>> On CPU-based clusters, like a NeXtScale, the loss of performance
>>> with the above system of 300000 atoms is so high as to prevent using small
>>> accelerations. With four nodes (144 cores) the loss of performance is a
>>> factor of 36. Four nodes become somewhat slower than two nodes.
>>>
>>> Has any remedy been found for the RAMD Tcl script? Could RAMD be run (with
>>> acceptable performance) on a single node by loading everything to /dev/shm?
>>>
>>> Thanks for advice
>>>
>>> francesco pietra
>>>
>>
>>
>>
