From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Fri Nov 10 2017 - 13:14:40 CST
Using /dev/shm wouldn't help: Tcl operations are not slow because of disk
I/O, but because of (1) the intrinsic interpreter overhead, and (2) the use
of tclForces, which processes all atoms serially on the first core. The
use of the Tcl interpreter also prevents using thread parallelism.
You most likely had good performance on a hybrid CPU-GPU setup because most
of the computation is on the GPU, leaving the CPU with more cycles to spare
for the Tcl interpreter.
I don't know if the formalism allows it, but running RAMD through tclBC
(executed by all tasks) instead of tclForces (executed by the first task)
may help. Copying Vlad here for information.
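For reference, a minimal sketch of the tclBC calling pattern in a NAMD configuration file follows. It is not the RAMD algorithm: it only applies a constant force to a contiguous range of atom IDs. The argument names (ligFirst, ligLast, fx, fy, fz) and the values passed through tclBCArgs are hypothetical placeholders. Each task sees only the atoms it owns, so a quantity such as the ligand's center of mass is not directly available inside calcforces; whether RAMD can be reformulated under that constraint is the open question above.

```tcl
# Sketch only: NOT the RAMD algorithm, just the tclBC calling pattern.
tclBC on
tclBCScript {
  # Argument names after "step" and "unique" are hypothetical;
  # their values come from tclBCArgs below.
  proc calcforces {step unique ligFirst ligLast fx fy fz} {
    # Every task runs this loop, but only over the atoms it owns.
    while {[nextatom]} {
      set id [getid]   ;# 1-based atom ID
      if {$id < $ligFirst || $id > $ligLast} {
        dropatom       ;# do not visit this atom again on later steps
        continue
      }
      addforce [list $fx $fy $fz]
    }
  }
}
# Placeholder values: atom IDs 4001-4040, constant force along +x.
tclBCArgs {4001 4040 0.5 0.0 0.0}
```

In contrast to tclForces, no coordinates are gathered on the first task here, which is why the per-step cost can be spread over all tasks.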
On Fri, Nov 10, 2017 at 11:55 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
> I have been using RAMD quite extensively with small proteins, particularly on a
> small Linux box with 6 cores and two GPUs. The loss of performance from MD
> to RAMD was very limited with NAMD 2.12 or earlier versions.
> With larger systems (ca. 300,000 atoms) the corresponding slowdown is a
> factor of six on the above hardware.
> On CPU-based clusters, such as a NeXtScale, the slowdown with the above
> system of 300,000 atoms is so large that it prevents using small
> accelerations. With four nodes (144 cores) the slowdown is a factor of 36.
> Four nodes become somewhat slower than two nodes.
> Has any remedy been found for the RAMD Tcl script? Could RAMD be run (with
> acceptable performance) on a single node by loading everything into /dev/shm?
> Thanks for any advice
> Francesco Pietra
--
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin