NAMD parallelisation / patch grid

From: P.-L. Chau (pc104_at_pasteur.fr)
Date: Tue Dec 27 2016 - 04:37:15 CST

Could I ask for some advice on how to define the patch grid in NAMD, please?

My test simulation system contains 398205 atoms and consists of nAChR
embedded in a hydrated membrane. This system has been equilibrated, and I
use the same equilibrated position as the starting point of the test runs.
I perform simulations of, respectively, 1000 and 101000 time-steps. I
subtract one from the other to get the time required for 100000
time-steps.

I have done simulations of this system using 10 nodes (10 GPUs + 160
cores) to 320 nodes, and for the 1000-timestep runs, the total time
required is about 30s. This is almost regardless of the number of
processors. The total time required for the 101000-timestep run is,
respectively:

number of nodes    computer time used

             10    3363s
             20    2006s
             40    1654s
             80    1230s, 1157s
            160    1171s, 1210s, 1393s
            320    1708s, 1682s

Some of the runs were repeated once or twice, so more than one value is
listed for the computer time used.
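
For reference, the subtraction described above can be sketched in a few
lines of Python. This is only an illustration: the times are copied from the
table, the 30s figure is the approximate 1000-timestep time quoted above,
repeated runs are averaged (my choice for the sketch), and the names
times_101k, net_time and efficiency are made up for this example.

```python
# Wall-clock times (s) for the 101000-timestep runs, copied from the table.
times_101k = {
    10: [3363],
    20: [2006],
    40: [1654],
    80: [1230, 1157],
    160: [1171, 1210, 1393],
    320: [1708, 1682],
}

# Approximate time of the 1000-timestep run, roughly independent of node count.
SHORT_RUN_S = 30.0


def net_time(node_count):
    """Mean time for 100000 time-steps: mean long-run time minus the short run."""
    runs = times_101k[node_count]
    return sum(runs) / len(runs) - SHORT_RUN_S


def efficiency(node_count, base=10):
    """Parallel efficiency relative to the run on `base` nodes."""
    speedup = net_time(base) / net_time(node_count)
    return speedup / (node_count / base)


if __name__ == "__main__":
    for n in sorted(times_101k):
        print(f"{n:4d} nodes: {net_time(n):7.1f} s, efficiency {efficiency(n):.2f}")
```

With these numbers the averaged time is lowest at 80 nodes, and the
efficiency relative to 10 nodes drops steadily as nodes are added.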

I read this webpage:

http://www.ks.uiuc.edu/Research/namd/wiki/?NamdPerformanceTuning

and I would like to optimise the performance further. When I use 160
nodes, I find that the patch grid is 16x15x10, and that it is 2-away,
2-away, 1-away. The time used for the standard job is 1171s.

I thought I would first ask for two more processors to do the
"administrative" work. So I increased the number of nodes to 162, and
used these keywords in NAMD:

      ldbUnloadZero yes
      ldbUnloadOne yes
      noPatchesonOne yes

The job run time is 1218s. Curiously, the patch grid is now 17x16x11,
2-away, 2-away, 1-away.

If this is correct, then the number of cores required is 2992. Each
node has 16 cores, one of which is used for communication and the
other 15 for calculations. This means 199 nodes are required for 2992
cores. Put in two more utility nodes, and that means 201 nodes. I
have run such a job, and curiously enough, the CPU time used is
1570s, considerably more than 160 nodes. Bizarrely, the patch grid
is now 17x16x22, 2-away, 2-away, 2-away.
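
The patch arithmetic above can be checked with a short Python sketch. It
assumes, as stated above, 15 worker cores per node and one patch per worker
core as the target; the function names are made up for this example. Note
that 2992 / 15 is about 199.5, so 199 nodes supply 2985 worker cores, 7 fewer
than the 2992 patches, and rounding up would give 200 nodes.

```python
import math

# 16 cores per node, one of which is used for communication (from the post).
WORKER_CORES_PER_NODE = 15


def patch_count(grid):
    """Total number of patches in an (nx, ny, nz) patch grid."""
    nx, ny, nz = grid
    return nx * ny * nz


def nodes_for_one_patch_per_core(grid):
    """Smallest number of worker nodes giving at least one worker core per patch."""
    return math.ceil(patch_count(grid) / WORKER_CORES_PER_NODE)


# (total nodes, utility nodes, reported patch grid) for the three jobs above.
jobs = [
    (160, 0, (16, 15, 10)),
    (162, 2, (17, 16, 11)),
    (201, 2, (17, 16, 22)),
]

if __name__ == "__main__":
    for total_nodes, utility_nodes, grid in jobs:
        workers = (total_nodes - utility_nodes) * WORKER_CORES_PER_NODE
        patches = patch_count(grid)
        print(f"{total_nodes} nodes: {patches} patches, {workers} worker cores, "
              f"{patches / workers:.2f} patches per worker core")
```

For the 160-node job the 16x15x10 grid gives 2400 patches, which equals
160 x 15 worker cores. For the 201-node job the 17x16x22 grid gives 5984
patches, roughly two per worker core.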

We know there are variations in the run time, but 1570s is considerably
longer than the other run times. So the system is NOT optimised this way.

What have I done wrong? Thank you very much, and a Happy New Year to you
all!

P-L Chau
Institut Pasteur, Paris
