From: Vermaas, Joshua (Joshua.Vermaas_at_nrel.gov)
Date: Fri Aug 04 2017 - 16:34:28 CDT
This isn't particularly crazy. If you look at the specs, a GTX 680 is rated at around 3000 GFLOPS, as is a single Phi chip (Wikipedia is great for these comparisons). In fact, if you had a beefier Linux box that could saturate the GPUs, in theory you should get ~2x the performance out of the GPUs, since the CPUs are probably rate limiting in your case. That being said, you can probably also get better performance from the KNL nodes. KNL nodes have a ton of different options that can radically change performance, and the ones you have chosen are unfamiliar to me. On Stampede2 or Cori (two KNL machines I've played with), the optimum setup was something like this:
srun -n nodecount*13 /Path/to/NAMD/namd2 +ppn 4 +pemap 0-51 +commap 53-65 namdconfigfile > namdlogfile
This is hardware dependent. Some KNL nodes have 64 real cores, others 68 (that is what I see on Stampede2/Cori), and still others 72. I'm guessing you have 64 real cores and 256 threads, so if you were looking at single node performance, you may want something like one of these:
mpirun -n 16 .../namd +ppn 4 configfile > logfile
mpirun -n 12 .../namd +ppn 4 +pemap 0-47 +commap 48-63 configfile > logfile
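To make the arithmetic behind these core maps explicit, here is a minimal Python sketch. It rests on an assumption of mine, not a documented NAMD rule: each rank gets `ppn` worker cores plus one core for its communication thread, and any leftover cores are simply handed to `+commap`. The names `knl_layout` and `launch_line` are hypothetical helpers, and the script only prints a command line; it does not run NAMD.

```python
def knl_layout(cores, ppn):
    """Split `cores` physical cores into worker and communication cores.

    Assumption (illustrative only): every rank uses `ppn` worker cores
    plus one core for its communication thread.
    """
    if ppn < 1 or cores < ppn + 1:
        raise ValueError("need ppn >= 1 and at least ppn + 1 cores")
    ranks = cores // (ppn + 1)         # how many ranks fit on the node
    workers = ranks * ppn              # total worker threads
    pemap = f"0-{workers - 1}"         # worker threads on the first cores
    commap = f"{workers}-{cores - 1}"  # remaining cores for comm threads
    return ranks, pemap, commap


def launch_line(cores, ppn):
    """Build a command line in the style used above (paths are placeholders)."""
    ranks, pemap, commap = knl_layout(cores, ppn)
    return (f"mpirun -n {ranks} .../namd +ppn {ppn} "
            f"+pemap {pemap} +commap {commap} configfile > logfile")


if __name__ == "__main__":
    # Try a few ppn values on a node assumed to have 64 real cores.
    for ppn in (2, 4, 8):
        print(launch_line(64, ppn))
```

For 64 cores and ppn 4 this gives 12 ranks with +pemap 0-47 and +commap 48-63, the same rank count and core maps as the second command above. Other core counts or numbering schemes may need a different split, so treat the output as a starting point for benchmarking.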
You also shouldn't feel constrained by the +ppn 4 argument. Make it 8! Or 2! Or anything else, and see how the behavior changes (so long as you stick to your 64 core limit overall). In short, you can probably do a little better than 2 ns/day, although based on the size of your system you will probably want an SMP build that specifically knows about the KNL architecture rather than the multicore build.
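For reference, the timings in the quoted message below are given in days/ns, while the figure above is in ns/day. A small Python sketch of the conversion, using only the two numbers from that message (the function name `ns_per_day` is just an illustrative helper):

```python
def ns_per_day(days_per_ns):
    """Convert a cost in days per nanosecond into a rate in ns per day."""
    return 1.0 / days_per_ns


knl_rate = ns_per_day(0.5)   # KNL node: 0.5 days/ns as reported
gpu_rate = ns_per_day(1.2)   # GPU box:  1.2 days/ns as reported

print(f"KNL node: {knl_rate:.2f} ns/day")
print(f"GPU box:  {gpu_rate:.2f} ns/day")
print(f"KNL / GPU ratio: {knl_rate / gpu_rate:.1f}x")
```

So the KNL node is already running at 2 ns/day, roughly 2.4 times the rate of the two-GPU box at about 0.83 ns/day.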
On 08/04/2017 02:59 PM, Francesco Pietra wrote:
I am carrying out classical all-atom MD on a rather large protein system (3,300,000 atoms). The speed on a single NextScale node (NAMD 2.12 multicore), obtained with
mpirun -perhost 1 -n 1 namd2 +ppn 256 npt-03.conf +pemap 0-63+64+128+192 > npt-03.log
is 0.5 days/ns at ts=1.0 fs, rigidbonds water.
This compares with 1.2 days/ns, under the same conditions, for a Linux box (NAMD 2.12 multicore CUDA) with 6 cores, 12 hyperthreads, and two GTX 680 cards.
I expected much more from the cluster. Is our cluster unusually slow for some reason (MPI compiled with Intel 2017), or are GPUs still the choice for classical MD?
Thanks for any advice.