Fwd: Are KNL a good choice for classical MD?

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Aug 05 2017 - 02:42:31 CDT

Actually, from the namd log:

Info: NAMD CVS-2017-01-19 for Linux-KNL-MPI-smp

so I was wrong about the multinode/SMP question: the installed binary is an
MPI-SMP build.
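
For the record, an MPI-smp build like this is launched with a few MPI ranks
per node, each driving several worker threads. A minimal sketch of what I
understand the launch should look like on one 68-core node (the rank count,
core maps and paths below are my assumptions, not a tested command):

mpirun -n 4 /path/to/namd2 +ppn 16 +pemap 0-63 +commap 64-67 \
    npt-03.conf > npt-03.log

Here 4 ranks x 16 worker threads cover cores 0-63, and the four
communication threads are pinned to cores 64-67.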

francesco

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Sat, Aug 5, 2017 at 9:02 AM
Subject: Re: namd-l: Are KNL a good choice for classical MD?
To: NAMD <namd-l_at_ks.uiuc.edu>, "Vermaas, Joshua" <Joshua.Vermaas_at_nrel.gov>

Hi Josh:
the hardware is
Model: Lenovo Adam Pass. Racks: 50. Nodes: 3,600. Processors: 1 x 68-core
Intel Xeon Phi 7250 CPU (Knights Landing) at 1.40 GHz. Cores: 68 cores/node
(272 with HyperThreading), 244,800 cores in total. RAM: 16 GB/node of
MCDRAM and 96 GB/node of DDR.
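
When I can get onto a compute node again I will double-check this layout
directly. A minimal check, assuming the standard Linux tools are installed
on the node:

lscpu | grep -E 'Socket|Core|Thread'   # sockets, cores/socket, threads/core
numactl -H                             # NUMA layout; in flat mode the MCDRAM
                                       # shows up as separate NUMA node(s)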

During August Italy hibernates, so I cannot get much local help this month.
In particular, as I already posted elsewhere, FEP does not work on these
KNL nodes with the Intel17 compilation (I have had to abandon a FEP
project, hopefully only temporarily). The run fails with an error that does
not come from NAMD itself:

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 58639 RUNNING AT r065c01s03-hfi.marconi.cineca.it
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

A suggestion was to compile with Intel16 instead of Intel17, but that
cannot be done here, as it would introduce other major problems on this
cluster. Another particularly useful suggestion was to launch NAMD on the
node with the KNL binary provided by NAMD itself. However, the access
policy prevents me from logging in to the compute nodes directly, so the
launch ends up on the ordinary-CPU login nodes and fails with an "illegal
instruction" error.
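
The only way around that policy, I think, is to let the scheduler place the
job on a KNL node and start the NAMD-provided KNL multicore binary there. A
minimal sketch of such a batch script, assuming a Slurm-style scheduler
(the partition name, walltime and paths are placeholders, not a tested
script; adapt to PBS if that is what the cluster runs):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=68
#SBATCH --partition=knl_partition    # placeholder name
#SBATCH --time=01:00:00

# single-node run of the prebuilt Linux-KNL-multicore binary (no MPI needed)
/path/to/NAMD_Linux-KNL-multicore/namd2 +p 68 npt-03.conf > npt-03.log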

As for the SMP vs. multinode compilation, nothing can be done about it now,
for the reasons above. Probably people ran into problems and have
temporarily fallen back to single-node runs (which are too slow for MD on
my system).
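
When the nodes become usable for me again, I also plan to follow your
suggestion of scanning +ppn values. A minimal sketch of such a scan on one
node (rank counts, core maps and paths are again my assumptions; the
performance is read from NAMD's own Benchmark lines in the log):

for ppn in 4 8 16; do
  ranks=$(( 64 / ppn ))
  mpirun -n $ranks /path/to/namd2 +ppn $ppn +pemap 0-63 +commap 64-67 \
      npt-03.conf > bench_ppn${ppn}.log
  grep 'Info: Benchmark time' bench_ppn${ppn}.log | tail -1
done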

Thanks
francesco

On Fri, Aug 4, 2017 at 11:34 PM, Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
wrote:

> Hi Francesco,
>
> This isn't particularly crazy. If you look at the specs, a GTX 680 is
> rated at around 3000 GFLOPS, as is a single Phi chip (wikipedia is great
> for these comparisons). In fact, if you had a beefier linux box that could
> saturate the GPUs, in theory you should get ~2x the performance out of the
> GPUs since the CPUs are probably rate limiting in your case. That being
> said, you can probably also get better performance from the KNL nodes. KNL
> nodes have a ton of different options that can radically change
> performance, and the ones you have chosen are unfamiliar to me. On
> Stampede2 or Cori (two KNL machines I've played with), the optimum setup
> was something like this:
>
> srun -n nodecount*13 /Path/to/NAMD/namd2 +ppn 4 +pemap 0-51 +commap 53-65
> namdconfigfile > namdlogfile
>
> This is hardware dependent. Some KNL nodes have 64 real cores, others 68
> (that is what I see on Stampede/Cori), and still others 72. I'm guessing
> you have 64 real cores and 256 threads, so if you were looking at single
> node performance, you may want something like this:
>
> mpirun -n 16 .../namd +ppn 4 configfile > logfile
>
> Or
>
> mpirun -n 12 .../namd +ppn 4 +pemap 0-47 +commap 48-63
>
> You also shouldn't feel constrained by the +ppn 4 argument. Make it 8, or
> 2, or anything else, and see how the overall behavior changes (so long as
> you stick to your 64-core limit). In short, you can probably do a little
> better than 2 ns/day, although based on the size of your system you will
> probably want an SMP build that specifically knows about the KNL
> architecture rather than the multicore build.
>
> -Josh
>
> On 08/04/2017 02:59 PM, Francesco Pietra wrote:
> Hello:
> I am carrying out classical all-atom MD on a rather large protein system
> (3,300,000 atoms). The speed on a NextScale single node (NAMD 2.12
> multicore), by choosing
>
> mpirun -perhost 1 -n 1 namd2 +ppn 256 npt-03.conf +pemap 0-63+64+128+192 >
> npt-03.log
>
> is 0.5 days/ns at ts = 1.0 fs, with rigidBonds water.
>
> This compares with 1.2 days/ns, under the same conditions, for a Linux box
> (NAMD 2.12 multicore-CUDA) with 6 cores, 12 hyperthreads, and two GTX 680s.
>
> I expected much more from the cluster. Is our cluster unusually slow for
> some reason (the MPI build was compiled with Intel 2017), or are GPUs
> still the better choice for classical MD?
>
> Thanks for advice
>
> francesco pietra
>
>
>
