Re: Are KNL a good choice for classical MD?

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sun Aug 06 2017 - 03:04:31 CDT

Hi Joshua

mpirun -n 16 .../namd +ppn 4 configfile > logfile

proved to be nearly twice as fast (24 days/ns in a RAMD simulation) as

mpirun -perhost 1 -n 1 namd2 +ppn 256 npt-03.conf +pemap 0-63+64+128+192 >
npt-03.log
(40 days/ns in the same RAMD simulation)

I had gotten that -perhost command line from the cluster staff.
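
If explicit pinning helps further, a variant I may still try (untested; it
assumes the hardware threads are numbered with the 68 physical cores first,
as in your Stampede2 example) is

mpirun -n 16 .../namd +ppn 4 +pemap 0-63 +commap 64-67 configfile > logfile  # untested: assumes cores 0-67 are the physical cores

i.e. the 64 worker threads pinned to cores 0-63, with the communication
threads kept off them on cores 64-67.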

I was wrong about SMP/multicore; it is a single-node compilation, as you said.

thanks

francesco

On Sat, Aug 5, 2017 at 9:42 AM, Francesco Pietra <chiendarret_at_gmail.com>
wrote:

> Actually, from the namd log:
>
> Info: NAMD CVS-2017-01-19 for Linux-KNL-MPI-smp
>
> I was wrong about multinode/smp
>
> francesco
>
>
> ---------- Forwarded message ----------
> From: Francesco Pietra <chiendarret_at_gmail.com>
> Date: Sat, Aug 5, 2017 at 9:02 AM
> Subject: Re: namd-l: Are KNL a good choice for classical MD?
> To: NAMD <namd-l_at_ks.uiuc.edu>, "Vermaas, Joshua" <Joshua.Vermaas_at_nrel.gov>
>
>
> Hi Josh:
> the hardware is:
> Model: Lenovo Adam Pass; Racks: 50; Nodes: 3,600; Processors: 1 x 68-core
> Intel Xeon Phi 7250 CPU (Knights Landing) at 1.40 GHz; Cores: 68 cores/node
> (272 with HyperThreading), 244,800 cores in total; RAM: 16 GB/node of
> MCDRAM and 96 GB/node of DDR
>
> During August Italy is hibernating, so I cannot get much help locally
> this month. In particular, as I already posted elsewhere, the KNL node
> mentioned above does not work for FEP with the Intel 17 compilation (I
> have had to abandon a FEP project, hopefully only temporarily). It gives
> a non-NAMD error:
>
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 58639 RUNNING AT r065c01s03-hfi.marconi.cineca.it
>> = EXIT CODE: 11
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 58639 RUNNING AT r065c01s03-hfi.marconi.cineca.it
>> = EXIT CODE: 11
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
>
> One suggestion was to compile with Intel 16 instead of 17, but that
> cannot be done, as it would introduce other major problems on this
> cluster. Another particularly useful suggestion was to launch NAMD on the
> node with the KNL binary provided by NAMD itself. However, the access
> policy prevents me from reaching the compute nodes, so the launch ends up
> on the ordinary-CPU login nodes and fails with an "illegal" error.
>
> As for the SMP vs. multinode compilation, nothing can be done now, for
> the reasons given above. Probably people ran into problems and fell back
> temporarily to a single-node setup (which is too slow for MD with my
> system).
>
> Thanks
> francesco
>
>
> On Fri, Aug 4, 2017 at 11:34 PM, Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> wrote:
>
>> Hi Francesco,
>>
>> This isn't particularly crazy. If you look at the specs, a GTX 680 is
>> rated at around 3000 GFLOPS, as is a single Phi chip (Wikipedia is great
>> for these comparisons). In fact, if you had a beefier Linux box that could
>> saturate the GPUs, in theory you should get ~2x the performance out of the
>> GPUs, since the CPUs are probably rate-limiting in your case. That being
>> said, you can probably also get better performance from the KNL nodes. KNL
>> nodes have a ton of different options that can radically change
>> performance, and the ones you have chosen are unfamiliar to me. On
>> Stampede2 or Cori (two KNL machines I've played with), the optimum setup
>> was something like this:
>>
>> srun -n nodecount*13 /Path/to/NAMD/namd2 +ppn 4 +pemap 0-51 +commap 53-65
>> namdconfigfile > namdlogfile
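>>
>> For two nodes, for example, that works out to (paths as above):
>>
>> srun -n 26 /Path/to/NAMD/namd2 +ppn 4 +pemap 0-51 +commap 53-65 namdconfigfile > namdlogfile  # nodecount=2 -> 26 ranks
>>
>> i.e. 13 ranks per node, each with 4 worker threads on cores 0-51 and one
>> communication thread on one of cores 53-65.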
>>
>> This is hardware-dependent. Some KNL nodes have 64 real cores, others 68
>> (that is what I see on Stampede2/Cori), and still others 72. I'm guessing
>> you have 64 real cores and 256 hardware threads, so if you were looking
>> at single-node performance, you may want something like this:
>>
>> mpirun -n 16 .../namd +ppn 4 configfile > logfile
>>
>> Or
>>
>> mpirun -n 12 .../namd +ppn 4 +pemap 0-47 +commap 48-63
>>
>> You also shouldn't feel constrained by the +ppn 4 argument. Make it 8! Or
>> 2! Or anything else, and see how the overall behavior changes (so long as
>> you stay within your 64-core limit). In short, you can probably do a
>> little better than 2 ns/day, although given the size of your system you
>> will probably want an SMP build that specifically knows about the KNL
>> architecture rather than the multicore build.
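>>
>> As a concrete (untested) sketch of the +ppn variation, staying within the
>> same 64 cores:
>>
>> mpirun -n 8 .../namd +ppn 8 configfile > logfile  # untested: 8 ranks x 8 worker threads = 64 cores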
>>
>> -Josh
>>
>> On 08/04/2017 02:59 PM, Francesco Pietra wrote:
>> Hello:
>> I am carrying out classical all-atom MD on a rather large protein system
>> (3,300,000 atoms). The speed on a NextScale single node (NAMD 2.12
>> multicore), by choosing
>>
>> mpirun -perhost 1 -n 1 namd2 +ppn 256 npt-03.conf +pemap 0-63+64+128+192
>> > npt-03.log
>>
>> is 0.5 days/ns at ts = 1.0 fs, with rigidBonds water.
>>
>> This compares with 1.2 days/ns, under the same conditions, on a Linux
>> box (NAMD 2.12 multicore CUDA) with 6 cores, 12 hyperthreads, and two
>> GTX 680s.
>>
>> I expected much more from the cluster. Is our cluster unusually slow for
>> some reason (MPI compiled with Intel 2017), or are GPUs still the better
>> choice for classical MD?
>>
>> Thanks for advice
>>
>> francesco pietra
>>
>>
>>
>
>
