Re: Problems running FEP on Lenovo NextScale KNL

From: Brian Radak (bradak_at_anl.gov)
Date: Wed Jul 26 2017 - 10:04:22 CDT

I assume you did not compile your own NAMD on KNL? We've been having
trouble with version 17 of the Intel compiler suite and have been falling
back to version 16.

Brian

On 07/26/2017 09:23 AM, Francesco Pietra wrote:
> Hello:
> I am asking for advice on running a FEP protein-ligand (Bound)
> simulation. It runs correctly on my Linux-Intel box with NAMD 2.12
> multicore, but it halts with NAMD 2.12 KNL on a CINECA-codesigned
> Lenovo NextScale cluster with Intel® Xeon Phi™ product family
> “Knights Landing” processors alongside Intel® Xeon® processor E5-2600 v4
> product family.
>
> I tried on a single node, selecting 64 CPUs and either 256 or 126 MPI
> processes. In both cases the .err file is empty, while the NAMD log
> shows the following error after updating the NAMD interface and
> re-initializing colvars:
>
> =======================================================
> colvars: The final output state file will be "frwd-01_0.colvars.state".
>
>
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 58639 RUNNING AT r065c01s03-hfi.marconi.cineca.it
> = EXIT CODE: 11
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 58639 RUNNING AT r065c01s03-hfi.marconi.cineca.it
> = EXIT CODE: 11
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ====================================================================
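The `EXIT CODE: 11` in this launcher banner most likely reports signal 11 (`SIGSEGV`), i.e. a segmentation fault in one of the `namd2` processes, rather than an error that NAMD itself detected and logged. Assuming bash is available, its `kill -l` builtin translates either a signal number or the 128+N exit status of a process killed by that signal into the signal name:

```bash
#!/usr/bin/env bash
# bash's kill -l builtin maps a signal number, or the 128+N exit status
# of a process killed by that signal, to the signal's name.
kill -l 11    # prints SEGV
kill -l 139   # 128 + 11, also prints SEGV
```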
> For comparison, on my desktop it continues normally at that stage:
> colvars: The final output state file will be "frwd-01_0.colvars.state".
> FEP: RESETTING FOR NEW FEP WINDOW LAMBDA SET TO 0 LAMBDA2 0.02
> FEP: WINDOW TO HAVE 100000 STEPS OF EQUILIBRATION PRIOR TO FEP DATA COLLECTION.
> FEP: USING CONSTANT TEMPERATURE OF 300 K FOR FEP CALCULATION
> PRESSURE: 0 70.2699 -221.652 -54.6848 -221.652 -146.982 179.527 -54.6848 179.527 216.259
> GPRESSURE: 0 92.593 -114.553 110.669 -161.111 -69.3013 92.2703 26.1698 176.706 99.3091
> ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG
> FEPTITLE: TS BOND2 ELECT2 VDW2
>
> ENERGY: 0 4963.7649 7814.6132 8443.0271 479.5443 -251991.4214
> ################
> The batch job was configured as follows for 126:
>
> #!/bin/bash
> #PBS -l select=1:ncpus=64:mpiprocs=126:mem=86GB:mcdram=cache:numa=quadrant
> #PBS -l walltime=00:10:00
> #PBS -o frwd-01.out
> #PBS -e frwd-01.err
> #PBS -A my account
>
> # go to submission directory
> cd $PBS_O_WORKDIR
>
> # load namd
> module load profile/knl
> module load autoload namd/2.12_knl
> module help namd/2.12_knl
>
> #launch NAMD over 4*64=256 cores
>
> mpirun -perhost 1 -n 1 namd2 +ppn 126 frwd-01.namd +pemap 4-66+68 +commap 67 > frwd-01.namd.log
>
> ########################
> or for 256:
> #PBS -l select=1:ncpus=64:mpiprocs=256:mem=86GB:mcdram=cache:numa=quadrant
>
> mpirun -perhost 1 -n 1 namd2 +ppn 256 frwd-01.namd +pemap 0-63+64+128+192 > frwd-01.namd.log
>
> ###############
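As a quick consistency check on the two launch lines, here is a minimal sketch that expands a `+pemap` string and counts the cores it names, so the count can be compared with `+ppn`. It assumes the Charm++ `+pemap` reading in which `L-U` is an inclusive core range and each trailing `+O` repeats that range shifted by `O` (so `0-63+64+128+192` would cover 0 to 255); stride forms such as `:S.R` are not handled. `expand_pemap` and `check_pemap` are hypothetical helper names, not part of NAMD or Charm++.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of NAMD or Charm++).
# ASSUMPTION: Charm++ +pemap syntax where "L-U" is an inclusive core range
# and each trailing "+O" repeats that range shifted by O; comma-separated
# entries are concatenated. Stride forms (":S.R") are NOT handled.

# Print one core number per line for a pemap string such as "4-66+68".
expand_pemap() {
  local spec=$1 entry base lo hi off core
  local -a entries parts
  IFS=',' read -ra entries <<< "$spec"
  for entry in "${entries[@]}"; do
    IFS='+' read -ra parts <<< "$entry"
    base=${parts[0]}
    lo=${base%%-*}     # "4-66" -> 4 ; "5" -> 5
    hi=${base##*-}     # "4-66" -> 66; "5" -> 5
    # offset 0 is the base range itself, then one shifted copy per +O
    for off in 0 "${parts[@]:1}"; do
      for ((core = lo + off; core <= hi + off; core++)); do
        echo "$core"
      done
    done
  done
}

# Compare the number of cores named by a pemap with the +ppn value.
check_pemap() {
  local ppn=$1 spec=$2 n
  n=$(expand_pemap "$spec" | wc -l)
  n=$((n))   # strip any padding wc may add
  if (( n == ppn )); then
    echo "ok: +pemap $spec gives $n PEs for +ppn $ppn"
  else
    echo "mismatch: +pemap $spec gives $n PEs but +ppn is $ppn"
  fi
}

check_pemap 126 4-66+68            # the 126-PE launch line
check_pemap 256 0-63+64+128+192    # the 256-PE launch line
```

Under that reading both launch lines are self-consistent (126 and 256 PEs), and `+commap 67` is not among the 126 worker cores. Whether those logical CPU numbers fall on distinct hardware threads depends on how the node enumerates its CPUs, which `lscpu` can show.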
>
> Assuming that KNL is no obstacle to FEP, I hope to get a hint to pass
> on to the cluster operators.
>
> Thanks
>
> francesco pietra

-- 
Brian Radak
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240
Argonne, IL 60439-4854
(630) 252-8643
brian.radak_at_anl.gov

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:27 CST