| version 1.119 | version 1.120 |
|---|---|
| |
| The newer verbs network layer should offer equivalent performance to | The newer verbs network layer should offer equivalent performance to |
| the ibverbs layer, plus support for multi-copy algorithms (replicas). | the ibverbs layer, plus support for multi-copy algorithms (replicas). |
| | |
| | Intel Omni-Path networks are incompatible with the pre-built ibverbs |
| | NAMD binaries. Charm++ for verbs can be built with --with-qlogic |
| | to support Omni-Path, but the Charm++ MPI network layer performs |
| | better than the verbs layer. Hangs have been observed with Intel MPI |
| | but not with OpenMPI, so OpenMPI is preferred. See "Compiling NAMD" |
| | below for MPI build instructions. NAMD MPI binaries may be launched |
| | directly with mpiexec rather than via the provided charmrun script. |
| | |
| Writing batch job scripts to run charmrun in a queueing system can be | Writing batch job scripts to run charmrun in a queueing system can be |
| challenging. Since most clusters provide directions for using mpiexec | challenging. Since most clusters provide directions for using mpiexec |
| to launch MPI jobs, charmrun provides a ++mpiexec option to use mpiexec | to launch MPI jobs, charmrun provides a ++mpiexec option to use mpiexec |
| |
| | |
| All Cray XE/XK/XC network layers support multi-copy algorithms (replicas). | All Cray XE/XK/XC network layers support multi-copy algorithms (replicas). |
| | |
| | -- Xeon Phi Processors (KNL) -- |
| | |
| | Special Linux-KNL-icc and CRAY-XC-KNL-intel builds enable vectorizable |
| | mixed-precision kernels while preserving full alchemical and other |
| | functionality. Multi-host runs require multiple smp processes per host |
| | (as many as 13 for Intel Omni-Path, 6 for Cray Aries) in order to drive |
| | the network. Careful attention to CPU affinity settings (see below) is |
| | required, as is 1 or 2 (but not 3 or 4) hyperthreads per PE core (but |
| | only 1 per communication thread core). |
| | |
| | There appears to be a bug in the Intel 17.0 compiler that breaks the |
| | non-KNL-optimized NAMD kernels (used for alchemical free energy, etc.) |
| | on KNL. Therefore the Intel 16.0 compilers are recommended on KNL. |
| | |
| -- SGI Altix UV -- | -- SGI Altix UV -- |
| | |
| Use Linux-x86_64-multicore and the following script to set CPU affinity: | Use Linux-x86_64-multicore and the following script to set CPU affinity: |
| |
| cores 0,1,4,5,8,9,... or 0-127:4.2. Running 4 processes with +ppn 31 | cores 0,1,4,5,8,9,... or 0-127:4.2. Running 4 processes with +ppn 31 |
| would be "+setcpuaffinity +pemap 0-127:32.31 +commap 31-127:32" | would be "+setcpuaffinity +pemap 0-127:32.31 +commap 31-127:32" |
| | |
| | For Intel processors, including KNL, where hyperthreads on the same core |
| | are not numbered consecutively, hyperthreads may be mapped to consecutive |
| | PEs by appending [+span] to a core set, e.g., "+pemap 0-63+64+128+192" |
| | to use all threads on a 64-core, 256-thread KNL with threads mapped to |
| | PEs as 0,64,128,192,1,65,129,193,... |
| | |
| For an Altix UV or other machines where the queueing system assigns cores | For an Altix UV or other machines where the queueing system assigns cores |
| to jobs this information must be obtained with numactl --show and passed | to jobs this information must be obtained with numactl --show and passed |
| to NAMD in order to set thread affinity (which will improve performance): | to NAMD in order to set thread affinity (which will improve performance): |