networks directly through the OpenFabrics OFED ibverbs library. This
avoids efficiency and portability issues associated with MPI. Look for
pre-built ibverbs NAMD binaries or specify ibverbs when building Charm++.

The newer verbs network layer should offer equivalent performance to
the ibverbs layer, plus support for multi-copy algorithms (replicas).
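
For example, hedged sketches of the corresponding Charm++ build commands
(the target names are assumptions; check the usage output of ./build in
your Charm++ source tree before relying on them):

\begin{verbatim}
# ibverbs network layer, as used by the pre-built ibverbs NAMD binaries
./build charm++ net-linux-x86_64 ibverbs smp --with-production

# newer verbs network layer, which adds multi-copy (replica) support
./build charm++ verbs-linux-x86_64 smp --with-production
\end{verbatim}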

Intel Omni-Path networks are incompatible with the pre-built ibverbs
NAMD binaries. Charm++ for verbs can be built with --with-qlogic
to support Omni-Path, but the Charm++ MPI network layer performs
better than the verbs layer. Hangs have been observed with Intel MPI
but not with OpenMPI, so OpenMPI is preferred. See ``Compiling NAMD''
below for MPI build instructions. NAMD MPI binaries may be launched
directly with mpiexec rather than via the provided charmrun script.
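
For instance, a minimal sketch of an MPI build and direct launch
(assuming OpenMPI compilers on the path; the process count and file
names are placeholders):

\begin{verbatim}
# build Charm++ on the MPI network layer, then build NAMD against it
./build charm++ mpi-linux-x86_64 --with-production

# launch the MPI NAMD binary directly, without charmrun
mpiexec -n 16 namd2 mysim.namd > mysim.log
\end{verbatim}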

Writing batch job scripts to run charmrun in a queueing system can be
challenging. Since most clusters provide directions for using mpiexec

single compute node is much worse than on a comparable non-Cray host then
it is very likely that your CPU affinity settings need to be fixed.

All Cray XE/XK/XC network layers support multi-copy algorithms (replicas).
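
As an illustration, a multi-copy run is typically launched along these
lines (+replicas and +stdout are standard NAMD multi-copy options; the
counts and paths here are placeholders):

\begin{verbatim}
# 8 replicas over 16 PEs; the PE count must be a multiple of
# the replica count, and each replica writes its own log file
charmrun namd2 +p16 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
\end{verbatim}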

\subsection{Xeon Phi Processors (KNL)}

Special Linux-KNL-icc and CRAY-XC-KNL-intel builds enable vectorizable
mixed-precision kernels while preserving full alchemical and other
functionality. Multi-host runs require multiple smp processes per host
(as many as 13 for Intel Omni-Path, 6 for Cray Aries) in order to drive
the network. Careful attention to CPU affinity settings (see below) is
required, as is 1 or 2 (but not 3 or 4) hyperthreads per PE core (but
only 1 per communication thread core).
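
A hedged sketch for two hypothetical 64-core, 256-thread KNL hosts with
an MPI-smp build, 4 processes per host, 2 hyperthreads per PE core, and
1 communication thread per process (all counts are illustrative, and the
+span notation is described under CPU Affinity below):

\begin{verbatim}
# 8 processes = 2 hosts x 4 processes, each driving 30 PEs;
# +pemap 0-59+64 puts 2 hyperthreads on each of PE cores 0-59,
# +commap 60-63 reserves one core per communication thread
mpiexec -n 8 namd2 +ppn 30 +setcpuaffinity \
    +pemap 0-59+64 +commap 60-63 mysim.namd
\end{verbatim}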

There appears to be a bug in the Intel 17.0 compiler that breaks the
non-KNL-optimized NAMD kernels (used for alchemical free energy, etc.)
on KNL. Therefore the Intel 16.0 compilers are recommended on KNL.

\subsection{SGI Altix UV}

Use Linux-x86\_64-multicore and the following script to set CPU affinity:

cores 0,1,4,5,8,9,... or 0-127:4.2. Running 4 processes with +ppn 31
would be ``+setcpuaffinity +pemap 0-127:32.31 +commap 31-127:32''.
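
To make the stride and run-length notation concrete, here is how those
two maps expand (derived directly from the syntax above):

\begin{verbatim}
+pemap 0-127:32.31   # first 31 cores of each block of 32:
                     # 0-30, 32-62, 64-94, 96-126  (4 x 31 PEs)
+commap 31-127:32    # every 32nd core starting at 31:
                     # 31, 63, 95, 127  (one comm thread per process)
\end{verbatim}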

For Intel processors, including KNL, where hyperthreads on the same core
are not numbered consecutively, hyperthreads may be mapped to consecutive
PEs by appending [+span] to a core set, e.g., ``+pemap 0-63+64+128+192''
to use all threads on a 64-core, 256-thread KNL with threads mapped to
PEs as 0,64,128,192,1,65,129,193,...

For an Altix UV or other machines where the queueing system assigns cores
to jobs this information must be obtained with numactl --show and passed
to NAMD in order to set thread affinity (which will improve performance):
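
A hedged sketch of such a wrapper (the awk extraction of the
physcpubind line is illustrative; verify it against the numactl
output on your machine):

\begin{verbatim}
# build +p and +pemap from the cores the queue has bound this job to
namd2 +setcpuaffinity `numactl --show | awk '/^physcpubind/ {
    printf "+p%d +pemap %d", (NF-1), $2;
    for (i = 3; i <= NF; ++i) printf ",%d", $i }'` mysim.namd
\end{verbatim}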

The Xeon Phi coprocessor is supported in NAMD similarly to CUDA GPUs.
Binaries are not provided, so you will need to build from source code
(see ``Compiling NAMD'' below) specifying --with-mic to the config script.
As with CUDA, multicore or ibverbs-smp builds are strongly recommended.
A recent Intel compiler is obviously required to compile for Xeon Phi.
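
For instance, the configuration step might look like this sketch (the
arch and --charm-arch values are placeholders; substitute the ones
matching your compiler and Charm++ build):

\begin{verbatim}
# configure a multicore NAMD build with Xeon Phi offload support
./config Linux-x86_64-icc --with-mic \
    --charm-arch multicore-linux64-iccstatic
\end{verbatim}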

order to exclude startup costs and allow for initial load balancing.

Multicore builds scale well within a single node, but may benefit from
setting CPU affinity using the +setcpuaffinity +pemap $<$map$>$ +commap $<$map$>$
options described in CPU Affinity above. Experimentation is needed.
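
As a concrete illustration for a hypothetical 16-core node (core
numbers are placeholders; adjust them to your hardware):

\begin{verbatim}
# pin 16 PEs to cores 0-15 of a multicore build
namd2 +p16 +setcpuaffinity +pemap 0-15 mysim.namd
\end{verbatim}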

We provide standard (UDP), TCP, and ibverbs (InfiniBand) precompiled