networks directly through the OpenFabrics OFED ibverbs library. This
avoids efficiency and portability issues associated with MPI. Look for
pre-built ibverbs NAMD binaries or specify ibverbs when building Charm++.

The newer verbs network layer should offer equivalent performance to
the ibverbs layer, plus support for multi-copy algorithms (replicas).
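
For example, hedged sketches of the corresponding Charm++ build commands
(the target names are assumptions; check the usage output of ./build in
your Charm++ source tree before relying on them):

\begin{verbatim}
# ibverbs network layer, as used by the pre-built ibverbs NAMD binaries
./build charm++ net-linux-x86_64 ibverbs smp --with-production

# newer verbs network layer, which adds multi-copy (replica) support
./build charm++ verbs-linux-x86_64 smp --with-production
\end{verbatim}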

Intel Omni-Path networks are incompatible with the pre-built ibverbs
NAMD binaries. Charm++ for verbs can be built with --with-qlogic
to support Omni-Path, but the Charm++ MPI network layer performs
better than the verbs layer. Hangs have been observed with Intel MPI
but not with OpenMPI, so OpenMPI is preferred. See ``Compiling NAMD''
below for MPI build instructions. NAMD MPI binaries may be launched
directly with mpiexec rather than via the provided charmrun script.
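
For instance, a minimal sketch of an MPI build and direct launch
(assuming OpenMPI compilers on the path; the process count and file
names are placeholders):

\begin{verbatim}
# build Charm++ on the MPI network layer, then build NAMD against it
./build charm++ mpi-linux-x86_64 --with-production

# launch the MPI NAMD binary directly, without charmrun
mpiexec -n 16 namd2 mysim.namd > mysim.log
\end{verbatim}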

Writing batch job scripts to run charmrun in a queueing system can be
challenging. Since most clusters provide directions for using mpiexec

single compute node is much worse than on a comparable non-Cray host then
it is very likely that your CPU affinity settings need to be fixed.

All Cray XE/XK/XC network layers support multi-copy algorithms (replicas).
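
As an illustration, a multi-copy run is typically launched along these
lines (+replicas and +stdout are standard NAMD multi-copy options; the
counts and paths here are placeholders):

\begin{verbatim}
# 8 replicas over 16 PEs; the PE count must be a multiple of
# the replica count, and each replica writes its own log file
charmrun namd2 +p16 +replicas 8 job0.conf +stdout output/%d/job0.%d.log
\end{verbatim}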

\subsection{Xeon Phi Processors (KNL)}

Special Linux-KNL-icc and CRAY-XC-KNL-intel builds enable vectorizable
mixed-precision kernels while preserving full alchemical and other
functionality. Multi-host runs require multiple smp processes per host
(as many as 13 for Intel Omni-Path, 6 for Cray Aries) in order to drive
the network. Careful attention to CPU affinity settings (see below) is
required, as is 1 or 2 (but not 3 or 4) hyperthreads per PE core (but
only 1 per communication thread core).
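
A hedged sketch for two hypothetical 64-core, 256-thread KNL hosts with
an MPI-smp build, 4 processes per host, 2 hyperthreads per PE core, and
1 communication thread per process (all counts are illustrative, and the
+span notation is described under CPU Affinity below):

\begin{verbatim}
# 8 processes = 2 hosts x 4 processes, each driving 30 PEs;
# +pemap 0-59+64 puts 2 hyperthreads on each of PE cores 0-59,
# +commap 60-63 reserves one core per communication thread
mpiexec -n 8 namd2 +ppn 30 +setcpuaffinity \
    +pemap 0-59+64 +commap 60-63 mysim.namd
\end{verbatim}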

There appears to be a bug in the Intel 17.0 compiler that breaks the
non-KNL-optimized NAMD kernels (used for alchemical free energy, etc.)
on KNL. Therefore the Intel 16.0 compilers are recommended on KNL.

\subsection{SGI Altix UV}

Use Linux-x86\_64-multicore and the following script to set CPU affinity:

cores 0,1,4,5,8,9,... or 0-127:4.2. Running 4 processes with +ppn 31
would be ``+setcpuaffinity +pemap 0-127:32.31 +commap 31-127:32''.
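
To make the stride and run-length notation concrete, here is how those
two maps expand (derived directly from the syntax above):

\begin{verbatim}
+pemap 0-127:32.31   # first 31 cores of each block of 32:
                     # 0-30, 32-62, 64-94, 96-126  (4 x 31 PEs)
+commap 31-127:32    # every 32nd core starting at 31:
                     # 31, 63, 95, 127  (one comm thread per process)
\end{verbatim}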

For Intel processors, including KNL, where hyperthreads on the same core
are not numbered consecutively, hyperthreads may be mapped to consecutive
PEs by appending [+span] to a core set, e.g., ``+pemap 0-63+64+128+192''
to use all threads on a 64-core, 256-thread KNL with threads mapped to
PEs as 0,64,128,192,1,65,129,193,...

For an Altix UV or other machines where the queueing system assigns cores
to jobs this information must be obtained with numactl --show and passed
to NAMD in order to set thread affinity (which will improve performance):
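
A hedged sketch of such a wrapper (the awk extraction of the
physcpubind line is illustrative; verify it against the numactl
output on your machine):

\begin{verbatim}
# build +p and +pemap from the cores the queue has bound this job to
namd2 +setcpuaffinity `numactl --show | awk '/^physcpubind/ {
    printf "+p%d +pemap %d", (NF-1), $2;
    for (i = 3; i <= NF; ++i) printf ",%d", $i }'` mysim.namd
\end{verbatim}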

The Xeon Phi coprocessor is supported in NAMD similarly to CUDA GPUs.
Binaries are not provided, so you will need to build from source code
(see ``Compiling NAMD'' below) specifying --with-mic to the config script.
As with CUDA, multicore or ibverbs-smp builds are strongly recommended.
A recent Intel compiler is obviously required to compile for Xeon Phi.
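
For instance, the configuration step might look like this sketch (the
arch and --charm-arch values are placeholders; substitute the ones
matching your compiler and Charm++ build):

\begin{verbatim}
# configure a multicore NAMD build with Xeon Phi offload support
./config Linux-x86_64-icc --with-mic \
    --charm-arch multicore-linux64-iccstatic
\end{verbatim}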

order to exclude startup costs and allow for initial load balancing.

Multicore builds scale well within a single node, but may benefit from
setting CPU affinity using the +setcpuaffinity +pemap $<$map$>$ +commap $<$map$>$
options described in CPU Affinity above. Experimentation is needed.
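
As a concrete illustration for a hypothetical 16-core node (core
numbers are placeholders; adjust them to your hardware):

\begin{verbatim}
# pin 16 PEs to cores 0-15 of a multicore build
namd2 +p16 +setcpuaffinity +pemap 0-15 mysim.namd
\end{verbatim}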

We provide standard (UDP), TCP, and ibverbs (InfiniBand) precompiled