
Cray XE/XK/XC

First, load the modules for the GNU compilers (XE/XK only; XC should use the Intel compilers), topology information, huge page sizes, and the system FFTW 3 library:

  module swap PrgEnv-cray PrgEnv-gnu
  module load rca
  module load craype-hugepages8M
  module load fftw

The CUDA Toolkit module enables dynamic linking, so it should only be loaded when building CUDA binaries and never for non-CUDA binaries:

  module load cudatoolkit

For CUDA or large simulations on XE/XK, use the Charm++ architecture gemini_gni-crayxe-persistent-smp; for smaller XE simulations, use gemini_gni-crayxe-persistent. For XC, similarly use gni-crayxc-persistent-smp or gni-crayxc-persistent.

When configuring NAMD for XE/XK, use the CRAY-XE-gnu architecture, the --with-cuda config option (CUDA builds only), the appropriate --charm-arch parameter, and --with-fftw3. For XC, use CRAY-XC-intel instead; all other options are the same.
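The selection rules above can be sketched as a small shell helper that prints the resulting config command. This is only an illustration: the function name namd_config_cmd and its three arguments are hypothetical, the options are written with the double-dash spelling that NAMD's config script uses, and the sketch assumes that a CUDA build always takes the -smp Charm++ architecture.

```shell
# Hypothetical helper (not part of NAMD): print the config command
# implied by the rules above.
#   $1 = system: xe, xk or xc
#   $2 = mode:   smp or nosmp
#   $3 = cuda:   cuda or nocuda
namd_config_cmd() {
    system=$1
    mode=$2
    cuda=$3
    case "$system" in
        xe|xk) arch=CRAY-XE-gnu;   charm=gemini_gni-crayxe-persistent ;;
        xc)    arch=CRAY-XC-intel; charm=gni-crayxc-persistent ;;
        *)     echo "unknown system: $system" >&2; return 1 ;;
    esac
    opts=""
    if [ "$cuda" = cuda ]; then
        mode=smp                 # assumption: CUDA builds use the SMP layer
        opts=" --with-cuda"
    fi
    if [ "$mode" = smp ]; then
        charm="$charm-smp"
    fi
    echo "./config $arch$opts --charm-arch $charm --with-fftw3"
}

namd_config_cmd xk smp cuda
namd_config_cmd xc nosmp nocuda
```

The helper only prints the command so that it can be inspected before being run from the NAMD source directory.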

Your batch job will need to load modules and set environment variables:

  module swap PrgEnv-cray PrgEnv-gnu
  module load rca
  module load craype-hugepages8M
  setenv HUGETLB_DEFAULT_PAGE_SIZE 8M
  setenv HUGETLB_MORECORE no
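The setenv lines above are csh/tcsh syntax. If your batch script runs under a Bourne-style shell (sh or bash) instead, the same two variables would be set as follows; the module commands are unchanged:

```shell
# Bourne-shell equivalent of the two setenv lines above.
export HUGETLB_DEFAULT_PAGE_SIZE=8M
export HUGETLB_MORECORE=no
```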

To run an SMP build with one process per node on 16 nodes with 32 cores each:

  aprun -n 16 -r 1 -N 1 -d 31 /path/to/namd2 +ppn 30 +pemap 1-30 +commap 0 <configfile>

or the same with 4 processes per node:

  aprun -n 64 -N 4 -d 8 /path/to/namd2 +ppn 7 \
            +pemap 1-7,9-15,17-23,25-31 +commap 0,8,16,24 <configfile>

or non-SMP, leaving one core free for the operating system:

  aprun -n 496 -r 1 -N 31 -d 1 /path/to/namd2 +pemap 0-30 <configfile>
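The process count in the non-SMP command follows from simple arithmetic: 16 nodes times 31 usable cores per node gives 496 processes. A minimal sketch of that calculation is shown here; the function name nonsmp_aprun_cmd is hypothetical, and /path/to/namd2 and <configfile> are the same placeholders used above.

```shell
# Hypothetical helper (not part of NAMD): print the non-SMP aprun command.
#   $1 = number of nodes
#   $2 = cores per node
#   $3 = cores per node left free for the operating system
nonsmp_aprun_cmd() {
    nodes=$1
    cores=$2
    reserved=$3
    per_node=$((cores - reserved))   # one single-threaded process per usable core
    total=$((nodes * per_node))      # value passed to aprun -n
    last=$((per_node - 1))           # highest core index used by NAMD
    echo "aprun -n $total -r $reserved -N $per_node -d 1 /path/to/namd2 +pemap 0-$last <configfile>"
}

nonsmp_aprun_cmd 16 32 1
```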

The explicit +pemap and +commap settings are necessary to avoid having multiple threads assigned to the same core (or potentially all threads assigned to the same core). If NAMD performs much worse on a single compute node than on a comparable non-Cray host, it is very likely that your CPU affinity settings need to be fixed.
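The +pemap and +commap values in the SMP examples follow a regular pattern: the cores on a node are split into equal blocks, one per process; the first core of each block hosts the communication thread and the remaining cores host the worker threads. The sketch below reproduces that pattern. The function name namd_affinity is hypothetical, and the layout is an assumption drawn from the two SMP examples above rather than a rule stated by NAMD.

```shell
# Hypothetical helper (not part of NAMD): print +ppn, +pemap and +commap
# for one node.
#   $1 = cores available to the job on each node
#   $2 = processes per node
namd_affinity() {
    cores=$1
    procs=$2
    block=$((cores / procs))         # cores owned by each process
    if [ "$block" -lt 2 ]; then
        echo "need at least 2 cores per process" >&2
        return 1
    fi
    ppn=$((block - 1))               # worker threads; one core is kept for communication
    pemap=""
    commap=""
    p=0
    while [ "$p" -lt "$procs" ]; do
        base=$((p * block))          # first core of the block: communication thread
        first=$((base + 1))          # remaining cores of the block: worker threads
        last=$((base + ppn))
        if [ "$first" -eq "$last" ]; then
            range=$first
        else
            range="$first-$last"
        fi
        pemap="${pemap:+$pemap,}$range"
        commap="${commap:+$commap,}$base"
        p=$((p + 1))
    done
    echo "+ppn $ppn +pemap $pemap +commap $commap"
}

namd_affinity 31 1    # one process per node, one core left unused
namd_affinity 32 4    # four processes per node
```

Called with 31 cores and 1 process, the helper yields the settings of the first SMP example; called with 32 cores and 4 processes, it yields those of the second.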

All Cray XE/XK/XC network layers support multi-copy algorithms (replicas).


http://www.ks.uiuc.edu/Research/namd/