NAMD Wiki: NamdAtOakRidge


Running on Summit

A job-submission script is provided at ~jimp/NAMD_scripts/runbatch_summit, with variants runbatch_summit_memopt, runbatch_summit_cpu, and runbatch_summit_cpu_memopt for memory-optimized and CPU-only builds. (Note that the standard runbatch_summit script uses GPUs, as CPU-only runs are considered the exception.)

The script takes the following arguments (the first three are required):

  • NAMD input file
  • NAMD log file
  • number of nodes (each node has 6 GPUs, so this is the number of GPUs divided by 6)
  • queue (defaults to "batch", other option is "test")
  • replica or other args (optional)

The allocation charged for the job defaults to the last entry in the output of the "groups" command; override it by setting the ACCOUNT environment variable.
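For example, a hypothetical 32-node GPU run charged to project PROJ123 (the file names and project ID here are placeholders) might be submitted as:

export ACCOUNT=PROJ123
~jimp/NAMD_scripts/runbatch_summit stmv.namd stmv.log 32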

Normal runs use the PAMI-based binaries. If a replica count is specified (+replicas <n>), the MPI-based binaries are used instead, and the number of nodes may be reduced so that there are 1, 2, or 6 replicas per node (corresponding to 6, 3, or 1 GPUs per replica); multi-node replicas are not supported.
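As a sketch, a hypothetical 8-replica job packed 2 replicas per node (3 GPUs per replica, 4 nodes total) might look like this, with the queue given explicitly since it precedes the replica arguments:

~jimp/NAMD_scripts/runbatch_summit remd.namd remd.log 4 batch +replicas 8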

The script uses the "latest" binaries installed in /gpfs/alpine/world-shared/bip115/NAMD_binaries/summit/.

Building on Summit

Download Tcl library

wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-ppc64le-threaded.tar.gz
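
The tarball only needs to be unpacked somewhere --tcl-prefix can point at; for example (assuming it extracts to a directory matching the tarball name):

tar xzf tcl8.5.9-linux-ppc64le-threaded.tar.gz -C $HOME

and then pass --tcl-prefix $HOME/tcl8.5.9-linux-ppc64le-threaded to ./config instead of the ~jimp path shown below.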

PAMI-based CUDA version for multi-node runs

Build Charm++:

module load spectrum-mpi; module load fftw; module list;
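
These commands assume the Charm++ 6.9.1 source tree is already unpacked at $HOME/charm-6.9.1 (the --charm-base path used by ./config below) and that the build is run from inside that directory:

cd $HOME/charm-6.9.1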

./build charm++ pami-linux-ppc64le smp --no-build-shared --with-production

Build NAMD:

module load spectrum-mpi; module load cuda; module load fftw; module list;

./config Linux-POWER-xlC.pami Summit --with-fftw3 --fftw-prefix $OLCF_FFTW_ROOT/lib --with-cuda --cuda-prefix $OLCF_CUDA_ROOT --cuda-gencode arch=compute_70,code=sm_70 --tcl-prefix ~jimp/tcl8.5.9-linux-ppc64le-threaded --charm-base $HOME/charm-6.9.1 --charm-arch pami-linux-ppc64le-smp

cd Linux-POWER-xlC.pami

make release

MPI-based CUDA version for multi-copy runs

Build Charm++:

module load spectrum-mpi; module load fftw; module list;

./build charm++ mpi-linux-ppc64le smp --no-build-shared --with-production 

Build NAMD:

module load spectrum-mpi; module load cuda; module load fftw; module list;

sed -i 's/charm_arch_mpi=1/charm_arch_mpi=0/' config

./config Linux-POWER-xlC.mpi Summit --with-fftw3 --fftw-prefix $OLCF_FFTW_ROOT/lib --with-cuda --cuda-prefix $OLCF_CUDA_ROOT --cuda-gencode arch=compute_70,code=sm_70 --tcl-prefix ~jimp/tcl8.5.9-linux-ppc64le-threaded --charm-base $HOME/charm-6.9.1 --charm-arch mpi-linux-ppc64le-smp

cd Linux-POWER-xlC.mpi

make release

Transferring files

bbcp (https://www.slac.stanford.edu/~abh/bbcp/) works well for transfers from outside machines. It must be installed both locally (e.g., /home/jim/bin/Linux/bbcp) and at OLCF (e.g., /ccs/home/jimp/bin/bbcp), and it must be on your PATH at both ends (e.g., add "export PATH=$HOME/bin:$PATH" to ~/.bashrc).

Upload to Titan/Rhea (note hostname needed):

bbcp -V -T 'ssh jimp@dtn35.ccs.ornl.gov bbcp' 210stmv.coor dtn35.ccs.ornl.gov:/lustre/atlas1/bip115/scratch/jimp/

Download from Titan/Rhea (note -z AND hostname needed):

bbcp -V -S 'ssh jimp@dtn35.ccs.ornl.gov bbcp' -z dtn35.ccs.ornl.gov:/lustre/atlas1/bip115/scratch/jimp/foo.coor foo.coor

Download a directory of files by piping tar through bbcp:

bbcp -V -S 'ssh jimp@dtn35.ccs.ornl.gov bbcp' -N io "dtn35.ccs.ornl.gov:tar -c -O -C /lustre/atlas1/bio024/scratch/jimp/mar2014 foodir" 'tar -x'

To move files between Titan/Rhea and Summit, stage them through the HPSS archive:

hsi put foo

hsi get foo

htar cvf foodir.tar foodir

htar xvf foodir.tar

The data transfer nodes currently mount both scratch filesystems, so you can also just use cp -a.
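
For example, from a data transfer node (both paths below are placeholders for your own project and user directories):

cp -a /lustre/atlas1/bip115/scratch/jimp/foodir /gpfs/alpine/scratch/jimp/bip115/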

See https://www.olcf.ornl.gov/for-users/system-user-guides/rhea/file-systems/#remote-transfers and https://www.olcf.ornl.gov/for-users/system-user-guides/rhea/file-systems/#hpss-best-practices