NAMD Wiki: NamdAtOakRidge
Running on Summit
A job-submission script is provided at ~jimp/NAMD_scripts/runbatch_summit, with variants runbatch_summit_memopt, runbatch_summit_cpu, and runbatch_summit_cpu_memopt for memory-optimized and CPU-only builds. (Note that the standard runbatch_summit script uses GPUs, as CPU-only runs are considered the exception.)
The script takes three required arguments, plus two optional ones:
- NAMD input file
- NAMD log file
- number of nodes (number of GPUs / 6; each Summit node has 6 GPUs)
- queue (optional; defaults to "batch", the other option is "test")
- replica or other args (optional)
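For example (input file, log file, and node count here are placeholders, assuming the script is invoked directly from a login node):
~jimp/NAMD_scripts/runbatch_summit stmv.namd stmv.log 16
~jimp/NAMD_scripts/runbatch_summit stmv.namd stmv.log 2 test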
The allocation to which the job is charged is inferred from the last entry in the output of the "groups" command, and may be controlled by setting the ACCOUNT environment variable.
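For example, to charge a specific project rather than the inferred one (project name is a placeholder):
ACCOUNT=bip115 ~jimp/NAMD_scripts/runbatch_summit stmv.namd stmv.log 16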
Normal runs use PAMI-based binaries. If a replica count is specified (+replicas <n>), the MPI-based binaries are used instead, and the number of nodes may be reduced such that there are 1, 2, or 6 replicas per node (corresponding to 6, 3, or 1 GPUs per replica; multi-node replicas are not supported).
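For example, this sketch (placeholder file names) would run 24 single-GPU replicas packed 6 per node on 4 nodes:
~jimp/NAMD_scripts/runbatch_summit fold.namd fold.log 4 batch +replicas 24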
The script will use the "latest" binaries installed in /gpfs/alpine/world-shared/bip115/NAMD_binaries/summit/
Building on Summit
Download Tcl library
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-ppc64le-threaded.tar.gz
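Unpack the tarball before building; the config lines below reference the unpacked directory via --tcl-prefix (the examples here point at ~jimp's copy):
tar xzf tcl8.5.9-linux-ppc64le-threaded.tar.gz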
PAMI-based CUDA version for multi-node runs
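The NAMD config lines below assume Charm++ 6.9.1 sources unpacked under $HOME/charm-6.9.1 (via --charm-base); a sketch of fetching them, assuming the standard Charm++ distribution URL:
cd $HOME
wget http://charm.cs.illinois.edu/distrib/charm-6.9.1.tar.gz
tar xzf charm-6.9.1.tar.gz
cd charm-6.9.1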
Build Charm++:
module load spectrum-mpi; module load fftw; module list; ./build charm++ pami-linux-ppc64le smp --no-build-shared --with-production
Build NAMD:
module load spectrum-mpi; module load cuda; module load fftw; module list
./config Linux-POWER-xlC.pami Summit --with-fftw3 --fftw-prefix $OLCF_FFTW_ROOT/lib --with-cuda --cuda-prefix $OLCF_CUDA_ROOT --cuda-gencode arch=compute_70,code=sm_70 --tcl-prefix ~jimp/tcl8.5.9-linux-ppc64le-threaded --charm-base $HOME/charm-6.9.1 --charm-arch pami-linux-ppc64le-smp
cd Linux-POWER-xlC.pami
make release
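As a quick sanity check, NAMD's release notes suggest running the bundled alanin example from the build directory; for this smp/CUDA binary that likely requires a compute node rather than a login node, so treat this as a sketch:
./namd2 src/alanin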
MPI-based CUDA version for multi-copy runs
Build Charm++:
module load spectrum-mpi; module load fftw; module list; ./build charm++ mpi-linux-ppc64le smp --no-build-shared --with-production
Build NAMD:
module load spectrum-mpi; module load cuda; module load fftw; module list
sed -i 's/charm_arch_mpi=1/charm_arch_mpi=0/' config
./config Linux-POWER-xlC.mpi Summit --with-fftw3 --fftw-prefix $OLCF_FFTW_ROOT/lib --with-cuda --cuda-prefix $OLCF_CUDA_ROOT --cuda-gencode arch=compute_70,code=sm_70 --tcl-prefix ~jimp/tcl8.5.9-linux-ppc64le-threaded --charm-base $HOME/charm-6.9.1 --charm-arch mpi-linux-ppc64le-smp
cd Linux-POWER-xlC.mpi
make release
Transferring files
bbcp (https://www.slac.stanford.edu/~abh/bbcp/) works well for transfers from outside machines. It must be installed both locally (e.g., /home/jim/bin/Linux/bbcp) and at OLCF (/ccs/home/jimp/bin/bbcp) and be in your path on both ends (e.g., add "export PATH=$HOME/bin:$PATH" to ~/.bashrc).
Upload to Titan/Rhea (note hostname needed):
bbcp -V -T 'ssh jimp@dtn35.ccs.ornl.gov bbcp' 210stmv.coor dtn35.ccs.ornl.gov:/lustre/atlas1/bip115/scratch/jimp/
Download from Titan/Rhea (note -z AND hostname needed):
bbcp -V -S 'ssh jimp@dtn35.ccs.ornl.gov bbcp' -z dtn35.ccs.ornl.gov:/lustre/atlas1/bip115/scratch/jimp/foo.coor foo.coor
Download a directory of files by piping tar through bbcp:
bbcp -V -S 'ssh jimp@dtn35.ccs.ornl.gov bbcp' -N io "dtn35.ccs.ornl.gov:tar -c -O -C /lustre/atlas1/bio024/scratch/jimp/mar2014 foodir" 'tar -x'
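An untested mirror of the above for uploading a directory (local path is a placeholder; same -N io named-pipe mode, but with the remote end as target):
bbcp -V -T 'ssh jimp@dtn35.ccs.ornl.gov bbcp' -N io 'tar -c -O -C /local/path foodir' "dtn35.ccs.ornl.gov:tar -x -C /lustre/atlas1/bip115/scratch/jimp/"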
To move between Titan/Rhea and Summit by staging to archive:
hsi put foo
hsi get foo
htar cvf foodir.tar foodir
htar xvf foodir.tar
The data transfer nodes currently mount both scratch filesystems, so you can also just use cp -a.
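For example (paths are illustrative; adjust user and project):
cp -a /lustre/atlas1/bip115/scratch/jimp/foodir /gpfs/alpine/scratch/jimp/bip115/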
See https://www.olcf.ornl.gov/for-users/system-user-guides/rhea/file-systems/#remote-transfers and https://www.olcf.ornl.gov/for-users/system-user-guides/rhea/file-systems/#hpss-best-practices