NAMD 2.12 New Features
Released in NAMD 2.12
The following features have been released in NAMD 2.12.
Any bug fixes will appear in the nightly build version on the
download site.
Any documentation updates will appear in the Nightly Build User's Guide
(online or
5.0M PDF)
and release notes.
GPU-accelerated simulations up to three times as fast as NAMD 2.11
Contributed by Antti-Pekka Hynninen.
The greatest benefit is for implicit solvent simulations
and for CPU-bound small single-node runs.
Enhancements include new direct nonbonded CUDA kernels
(may be disabled with "useCUDA2 no", automatically disabled for non-orthorhombic cells)
and PME entirely on GPU
(uses cufft library, may be disabled with "usePMECUDA no",
automatically disabled on more than 4 physical nodes).
Cufft errors have been observed with CUDA 8.0 builds on K80 GPUs.
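As a sketch, the two opt-out keywords mentioned above would go in the NAMD configuration file; the comments merely restate the behavior described here, and neither line is needed for a default run:

    # Fall back to the NAMD 2.11 nonbonded code path
    # (the new CUDA kernels are otherwise on by default).
    useCUDA2    no

    # Compute PME on the CPU instead of the GPU
    # (GPU PME is automatically disabled on more than 4 physical nodes;
    #  disabling it may also help if cufft errors appear with CUDA 8.0 on K80).
    usePMECUDA  no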
Improved vectorization and new kernels for Xeon Phi Knights Landing
Intel compiler vectorization and auto-dispatch has improved performance for
Intel processors supporting AVX instructions.
Special KNL builds (e.g., Linux-KNL-icc and CRAY-XC-KNL-intel)
for the Intel Xeon Phi Knights Landing processor
enable new AVX-512 mixed-precision kernels
while preserving full alchemical and other functionality.
Note that multi-host KNL runs require multiple smp processes per host
(as many as 13 for Intel Omni-Path, 6 for Cray Aries)
in order to drive the network.
Careful attention to +pemap and +commap settings is required,
as is using 1 or 2 (but not 3 or 4) hyperthreads per pe core
(but only 1 per communication thread core).
For details see
NamdOnKNL
page on NamdWiki.
Improved scaling for large implicit solvent simulations
More robust sizing of spatial decomposition grid eliminates edge patches with excessive atoms.
Improved scaling for multi-threaded "smp" builds
Shared-memory parallelization (CkLoop) of PME is automatically enabled.
The greatest benefit is seen for large runs with "PMEProcessors [numNodes]" set.
Crashes have been observed on KNL for single-process runs;
the feature may be disabled with "useCkLoop 0".
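A minimal configuration sketch for the two settings mentioned above (they address different situations and would not normally be combined):

    # Large smp runs: spread PME across all nodes so the CkLoop
    # shared-memory parallelization of PME pays off.
    PMEProcessors  [numNodes]

    # KNL single-process runs: disable CkLoop if crashes occur.
    useCkLoop  0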
Communication thread sleeps in single-process-per-replica smp runs
Provides the equivalent of multicore mode on smp builds
by keeping communication thread mostly idle.
This enables, e.g., a 24-core node to support four 6-pe replicas
without oversubscribing cores, rather than four 5-pe replicas
with four communication threads constantly polling for messages.
This is an advantage in particular for GPU-accelerated runs
that greatly benefit from smp/multicore builds.
Divide GPUs among replicas on host with "+devicesperreplica n"
Rather than every replica binding to all GPUs on its node,
only bind to the specified number of GPUs, distributed round-robin
across the replicas present on the node.
Shared-memory parallel calculation of collective variables
Improves performance for multicore/smp builds when large numbers of
collective variables are defined by distributing collective variable
calculation across threads.
Uses Charm++ CkLoop feature.
Contributed by Giacomo Fiorin and Jerome Henin.
Tcl scripting of collective variables thermodynamic integration
Scripting commands getappliedforce and gettotalforce to implement thermodynamic integration-based methods.
Contributed by Giacomo Fiorin and Jerome Henin.
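A hedged sketch of how these commands might be used from a Tcl script; the "cv colvar <name> ..." wrapper form and the colvar name "d" are assumptions for illustration, so consult the colvars documentation for the exact calling convention:

    # assumes a colvar named "d" is defined in the colvars configuration
    set fa [cv colvar d getappliedforce]   ;# bias force currently applied to the colvar
    set ft [cv colvar d gettotalforce]     ;# total system force projected onto the colvar
    # averaging these forces along d underlies thermodynamic integration estimators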
Constraints on probability distributions of collective variables
Probability distribution restraints.
Contributed by Giacomo Fiorin and Jerome Henin.
Collective variables module improvements including to histogram bias
- Change in ABF convention: PLEASE SEE http://colvars.github.io/totalforce.html
- Extended-system ABF method with the CZAR estimator
- Histogram calculation on ensembles of variables, with optional weights
- Probability distribution restraints
- Contributed variable types: dipoleAngle (Alejandro Bernardin), groupCoordNum (Alan Grossfield)
- Scripting command "cvcflags" to optimize performance of complex colvars
- Improved error handling in user input and Tcl scripts
- Parallel calculation of center-of-mass based variables
Contributed by Giacomo Fiorin and Jerome Henin.
Extended adaptive biasing force on-the-fly free energy estimator
Method contributed by Haohao Fu and Christophe Chipot, integrated with colvars.
Dynamic lambda scaling for alchemical work calculations
Contributed by Brian Radak. Enables reversible switching between states for Monte Carlo methods.
Scaling of bonded terms in alchemical free energy calculations
Contributed by Brian Radak. Improves correctness for some cases.
Properly scaled alchemical Lennard-Jones long-range corrections
Contributed by Brian Radak.
Multigrator pressure and temperature control method
Contributed by Antti-Pekka Hynninen.
Retry after spurious EXDEV (Invalid cross-device link) output errors
A bug in the Linux NFS server code would cause these errors when renaming files.
Ability to modify grid force scaling without restarting
Useful for switching gradually between grid potentials during a run.
Ability to reload molecular structure without restarting
Existing "structure" keyword extended to load a new molecular structure, either from a psf/pdb file pair("structure foo.psf pdb foo.pdb") or a js file ("structure foo.js").
Must be followed by "reinitatoms" or "reinitatoms foo", the later reading from foo.coor, foo.vel, and foo.xsc,
and also starting a new trajectory file with "dcdfile foo.dcd" (for new atom count).
The patch grid and other simulation paramters are *not* changed, so the dimensions of the new and old structures must be compatible.
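Putting the pieces above together, a mid-run structure swap might look like the following sketch (file names and step counts are placeholders):

    run 10000                        ;# finish with the old structure
    structure foo.psf pdb foo.pdb    ;# load the new structure (or "structure foo.js")
    reinitatoms foo                  ;# read foo.coor, foo.vel, and foo.xsc
    dcdfile foo.dcd                  ;# new trajectory file for the new atom count
    run 10000                        ;# continue with the new structure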
Optional Python scripting interface
Not included in released binaries; pass --with-python to the config script when building.
Tcl command "python" will run multi-line script or return result of single-line expression.
Python module "tcl" supports tcl.eval("command string") or tcl.call("command",arg1,arg2,...).
Python containers are converted to Tcl lists.
Python object "namd" wraps all simulation parameters and commands, e.g., "namd.switchdist = float(namd.cutoff) - 2.0" and "namd.run(1000)".
Simulation parameters are case-insensitive as in Tcl.
The Tcl print command and 1-4scaling parameter violate Python syntax but are accessible via tcl.eval() and tcl.call().
Python print command works as usual.
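A small sketch combining the pieces described above; the braced multi-line form of the "python" command and the 1-4scaling value are assumptions for illustration:

    python {
        # simulation parameters are exposed on the "namd" object
        namd.switchdist = float(namd.cutoff) - 2.0
        # names that are not valid Python, such as the 1-4scaling parameter,
        # are reached through the tcl module instead
        tcl.eval("1-4scaling 1.0")
        # simulation commands are wrapped as well
        namd.run(1000)
    }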
QM/MM simulation via interfaces to ORCA and MOPAC
Developed by Marcelo Melo, Rafael Bernardi, and Till Rudack.
Supported in all binaries, recompilation not required.
For further information see
preliminary documentation.
Update to Charm++ 6.7.1
Various bug fixes and one new capability used in collective variable parallelization.
Require CUDA 6.5 or greater, drop support for Fermi GPUs
Pascal GPUs work, but only after a long pause for just-in-time kernel compilation on the first run.
Compatible with CUDA 8.0, but cufft issues are observed on K80.