Released in NAMD 2.12

The following features have been released in NAMD 2.12. Any bug fixes will appear in the nightly build version on the download site. Any documentation updates will appear in the Nightly Build User's Guide (online or 1.7M PDF) and release notes.

GPU-accelerated simulations up to three times as fast as NAMD 2.11

Contributed by Antti-Pekka Hynninen. The greatest benefit is for implicit solvent simulations and for CPU-bound small single-node runs. Enhancements include new direct nonbonded CUDA kernels (may be disabled with "useCUDA2 no"; automatically disabled for non-orthorhombic cells) and PME computed entirely on the GPU (uses the cuFFT library; may be disabled with "usePMECUDA no"; automatically disabled on more than 4 physical nodes). cuFFT errors have been observed with CUDA 8.0 builds on K80 GPUs.
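Both new code paths are on by default. A minimal, hypothetical config-file fragment (using only the keywords named above) that reverts to the 2.11 code paths, e.g. as a workaround for the K80 cuFFT errors, might look like:

```tcl
# Hypothetical NAMD config excerpt; values shown revert to the
# pre-2.12 GPU code paths as a workaround, not a recommendation.
useCUDA2    no   ;# disable the new direct nonbonded CUDA kernels
usePMECUDA  no   ;# compute PME on the CPU instead of the GPU
```

With default settings, both options can be left out entirely; NAMD also disables them automatically in the cases listed above (non-orthorhombic cells, more than 4 physical nodes).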

Improved vectorization and new kernels for Xeon Phi Knights Landing

Intel compiler vectorization and auto-dispatch have improved performance for Intel processors supporting AVX instructions. Special KNL builds (e.g., Linux-KNL-icc and CRAY-XC-KNL-intel) for the Intel Xeon Phi Knights Landing processor enable new AVX-512 mixed-precision kernels while preserving full alchemical and other functionality. Note that multi-host KNL runs require multiple smp processes per host (as many as 13 for Intel Omni-Path, 6 for Cray Aries) in order to drive the network. Careful attention to +pemap and +commap settings is required, as are 1 or 2 (but not 3 or 4) hyperthreads per pe core (but only 1 per communication thread core). For details see the NamdOnKNL page on NamdWiki.

Improved scaling for large implicit solvent simulations

More robust sizing of spatial decomposition grid eliminates edge patches with excessive atoms.

Improved scaling for multi-threaded "smp" builds

Shared-memory parallelization (CkLoop) of PME is automatically enabled. The greatest benefit is seen for large runs with "PMEProcessors [numNodes]" set. Crashes have been observed on KNL for single-process runs; CkLoop may be disabled with "useCkLoop 0".
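A hypothetical config-file sketch combining the two keywords named above:

```tcl
# Hypothetical NAMD config excerpt: spread PME across all nodes so the
# CkLoop shared-memory parallelization of PME has work everywhere.
# ([numNodes] is a Tcl command evaluated at startup.)
PMEProcessors [numNodes]

# Workaround for the crashes reported on KNL single-process runs:
# useCkLoop 0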

Communication thread sleeps in single-process-per-replica smp runs

Provides the equivalent of multicore mode in smp builds by keeping the communication thread mostly idle. This enables, e.g., a 24-core node to support four 6-pe replicas without oversubscribing cores, rather than four 5-pe replicas with four communication threads constantly polling for messages. This is a particular advantage for GPU-accelerated runs, which benefit greatly from smp/multicore builds.

Divide GPUs among replicas on host with "+devicesperreplica n"

Rather than every replica binding to all GPUs on its node, each replica binds only to the specified number of GPUs, distributed round-robin across the replicas present on the node.

Shared-memory parallel calculation of collective variables

Improve performance for multicore/smp builds when large numbers of collective variables are defined by distributing collective variable calculation across threads. Uses Charm++ CkLoop feature. Contributed by Giacomo Fiorin and Jerome Henin.

Tcl scripting of collective variables thermodynamic integration

Scripting commands getappliedforce and gettotalforce enable the implementation of thermodynamic integration-based methods. Contributed by Giacomo Fiorin and Jerome Henin.
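As a sketch only, assuming a scalar colvar named "d" defined in the colvars configuration and the standard "cv" scripting command, a TI-style accumulation of the applied force might look like:

```tcl
# Hypothetical sketch: average the biasing force applied to colvar "d"
# over a run, as used in thermodynamic integration. The colvar name
# "d" and the segment lengths are placeholders.
set sum 0.0
set nsamples 100
for {set i 0} {$i < $nsamples} {incr i} {
    run 1000                                        ;# advance dynamics
    set f [cv colvar d getappliedforce]             ;# new 2.12 command
    set sum [expr {$sum + $f}]                      ;# scalar colvar assumed
}
print "mean applied force on d: [expr {$sum / $nsamples}]"
```

For multidimensional colvars the command would return a list rather than a scalar, so the accumulation above would need to be done per component.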

Constraints on probability distributions of collective variables

Probability distribution restraints. Contributed by Giacomo Fiorin and Jerome Henin.

Collective variables module improvements including to histogram bias

  • Change in ABF convention: PLEASE SEE
  • Extended-system ABF method with the CZAR estimator
  • Histogram calculation on ensembles of variables, with optional weights
  • Probability distribution restraints
  • Contributed variable types: dipoleAngle (Alejandro Bernardin), groupCoordNum (Alan Grossfield)
  • Scripting command "cvcflags" to optimize performance of complex colvars
  • Improved error handling in user input and Tcl scripts
  • Parallel calculation of center-of-mass based variables
Contributed by Giacomo Fiorin and Jerome Henin.

Extended adaptive biasing force on-the-fly free energy estimator

Method contributed by Haohao Fu and Christophe Chipot, integrated with colvars.

Dynamic lambda scaling for alchemical work calculations

Contributed by Brian Radak. Enables reversible switching between states for Monte Carlo methods.

Scaling of bonded terms in alchemical free energy calculations

Contributed by Brian Radak. Improves correctness for some cases.

Properly scaled alchemical Lennard-Jones long-range corrections

Contributed by Brian Radak.

Multigrator pressure and temperature control method

Contributed by Antti-Pekka Hynninen.

Retry after spurious EXDEV (Invalid cross-device link) output errors

A bug in the Linux NFS server code would cause these errors when renaming files.

Ability to modify grid force scaling without restarting

Useful for switching gradually between grid potentials during a run.

Ability to reload molecular structure without restarting

The existing "structure" keyword is extended to load a new molecular structure, either from a psf/pdb file pair ("structure foo.psf pdb foo.pdb") or from a js file ("structure foo.js"). It must be followed by "reinitatoms" or "reinitatoms foo", the latter reading from foo.coor, foo.vel, and foo.xsc, and also by starting a new trajectory file with "dcdfile foo.dcd" (for the new atom count). The patch grid and other simulation parameters are *not* changed, so the dimensions of the new and old structures must be compatible.
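Put together, an in-run structure swap using only the keywords described above might look like the following sketch (file names are placeholders):

```tcl
# Hypothetical NAMD script excerpt: swap in a new structure mid-run.
run 50000                          ;# finish with the original structure

structure   foo.psf pdb foo.pdb    ;# load new structure (or: structure foo.js)
reinitatoms foo                    ;# read foo.coor, foo.vel, foo.xsc
dcdfile     foo.dcd                ;# new trajectory for the new atom count

run 50000                          ;# continue with the new structure
```

Because the patch grid is not rebuilt, the cell dimensions of foo must be compatible with those of the original structure.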

Optional Python scripting interface

Not included in released binaries; pass --with-python to the config script when building. The Tcl command "python" will run a multi-line script or return the result of a single-line expression. The Python module "tcl" supports tcl.eval("command string") or tcl.call("command",arg1,arg2,...). Python containers are converted to Tcl lists. The Python object "namd" wraps all simulation parameters and commands, e.g., "namd.switchdist = float(namd.cutoff) - 2.0". Simulation parameters are case-insensitive as in Tcl. The Tcl print command and 1-4scaling parameter violate Python syntax but are accessible via tcl.eval(); the Python print command works as usual.
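A hypothetical Tcl config excerpt illustrating the interface described above (requires a build configured with --with-python):

```tcl
# Hypothetical excerpt from a NAMD config file using the optional
# Python interface; parameter values are placeholders.
cutoff 12.0

# Multi-line form: run a Python script block.
python {
    # the "namd" object wraps simulation parameters (case-insensitive)
    namd.switchdist = float(namd.cutoff) - 2.0
    # call back into Tcl for things Python syntax cannot express
    # directly, such as the 1-4scaling parameter:
    tcl.eval("exclude scaled1-4")
    tcl.eval("1-4scaling 1.0")
}

# Single-line form: the expression's result is returned to Tcl.
set sd [python "float(namd.switchdist)"]
print "switchdist is $sd"
```

The single-line form is convenient for pulling computed values back into the surrounding Tcl script; Python lists and other containers come back as Tcl lists.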

QM/MM simulation via interfaces to ORCA and MOPAC

Developed by Marcelo Melo, Rafael Bernardi, and Till Rudack. Supported in all binaries, recompilation not required. For further information see preliminary documentation.

Update to Charm++ 6.7.1

Various bug fixes and one new capability used in collective variable parallelization. Will switch to upcoming Charm++ 6.8.0 after 2.12 final release.

Require CUDA 6.5 or greater, drop support for Fermi GPUs

Pascal GPUs work after a long pause for just-in-time kernel compilation on the first run. Compatible with CUDA 8.0, but cuFFT issues have been observed on K80.