Running VMD on Supercomputers

As growth in the number and speed of state-of-the-art CPUs and GPUs in modern HPC systems has increased the size and time scale of molecular dynamics simulations, the volume of trajectory data to be analyzed has grown commensurately. This has led molecular scientists to perform an increasing fraction of their routine VMD analysis tasks on the same HPC systems where the simulations are run, where high performance processors, GPUs, and storage systems make fast work of large trajectory analysis and visualization tasks. To facilitate use on large scale HPC systems, VMD can be run in a non-interactive mode of execution, either serially or in parallel using MPI, with or without OpenGL graphics, all while retaining support for high fidelity ray tracing and for GPU acceleration of computationally demanding analysis tasks.

Running VMD in Text Mode

All graphics-capable VMD builds can also be run in text mode whenever there is a need to perform long-running non-interactive analysis tasks. When run in text mode, VMD does not allow use of the ``snapshot'' renderer or other OpenGL rendering features; however, non-OpenGL renderers such as Tachyon, TachyonL-OptiX, and TachyonL-OSPRay remain available, since they do not depend on OpenGL. In text mode, VMD prints no status messages related to interactive graphics during startup, but the availability of optional features such as GPU acceleration is still reported. To launch VMD in text mode, append the -dispdev text flag to the command line, as shown below:
% vmd -dispdev text -e myscript.vmd

Info) VMD for LINUXAMD64, version 1.9.3 (November 30, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/                         
Info) Email questions and bug reports to vmd@ks.uiuc.edu           
Info) Please include this reference in published work using VMD:   
Info)    Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual   
Info)    Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 64 CPUs detected.
Info)   CPU features: SSE2 AVX FMA 
Info) Free system memory: 61GB (96%)
Info) No CUDA accelerator devices available.
Info) Dynamically loaded 2 plugins in directory:
Info) /Projects/vmd/pub/linux64/lib/vmd193/plugins/LINUXAMD64/molfile
vmd >

Running VMD with Off-Screen OpenGL Graphics

VMD can be compiled with support for off-screen OpenGL rendering, enabling the ``snapshot'' renderer to be used in non-interactive VMD sessions, e.g., for in-situ rendering. VMD supports off-screen rendering via so-called OpenGL pixel buffers or ``Pbuffers'', using either the OpenGL GLX windowing system interface, or through the OpenGL EGL embedded graphics interface.

The GLX-based Pbuffer feature is normally available in standard graphically-enabled compilations of VMD for Linux/Unix. VMD GLX off-screen rendering requires that an X server be running and that the DISPLAY environment variable be set to the correct server hostname and display. The GLX Pbuffer feature is most appropriate when running VMD on a user's own desktop workstation, since the same user typically controls the active windowing system.
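
For example, in a Bourne/bash shell on a workstation whose X server runs on the local display, the variable might be set as follows before launching VMD (the display number :0.0 is typical but is an assumption that may differ on a given system):

% export DISPLAY=:0.0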

To use VMD with off-screen graphics, the -dispdev openglpbuffer flag is added to the VMD launch command, as shown below:

% vmd -dispdev openglpbuffer -e myscript.vmd

Info) VMD for LINUXAMD64, version 1.9.3 (November 30, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/                         
Info) Email questions and bug reports to vmd@ks.uiuc.edu           
Info) Please include this reference in published work using VMD:   
Info)    Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual   
Info)    Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 40 CPUs detected.
Info)   CPU features: SSE2 AVX AVX2 FMA 
Info) Free system memory: 411GB (81%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 3 available CUDA accelerators:
Info) [0] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [1] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [2] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) OpenGL Pbuffer size: 4096x2400
Info) OpenGL renderer: Quadro M6000 24GB/PCIe/SSE2
Info)   Features: MSAA(4) MDE CVA MTX NPOT PP PS GLSL(OVFGS) 
Info)   Full GLSL rendering mode is available.
Info)   Textures: 2-D (16384x16384), 3-D (4096x4096x4096), Multitexture (4)
Info) Created GLX OpenGL Pbuffer for off-screen rendering
Info) Detected 3 available TachyonL/OptiX ray tracing accelerators
Info)   Compiling 1 OptiX shaders on 3 target GPUs...

In cases where it is inconvenient or impossible to launch a windowing system, such as on large-scale HPC systems, VMD supports EGL-based Pbuffer rendering through compilation with a special OpenGL runtime dispatch library. Because of this special compilation requirement, EGL-enabled versions of VMD are made available separately from conventional VMD builds, and in many cases the user may need to compile VMD from source, since EGL is often used in conjunction with MPI-enabled builds of VMD for parallel rendering, as outlined below.

% vmd -dispdev openglpbuffer -e myscript.vmd

Info) VMD for LINUXAMD64, version 1.9.3 (December 1, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/                         
Info) Email questions and bug reports to vmd@ks.uiuc.edu           
Info) Please include this reference in published work using VMD:   
Info)    Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual   
Info)    Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 40 CPUs detected.
Info)   CPU features: SSE2 AVX AVX2 FMA 
Info) Free system memory: 411GB (81%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 3 available CUDA accelerators:
Info) [0] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [1] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [2] Quadro M6000 24GB  24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) EGL: node[0] bound to display[0], 3 displays total
Info) EGL version 1.4
Info) OpenGL Pbuffer size: 4096x2400
Info) OpenGL renderer: Quadro M6000 24GB/PCIe/SSE2
Info)   Features: STENCIL MSAA(4) MDE CVA MTX NPOT PP PS GLSL(OVFGS) 
Info)   Full GLSL rendering mode is available.
Info)   Textures: 2-D (16384x16384), 3-D (4096x4096x4096), Multitexture (4)
Info) Created EGL OpenGL Pbuffer for off-screen rendering
Info) Detected 3 available TachyonL/OptiX ray tracing accelerators
Info)   Compiling 256 OptiX shaders on 3 target GPUs...

Both GLX- and EGL-based off-screen OpenGL Pbuffer rendering support all of the advanced OpenGL features used by VMD, such as programmable shading, multisample antialiasing, and 3-D texture mapping. One area where they behave differently from traditional windowed OpenGL is that they have a fixed maximum framebuffer resolution, which defaults to 4096x2400. The maximum framebuffer size can be increased beyond this resolution by setting the VMDSCRSIZE environment variable to the maximum framebuffer resolution that might be required during a VMD run. In Bourne/bash shells, this would be done with the command:

% export VMDSCRSIZE="8192 4096"

In C-shell/tcsh shells, this would be done with the command:

% setenv VMDSCRSIZE "8192 4096"

Using VMD with MPI

When VMD has been compiled with support for MPI, it can be run in parallel on thousands of compute nodes at a time. A few noteworthy considerations presently affect how VMD is compiled and launched, and how it behaves in a parallel environment as compared with conventional interactive desktop usage.

When compiled with MPI support and launched with the platform-dependent mpirun or site-specific launch commands (e.g., aprun, jsrun, or similar), VMD will automatically initialize MPI internally, and each parallel VMD instance will be assigned a unique MPI rank. During startup, a parallel launch of VMD will print hardware information about each of the participating compute nodes from node 0. When a parallel VMD run exits, all nodes are expected to call exit at the same time so that they shut down MPI together.
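
As a generic illustration, on a cluster using a conventional mpirun launcher, an MPI-enabled VMD build might be started on four ranks as follows (the launcher name, rank count, and flags depend on the MPI implementation and site configuration):

% mpirun -np 4 vmd -dispdev text -e myscript.vmd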

Because MPI does not (yet) define a standardized binary interface, MPI support requires that VMD be compiled from source code on the target platform for each MPI implementation to be supported; for example, VMD would have to be compiled separately for MPICH, OpenMPI, and/or other MPI versions installed on the system. Unlike conventional VMD, for which the VMD development team provides binary distributions for all mainstream computer and operating system platforms, providing pre-built binaries is not possible in the general case for MPI, so users wishing to use VMD with MPI must compile it from source code.

Some MPI implementations require special interactions with batch queueing systems or storage systems, and in such cases it is necessary to modify the standard VMD launcher scripts to perform any extra steps or to invoke any platform-specific parallel launch commands. By modifying the VMD launch script, users can continue to use familiar VMD launch syntax while gaining the benefits of parallel analysis with MPI. The VMD launch script has been modified so that it can automatically recognize cases where VMD has been launched within batch schedulers used on Cray XK and XC supercomputers such as NCSA Blue Waters, ORNL Titan, CSCS Piz Daint, and related systems, where the VMD executable must be launched using the 'aprun' or 'srun' utilities, depending on the scheduling system in use.
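
As an illustrative sketch only (the variable names and detection logic below are hypothetical, not taken from the stock VMD launcher script), a modified launch script might choose between a direct launch and a Cray aprun launch like this:

# Hypothetical fragment of a modified VMD launch script.
# $vmdbin and $NUM_RANKS are placeholder names, not stock variables.
if [ -n "$PE_ENV" ]; then
    # Cray programming environment detected; use the parallel launcher
    exec aprun -n "$NUM_RANKS" "$vmdbin" "$@"
else
    # Ordinary node; launch VMD directly
    exec "$vmdbin" "$@"
fi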

MPI-enabled builds of VMD can often also be run on login nodes or interactive visualization nodes that may not be managed by the batch scheduler and/or may not support MPI, so long as the MPI and other shared libraries used on the compute nodes are also available on the login or interactive visualization nodes. To run an MPI-enabled VMD build outside of MPI, i.e., without 'mpirun', the environment variable VMDNOMPI can be set, which prevents VMD from calling any MPI APIs during the run, allowing it to behave like a normal non-MPI build for convenience. In most cases, this makes it possible to use a single VMD build on all of the different compute node types on a system.
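
For example, in a Bourne/bash shell, an MPI-enabled build might be run on a login node as follows (the value 1 is illustrative; setting the VMDNOMPI variable is what matters):

% VMDNOMPI=1 vmd -dispdev text -e myscript.vmd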

At present, the console handling used by interactive text interpreters conflicts with the behavior of some mainstream MPI implementations, so when VMD is run in parallel using MPI, the interactive console is disabled and VMD instead reads commands only from script files specified with the -e command line argument.

Aside from the special launch behavior and the lack of the interactive text console, MPI runs of VMD retain full support for high performance OpenGL graphics via GLX or EGL, and for ray tracing with Tachyon, OptiX, and OSPRay.

Here is an example session showing a VMD run performed on the CSCS Piz Daint Cray XC50 supercomputer with NVIDIA Tesla P100 GPU accelerators:

stonej@daint103> srun -C gpu -n 256 --ntasks-per-node=1 \
   /users/stonej/local/bin/vmd193 -dispdev text -e rendermovie.tcl
on daint103
srun: job 50274 queued and waiting for resources
srun: job 50274 has been allocated resources
Info) VMD for CRAY_XC, version 1.9.3 (December 15, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/                         
Info) Email questions and bug reports to vmd@ks.uiuc.edu           
Info) Please include this reference in published work using VMD:   
Info)    Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual   
Info)    Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Creating CUDA device pool and initializing hardware...
Info) Initializing parallel VMD instances via MPI...
Info) Found 256 VMD MPI nodes containing a total of 6144 CPUs and 256 GPUs:
Info)    0:  24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03072
Info)    1:  24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03073
Info)    2:  24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03074
[...example output omitted...]
Info)  253:  24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03375
Info)  254:  24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03376
Info)  255:  24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03377

VMD parallel commands

The parallel command enables large scale parallel scripting when VMD has been compiled with MPI support. In the absence of MPI support, the parallel command remains available, but it operates the same way an MPI-enabled VMD would when run on only a single node. The parallel command enables large analysis scripts to be easily adapted for execution on large clusters and supercomputers, supporting simulation, analysis, and visualization operations that would otherwise be too computationally demanding for conventional workstations.

The parallel allgather command allows VMD analysis scripts to gather results from all of the nodes, returning the complete set of per-node results in a new Tcl list. Here is a simple example procedure that demonstrates this by gathering all of the MPI node hostnames and printing them. Note that to avoid redundant output, the script prints only on node rank 0:

proc testgather { } {
  set noderank  [parallel noderank]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "Testing parallel gather..."
  }

  # Do a parallel gather of all node names
  set datalist [parallel allgather [parallel nodename]]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "datalist length: [llength $datalist]"
    puts "datalist: $datalist"
  }
}

The parallel allreduce command allows VMD to compute a parallel reduction across all MPI ranks, returning the final result to all nodes. Each rank contributes one input to the reduction. The user must provide a Tcl proc that performs the appropriate reduction operation on a pair of data items, producing a single item. This approach allows arbitrarily complex reductions on arbitrary data to be devised by the user. The VMD reduction implementation calls the user-provided routine in parallel on pairs of arguments exchanged between ranks, with each such call producing a single reduced output value. VMD performs successive parallel reduction operations until it computes the final reduced value, which is returned to all ranks.

The example below returns the sum of all of the MPI node ranks:

proc sumreduction { a b } {
  return [expr {$a + $b}]
}

proc testreduction {} {
  set noderank  [parallel noderank]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "Testing parallel reductions..."
  }
  parallel allreduce sumreduction $noderank
}

VMD can easily perform parallel rendering of trajectories or other kinds of movies with relatively simple scripting based on the parallel commands above. Try running this simple script in an MPI-based VMD session, which uses the individual MPI node ranks to render one frame per node. Be sure to replace ``somedir'' with your own directory:

set noderank [parallel noderank]

puts "node $noderank is running ..."
parallel barrier

mol new /somedir/alanin.pdb waitfor all
puts "node $noderank has loaded data"
parallel barrier

rotate y by [expr $noderank * 20]
render TachyonInternal test_node_$noderank.tga
puts "node $noderank has rendered a frame"

parallel barrier
quit

A much more sophisticated (but incomplete) example below shows how the parallel for command can be used along with a user-defined procedure to perform larger scale parallel rendering with dynamic load balancing. Parameters are passed into the user-defined procedure, triggering the VMD movie maker plugin and any user-defined per-frame callback that may be active therein. The userdata parameter shown here communicates the information necessary for the user-defined worker procedure to interpret the meaning of incoming work indices and take appropriate actions, allowing the procedure to avoid global variables and hard-coded implementation details.

proc render_one_frame { frameno userdata } {
  # retrieve user data for the rendering workers
  set formatstr [lindex $userdata 0]
  set dir       [lindex $userdata 1]
  set renderer  [lindex $userdata 2]

  # Set the frame, triggering user-defined movie
  # callbacks to update the molecular scene
  # prior to rendering of the frame
  set ::MovieMaker::userframe $frameno

  # Regenerate molecular geometry if not up to date
  display update

  # Generate the output filename, and render the frame
  set fname [format $formatstr $frameno]
  render $renderer $dir$fname
}

proc render_movie { dir formatstr framecount renderer } { 
  set userdata {} 
  lappend userdata $formatstr 
  lappend userdata $dir 
  lappend userdata $renderer 
  set lastframe [expr $framecount - 1] 
  parallel for 0 $lastframe render_one_frame $userdata 
}
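
As a hypothetical usage example (the directory, filename format string, frame count, and renderer below are illustrative assumptions, not fixed requirements), the movie could then be rendered in parallel with a call such as:

# render 300 frames into /somedir/frames/, one dynamically
# assigned frame index per parallel for work unit
render_movie /somedir/frames/ frame.%06d.tga 300 TachyonInternal

Because parallel for assigns frame indices to ranks dynamically, faster nodes automatically render more frames than slower ones.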
