GPU Acceleration of Molecular Modeling Applications
Modern graphics processing units (GPUs) contain hundreds of arithmetic units and can be harnessed to provide tremendous acceleration for numerically intensive scientific applications such as molecular modeling. The increased capabilities and flexibility of recent GPU hardware, combined with high level GPU programming languages such as CUDA and OpenCL, have unlocked this computational power and made it accessible to computational scientists. The key to effective GPU computing is the design and implementation of data-parallel algorithms that scale to hundreds of tightly coupled processing units. Many molecular modeling applications are well suited to GPUs, due to their extensive computational requirements, and because they lend themselves to data-parallel implementations. Several exemplary results from our GPU computing work are presented in Klaus Schulten's Keynote Lecture from the 2010 GPU Technology Conference.
Molecular Dynamics
Continuing increases in high performance computing technology have rapidly expanded the domain of biomolecular simulation from isolated proteins in solvent to complex aggregates, often in a lipid environment. Such systems routinely comprise 100,000 atoms, and several published NAMD simulations have exceeded 1,000,000 atoms. Studying the function of even the simplest biomolecular machines requires simulations of 100 ns or longer, even when employing simulation techniques for accelerating processes of interest. One of the most time consuming calculations in a typical molecular dynamics simulation is the evaluation of forces between atoms that do not share bonds. Thanks to their high degree of parallelism and floating point arithmetic capability, GPUs can evaluate these non-bonded forces at performance levels twenty times that of a single CPU core. This twenty-fold acceleration decreases the runtime of the non-bonded force evaluation to the point where it can be overlapped with the bonded force and PME long-range force calculations on the CPU. These and other CPU-bound operations must be ported to the GPU before further acceleration of the entire NAMD application can be realized.
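To make the structure of the non-bonded force evaluation concrete, the following is a minimal CPU sketch in Python, not the NAMD implementation. It assumes a single Lennard-Jones parameter pair for all atoms, a plain distance cutoff, and Coulomb's constant in kcal*Angstrom/(mol*e^2); the function name and all default parameter values are hypothetical and chosen only for illustration.

```python
import math

def nonbonded_forces(positions, charges, cutoff=12.0,
                     epsilon=0.1, sigma=3.4, coulomb_const=332.0636):
    """Return per-atom force vectors from cutoff Lennard-Jones + Coulomb pairs.

    All parameter defaults are illustrative assumptions, not NAMD settings.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    cutoff2 = cutoff * cutoff
    # Each value of i only writes forces[i], so the outer loop is data-parallel:
    # a GPU version could assign one thread (or thread block) per atom.
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, zj = positions[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            r2 = dx * dx + dy * dy + dz * dz
            if r2 >= cutoff2:
                continue  # pairs beyond the cutoff contribute nothing
            inv_r2 = 1.0 / r2
            s6 = (sigma * sigma * inv_r2) ** 3
            # Radial force divided by r, so multiplying by (dx, dy, dz)
            # yields the force vector acting on atom i.
            lj = 24.0 * epsilon * (2.0 * s6 * s6 - s6) * inv_r2
            coul = coulomb_const * charges[i] * charges[j] * inv_r2 / math.sqrt(r2)
            scale = lj + coul
            forces[i][0] += scale * dx
            forces[i][1] += scale * dy
            forces[i][2] += scale * dz
    return forces
```

The sketch visits every pair and is therefore quadratic in the number of atoms; production codes restrict the inner loop to nearby atoms with spatial bins or pair lists.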
Multi-Resolution Molecular Surface Visualization
Molecular surface visualization allows researchers to see where structures are exposed to solvent, where structures come into contact, and to view the overall architecture of large biomolecular complexes such as trans-membrane channels and virus capsids. Recently, we have developed a new GPU-accelerated multi-resolution molecular surface representation, enabling smooth interactive animation of moderate sized biomolecular complexes consisting of a few hundred thousand to one million atoms, and interactive display of molecular surfaces for multi-million atom complexes, e.g. large virus capsids. The GPU-accelerated QuickSurf representation in VMD achieves performance orders of magnitude faster than the conventional Surf and MSMS representations, and makes VMD the first molecular visualization tool capable of achieving smooth animations of surface representations for systems of up to one million atoms.
Molecular Orbital Display
Visualization of molecular orbitals (MOs) is important for analyzing the results of quantum chemistry simulations. The functions describing the MOs are computed on a three-dimensional lattice, and the resulting data can then be used for plotting isocontours or isosurfaces for visualization as well as for other types of analyses. Existing software packages that render MOs perform calculations on the CPU and require runtimes of tens to hundreds of seconds depending on the complexity of the molecular system.
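As a rough illustration of what evaluating an orbital on a lattice involves, the Python sketch below sums s-type Gaussian functions at every lattice point. This is a deliberate simplification: real quantum chemistry basis sets use contracted functions with angular momentum terms, and the function name, arguments, and values here are hypothetical.

```python
import math

def orbital_on_lattice(centers, coeffs, exponents, origin, spacing, dims):
    """Evaluate psi(r) = sum_k c_k * exp(-alpha_k * |r - R_k|^2) on a 3-D lattice.

    Assumes s-type Gaussians only; returns a nested list indexed [ix][iy][iz].
    """
    nx, ny, nz = dims
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for ix in range(nx):
        x = origin[0] + ix * spacing
        for iy in range(ny):
            y = origin[1] + iy * spacing
            for iz in range(nz):
                z = origin[2] + iz * spacing
                value = 0.0
                # Every lattice point is independent of all others, which is
                # what makes this calculation a natural fit for a GPU.
                for (cx, cy, cz), c, alpha in zip(centers, coeffs, exponents):
                    r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                    value += c * math.exp(-alpha * r2)
                grid[ix][iy][iz] = value
    return grid
```

The resulting lattice values are what an isosurface or isocontour extraction step would consume.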
We have developed novel data-parallel algorithms for computing MOs on modern graphics processing units (GPUs) using CUDA. As recently reported, the fastest GPU algorithm achieves up to a 125-fold speedup over an optimized CPU implementation running on one CPU core. We have implemented these algorithms within the popular molecular visualization program VMD, which can now produce high quality MO renderings for large systems in less than a second, and achieves the first-ever interactive animations of quantum chemistry simulation trajectories using only on-the-fly calculation.
Ion Placement
To best reproduce physiological conditions, molecular dynamics simulations must be run in the presence of appropriate ions. Generally such simulations are performed in the presence of sodium chloride, although in some cases (such as simulations including nucleic acid) other ions such as magnesium are necessary. Although many tools such as the VMD Autoionize plugin can place a random distribution of ions, molecules requiring counterions for their stability are better treated using ion placement methods which take the electrostatics of the solute into account. One method for doing this is to place important counterions at minima in the electrostatic potential field generated by the biomolecule of interest, iteratively updating the potential field after each ion is placed.
While this method of ion placement is simple and computes ion positions matched to the specific target molecule, it can be very computationally demanding for large structures because it requires calculation of the electrostatic potential at all points on a high-resolution 3-D lattice in the neighborhood of the solute. Coulomb-based ionization of very large structures such as viruses could require several days even using moderately sized clusters of computers. However, the calculation of a function on a lattice where all points are independent is an ideal application for GPU acceleration, and as recently reported in the Journal of Computational Chemistry, the use of GPUs to accelerate Coulomb-based ion placement leads to speedups of 100 times or more, allowing large structures to be properly ionized in less than an hour on a single desktop computer.
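The iterative placement procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published tool: units are omitted, lattice points closer than a hypothetical `min_dist` to any charge are excluded to avoid the Coulomb singularity, and the function and parameter names are invented for this example.

```python
import math

def place_ions(atoms, lattice_points, n_ions, ion_charge=1.0, min_dist=2.0):
    """Place ions one at a time at the lowest-energy allowed lattice point.

    atoms: list of (x, y, z, charge). The potential is a direct Coulomb sum
    (q / r, units omitted) recomputed after each ion is added.
    """
    charges = list(atoms)
    placed = []
    for _ in range(n_ions):
        best_point, best_energy = None, math.inf
        for point in lattice_points:  # lattice points are mutually independent
            potential = 0.0
            too_close = False
            for (x, y, z, q) in charges:
                r = math.dist(point, (x, y, z))
                if r < min_dist:
                    too_close = True
                    break
                potential += q / r
            if too_close:
                continue
            energy = ion_charge * potential
            if energy < best_energy:
                best_point, best_energy = point, energy
        if best_point is None:
            break  # no admissible lattice point remains
        placed.append(best_point)
        # The new ion contributes to the potential seen by later ions.
        charges.append((best_point[0], best_point[1], best_point[2], ion_charge))
    return placed
```

The inner double loop over lattice points and charges is the expensive part; it is this lattice evaluation that maps well onto a GPU.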
The direct summation of the Coulomb potential from all atoms to every lattice point requires computational work that grows quadratically, proportional to the product of the number of atoms and the number of lattice points. An algorithmic enhancement known as multilevel summation uses hierarchical interpolation of softened pairwise potentials from lattices of increasing coarseness to compute an approximation to the Coulomb potential. The amount of computational work for multilevel summation grows linearly, proportional to the sum of the number of atoms and the number of lattice points. Our reported GPU-assisted implementation of this method further reduces the time of obtaining large ionized structures to just a few minutes on a single desktop computer. The accuracy of the implementation is sufficient (with an average difference from the direct approach demonstrated to be in the range of 0.025% to 0.037%) to permit identical ion placement as the direct summation approach for small test molecules and nearly identical results for the ribosome.
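The idea of a softened pairwise potential can be illustrated with a small Python sketch. It splits 1/r into a short-range part that vanishes beyond a cutoff distance `a` and a smooth long-range part that is finite at r = 0. The specific polynomial used here is one common choice of softening function and is an assumption; the text above does not state which function the reported implementation uses, and the hierarchical interpolation between lattices is not shown.

```python
def smoothing(rho):
    """Softened version of 1/rho: a polynomial for rho < 1, exactly 1/rho otherwise.

    The polynomial matches 1/rho in value and first two derivatives at rho = 1.
    """
    if rho >= 1.0:
        return 1.0 / rho
    rho2 = rho * rho
    return 15.0 / 8.0 - 1.25 * rho2 + 0.375 * rho2 * rho2

def split_coulomb(r, a):
    """Return (short_range, long_range) parts of 1/r for cutoff distance a."""
    if r >= a:
        return 0.0, 1.0 / r  # beyond the cutoff everything is long-range
    long_range = smoothing(r / a) / a
    short_range = 1.0 / r - long_range
    return short_range, long_range
```

Because the short-range part is zero beyond `a`, it needs only nearby atoms, while the smooth long-range part varies slowly enough to be interpolated from coarser lattices.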
The GPU-accelerated Coulomb potential calculation can be directly applied to calculate time-averaged electrostatic potentials from molecular dynamics simulations. As we reported, a VMD calculation of the electrostatic potential for one frame of a molecular dynamics simulation of the ribosome takes 529 seconds on a single GPU, as opposed to 5.24 hours on a single CPU core. A multilevel summation calculation for a single frame requires 67 seconds on one GPU.
Multi-GPU Coulomb Summation
Just as scientific computing can be done on clusters composed of a large number of CPU cores, in some cases problems can be decomposed and run in parallel on multiple GPUs within a single host machine, achieving correspondingly higher levels of performance. One of the drawbacks to the use of multi-core CPUs for scientific computing has been the limited amount of memory bandwidth available to each CPU socket, often severely limiting the performance of bandwidth-intensive scientific codes. Recently this problem has been further exacerbated because the memory bandwidth available to each CPU socket has not kept pace with the increasing number of cores in current CPUs. Since GPUs contain their own on-board high performance memory, the memory bandwidth available to computational kernels scales with the number of GPUs. This property can allow single-system multi-GPU codes to scale much better than their multi-core CPU based counterparts. Highly data-parallel and memory bandwidth intensive problems are often excellent candidates for such multi-GPU performance scaling.
The direct Coulomb summation algorithm implemented in VMD is an exemplary case for multi-GPU acceleration. The scaling efficiency for direct summation across multiple GPUs is nearly perfect -- the use of 4 GPUs delivers an almost exactly 4X performance increase. A single GPU evaluates up to 39 billion atom potentials per second, performing 290 GFLOPS of floating point arithmetic. With the use of four GPUs, total performance increases to 157 billion atom potentials per second and 1.156 TFLOPS of floating point arithmetic, for a multi-GPU speedup of 3.99 and a scaling efficiency of 99.7%, as recently reported. To match this level of performance using CPUs, hundreds of state-of-the-art CPU cores would be required, along with their attendant cabling, power, and cooling requirements. Although this is only one of the first steps in our exploration of the use of multiple GPUs, the result clearly demonstrates that it is possible to harness multiple GPUs in a single system with high efficiency.
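The speedup and scaling efficiency quoted above follow directly from the reported floating point rates, as this short Python check shows. The helper name `scaling` is hypothetical; the input numbers are the GFLOPS figures from the text.

```python
def scaling(single_rate, multi_rate, n_devices):
    """Return (speedup, efficiency) of a multi-device run relative to one device."""
    speedup = multi_rate / single_rate
    efficiency = speedup / n_devices
    return speedup, efficiency

# 290 GFLOPS on one GPU, 1.156 TFLOPS (1156 GFLOPS) on four GPUs.
speedup, efficiency = scaling(290.0, 1156.0, 4)
print(f"speedup {speedup:.2f}, efficiency {100.0 * efficiency:.1f}%")
# prints "speedup 3.99, efficiency 99.7%"
```

The rounded atom-potential rates (39 and 157 billion per second) give a slightly different ratio only because they carry fewer significant digits.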
Fluorescence Microphotolysis
Fluorescence microphotolysis is a non-invasive method of studying the dynamics of cellular components using optical microscopy. In this method, a small area of a fluorescent specimen is illuminated by a focused laser beam, and the fluorescence of the illuminated spot is recorded. By analyzing the change of the fluorescence signal with time, one can extract diffusion constants of the fluorescent molecules. However, such an analysis of experimental data often requires numerical calculations: a diffusion-reaction equation (a partial differential equation in time and 2D or 3D space) has to be solved. Numerical schemes for solving this equation on a grid feature a significant degree of parallelism; indeed, the scheme can be represented as a vector-matrix multiplication problem, which is common in graphics applications and can easily be computed on a GPU. On the other hand, the concentration of fluorescent molecules at a given point depends on the concentration at other points, introducing interdependencies that limit parallelism. Nevertheless, it has been demonstrated recently that GPU-accelerated computation of fluorescence microphotolysis signals achieves a significant speed-up over CPU computation. A computation that took about 8 minutes on a CPU has been shown to run in 38 seconds on a GPU. Given that experimentalists need to perform multiple computation runs with various parameters to match the observed fluorescence signals, this roughly 12-fold speed-up is very welcome. As we reported, GPU-accelerated computation of fluorescence measurements opens new possibilities for experiments that employ new high-resolution microscopes (such as the so-called 4Pi microscope), because the intricate pattern of light distribution in such microscopes makes a numerical solution necessary to analyze experimental data. Further information on this topic is available here.
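To show where both the parallelism and the interdependence come from, here is a minimal Python sketch of one explicit finite-difference time step for a 1D diffusion-reaction equation, dc/dt = D * d2c/dx2 - k(x) * c. It is an illustration only, not the published scheme: the 1D grid, the no-flux boundaries, the first-order bleaching term `k`, and the function name are all assumptions made for this example.

```python
def diffusion_reaction_step(c, D, k, dx, dt):
    """Advance concentrations c by one explicit time step of length dt.

    c, k: lists of equal length (concentration and local bleaching rate).
    The explicit scheme is stable only if D * dt / dx**2 <= 0.5.
    """
    n = len(c)
    new_c = [0.0] * n
    # Every new value depends only on the previous time step, so all grid
    # points can be updated in parallel; the coupling to neighbors is what
    # ties grid points together from one step to the next.
    for i in range(n):
        left = c[i - 1] if i > 0 else c[i]        # no-flux boundary
        right = c[i + 1] if i < n - 1 else c[i]   # no-flux boundary
        laplacian = (left - 2.0 * c[i] + right) / (dx * dx)
        new_c[i] = c[i] + dt * (D * laplacian - k[i] * c[i])
    return new_c
```

Setting `k` to a nonzero value only inside the illuminated spot mimics bleaching there, while diffusion refills the spot from the surrounding grid points over successive steps.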
Software
- NAMD Molecular Dynamics
- VMD Molecular Visualization
- GPU Programming for Molecular Modeling Workshop, 2010.
- GPU Computing Gems Vol. 1 Source Code
- GPUComputing.Net
- GPGPU.org
- Khronos Group, OpenCL specifications and software
- Download the AMD/ATI OpenCL toolkit and drivers
- Download the NVIDIA CUDA and OpenCL toolkits and drivers
Book Chapters
GPU-accelerated computation and interactive display of molecular orbitals. John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, and Klaus Schulten. In Wen-mei Hwu, editor, GPU Computing Gems, chapter 1, pp. 5-18. Morgan Kaufmann Publishers, 2011.
Fast molecular electrostatics algorithms on GPUs. David J. Hardy, John E. Stone, Kirby L. Vandivort, David Gohara, Christopher Rodrigues, and Klaus Schulten. In Wen-mei Hwu, editor, GPU Computing Gems, chapter 4, pp. 43-58. Morgan Kaufmann Publishers, 2011.
GPU algorithms for molecular modeling. John E. Stone, David J. Hardy, Barry Isralewitz, and Klaus Schulten. In Jack Dongarra, David A. Bader, and Jakub Kurzak, editors, Scientific Computing with Multicore and Accelerators, chapter 16, pp. 351-371. Chapman & Hall/CRC Press, 2011.
Publications
- Fast analysis of molecular dynamics trajectories with graphics processing units-radial distribution function histogramming. Benjamin G. Levine, John E. Stone, and Axel Kohlmeyer. Journal of Computational Physics, 230:3556-3569, 2011.
- Immersive molecular visualization and interactive modeling with commodity hardware. John E. Stone, Axel Kohlmeyer, Kirby L. Vandivort, and Klaus Schulten. Lecture Notes in Computer Science, 6454:382-393, 2010.
- Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters
Jeremy Enos, Craig Steffen, Joshi Fullop, Michael Showerman,
Guochun Shi, Kenneth Esler, Volodymyr Kindratenko,
John E. Stone, and James C. Phillips.
International Conference on Green Computing, pp. 317-324, 2010.
- GPU-accelerated molecular modeling coming of age. John E. Stone, David J. Hardy, Ivan S. Ufimtsev, and Klaus Schulten. Journal of Molecular Graphics and Modelling, 29:116-125, 2010.
- OpenCL: A parallel programming standard for heterogeneous computing systems. John E. Stone, David Gohara, and Guochun Shi. Computing in Science and Engineering, 12:66-73, 2010.
- An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems.
Isaac Gelado, John E. Stone, Javier Cabezas, Sanjay Patel, Nacho Navarro, and Wen-mei W. Hwu.
ASPLOS '10: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 347-358, 2010.
- Probing Biomolecular Machines with Graphics Processors.
James C. Phillips, John E. Stone. Communications of the ACM 52(10):34-41, 2009.
- GPU Clusters for High Performance Computing.
Volodymyr Kindratenko, Jeremy Enos, Guochun Shi, Michael Showerman,
Galen Arnold, John E. Stone, James Phillips, Wen-mei Hwu.
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on. pp. 1-8, Aug. 2009.
- Long time-scale simulations of in vivo diffusion using GPU hardware.
Elijah Roberts, John E. Stone, Leonardo Sepulveda, Wen-mei W. Hwu, and Zaida Luthey-Schulten.
In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pp. 1-8, 2009.
- High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs. John E. Stone, Jan Saam, David J. Hardy, Kirby L. Vandivort, Wen-mei W. Hwu, and Klaus Schulten. In Proceedings of the 2nd Workshop on General-Purpose Processing on Graphics Processing Units, ACM International Conference Proceeding Series, volume 383, pp. 9-18, New York, NY, USA, 2009. ACM.
- Multilevel summation of electrostatic potentials using graphics processing units. David J. Hardy, John E. Stone, and Klaus Schulten. Journal of Parallel Computing, 35:164-177, 2009.
- Adapting a message-driven parallel application to GPU-accelerated clusters. James C. Phillips, John E. Stone, and Klaus Schulten. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Piscataway, NJ, USA, 2008. IEEE Press.
- GPU acceleration of cutoff pair potentials for molecular modeling applications. Christopher I. Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, and Wen-mei W. Hwu. In CF'08: Proceedings of the 2008 conference on Computing Frontiers, pp. 273-282, New York, NY, USA, 2008. ACM.
- GPU computing. John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips. Proceedings of the IEEE, 96:879-899, 2008.
- Continuous fluorescence microphotolysis and correlation spectroscopy using 4Pi microscopy. Anton Arkhipov, Jana Hüve, Martin Kahms, Reiner Peters, and Klaus Schulten. Biophysical Journal, 93:4006-4017, 2007.
- Accelerating molecular modeling applications with graphics processors. John E. Stone, James C. Phillips, Peter L. Freddolino, David J. Hardy, Leonardo G. Trabuco, and Klaus Schulten. Journal of Computational Chemistry, 28:2618-2640, 2007.
Presentations
- Faster, Cheaper, Better: Biomolecular Simulation with NAMD, VMD, and CUDA, NVIDIA Booth, Supercomputing 2010, New Orleans, LA (11/16/2010)
- High Performance Computing with CUDA Case Study: Heterogeneous GPU Computing for Molecular Modeling, CUDA Tutorial, Supercomputing 2010, New Orleans, LA (11/14/2010)
- GPU and the Computational Microscope, GPU Technology Conference (09/22/2010)
- NAMD, CUDA, and Clusters: Taking GPU Molecular Dynamics Beyond the Desktop, GPU Technology Conference (09/23/2010)
- High Performance Molecular Simulation, Visualization, and Analysis on GPUs, GPU Technology Conference (09/22/2010)
- Simulating Biomolecules on GPUs with the Multilevel Summation Method, Oak Ridge National Laboratory (09/17/2010)
- High Performance Molecular Simulation, Visualization, and Analysis on GPUs, Oak Ridge National Laboratory (09/16/2010)
- Faster, Cheaper, and Better Science: Molecular Modeling on GPUs, Fall National Meeting of the American Chemical Society, Boston, MA (08/22/2010)
- OpenCL: Molecular Modeling on Heterogeneous Computing Systems, Fall National Meeting of the American Chemical Society, Boston, MA (08/22/2010)
- Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters, The Work in Progress in Green Computing (WIPGC), Chicago, IL (08/17/2010)
- Using GPUs to compute the multilevel summation of electrostatic forces, Multiscale Molecular Modeling Conference, Edinburgh, Scotland (07/02/2010)
- Molecular Visualization and Analysis on GPUs, Symposium on Application of GPUs in Chemistry and Materials Science, University of Pittsburgh (06/29/2010)
- Accelerating Biomolecular Modeling with CUDA and GPU Clusters, Accelerated Computing Conference, Tokyo, Japan (01/28/2010)
- An Introduction to OpenCL, GPUComputing.net Webinar (12/10/2009)
- Accelerating Molecular Modeling Applications with GPU Computing, Exhibition, Supercomputing 2009, Portland, OR (11/18/2009)
- OpenCL for Molecular Modeling Applications: Early Experiences, OpenCL BOF, Supercomputing 2009, Portland, OR (11/18/2009)
- An Introduction to OpenCL, IACAT/CCOE GPU Brown Bag Forum, University of Illinois (10/21/2009)
- High Performance Molecular Visualization and Analysis with GPU Computing, Beckman Institute Forum for Imaging and Visualization, University of Illinois (10/20/2009)
- Using GPU Computing to Accelerate Molecular Modeling Applications, CECAM Workshop, "Algorithmic Re-Engineering for Modern Non-Conventional Processing Units," CECAM-USI, Lugano, Switzerland (10/2/2009)
- GPU Accelerated Visualization and Analysis in VMD and Recent NAMD Developments, GPU Technology Conference, San Jose, CA (10/1/2009)
- Multilevel Summation of Electrostatic Potentials Using GPUs, Purdue University (09/09/2009)
- Multidisciplinary Panel, VSCSE: Many-Core Processors for Science and Engineering Applications, NCSA (8/10/2008)
- GPU Accelerated Visualization and Analysis in VMD, Center for Molecular Modeling, University of Pennsylvania (6/9/2009)
- Keynote: Accelerating Molecular Modeling Applications with GPU Computing, Second Sharcnet Symposium on GPU and Cell Computing, University of Waterloo (5/20/2009)
- Experiences with Multi-GPU Acceleration in VMD, Path to Petascale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters, NCSA (4/2/2009)
- Experience with NAMD on GPU-accelerated clusters, Path to Petascale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters, NCSA (4/2/2009)
- High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs, Second Workshop on General-Purpose Processing on Graphics Processing Units, Washington D.C. (3/8/2009)
- Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters, IACAT Accelerator Workshop, NCSA (1/23/2009)
- High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs, IACAT Accelerator Workshop, NCSA (1/23/2009)
- Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters, SC2008, Austin TX (11/18/2008)
- GPU Computing, Cape Linux Users Group, South Africa (10/28/2008)
- Accelerating Molecular Modeling Applications with Graphics Processors, Computer Science Department, University of Cape Town, South Africa (10/23/2008)
- Accelerating Computational Biology by 100x Using CUDA, NVISION 2008 (8/26/2008)
- GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications, ACM Computing Frontiers 2008 (5/7/2008)
- GPU Acceleration of Molecular Modeling Applications, Linux Clusters Institute Conference (5/1/2008)
- Accelerating Molecular Modeling Applications with Graphics Processors, SIAM PP08, Minisymposium 8: Revolutionary Technologies for Acceleration of Emerging Petascale Applications - GPUs (3/12/2008)
- Accelerating NAMD with Graphics Processors, SIAM PP08, Minisymposium 10: Current Developments in High-Performance Molecular Dynamics Simulations - Part II of II (3/12/2008)
- GPU Acceleration of Scientific Applications Using CUDA, AstroGPU 2007, Institute for Advanced Study, Princeton University (11/09/2007)
- Visualization of Nano-Scale Structures, University of Texas Health Science Center at Houston (4/20/2006)
- VMD: Algorithms and Methods for Large Scale Biomolecular Visualization, San Diego Supercomputer Center (9/12/2005)
Class lectures, workshop materials, and sample source code:
- High Performance Computing with CUDA Case Study: Heterogeneous GPU Computing for Molecular Modeling, CUDA Tutorial, Supercomputing 2010, New Orleans, LA (11/14/2010)
- Workshop on GPU Programming for Molecular Modeling, Beckman Institute, Urbana, IL (08/06/2010)
- The OpenCL Programming Model, Part 1 Illinois UPCRC Summer School (07/23/2010)
- The OpenCL Programming Model, Part 2 Illinois UPCRC Summer School (07/23/2010)
- Application Performance Case Studies: Molecular Visualization and Analysis (ECE 498 AL Guest Lecture) (4/8/2010)
- High Performance Computing with CUDA Case Study: Molecular Modeling Applications, CUDA Tutorial, Supercomputing 2009, Portland, OR (11/15/2009)
- Biomolecular Modeling Applications of GPUs and CPU-Accelerated Clusters, CUDA Tutorial, IEEE Cluster 2009 (9/4/2009)
- Case Study - Accelerating Molecular Dynamics Experimentation, VSCSE: Many-Core Processors for Science and Engineering Applications, NCSA (8/13/2008)
- Application Performance Case Studies: Molecular Visualization and Analysis (ECE 498 AL1 Guest Lecture) (4/7/2009, 4/9/2009)
- Intro: Using CUDA on Multiple GPUs Concurrently (IACAT Brown Bag Forum) (2/24/2009)
- GPU Computing Case Study: Molecular Modeling Applications (ECE 598 SP Guest Lecture) (11/11/2008)
- Case Study - Accelerating Molecular Dynamics Experimentation, Accelerators for Science and Engineering Applications: GPUs and Multicore (8/21/2008)
- "Accelerating Scientific Applications with GPUs", Workshop on Programming Massively Parallel Processors (PMPP) (7/10/2008)
- Tutorial: High Performance Computing with CUDA (International Supercomputing Conference 2008) (6/16/2008)
- Tutorial: High Performance Computing on GPUs with CUDA (Supercomputing 2007)
- Performance Case Studies: Ion Placement Tool, VMD (ECE 498 AL1 Guest Lecture) (10/15/2007)
- Performance Case Studies: Ion Placement Tool, VMD (ECE 498 AL Guest Lecture) (3/14/2007)
- ECE 498 AL class home page
- Sponsored ECE 498 student projects
Investigators
Collaborators
Our Research in the News