Computational Molecular Science and Engineering

Computational Molecular Science and Engineering - Key 21st Century Technology

In this essay we will first explain the great opportunities of Molecular Science and Engineering, then describe efforts at leading institutions, and finally spell out what this field needs to achieve.

Molecular science and engineering, naturally, is closely related to chemistry, the field that is mainly involved in the frontier of new synthetic routes and new characterization methods of the molecular building blocks in materials engineered for specific technical uses or evolved into the wonderful machines of the living world. The field is also entrenched in physics that furnishes the underlying principles and quantitative descriptions and develops entirely new measurement techniques such as single molecule detection and manipulation methods. Synthetic chemists have discovered synthetic routes to dendrimers, nanotubes, conducting polymers, and self-assembling molecules while physicists along with physical chemists have developed single molecule spectroscopy, atomic force microscopy, optical tweezers, and scanning tunneling microscopes, techniques that inspect and manipulate molecules at unprecedented resolutions. Chemists and physicists contributed to a revolutionary advance of crystallography, NMR spectroscopy, electron cryo-microscopy, and super-resolution light microscopy that have provided a treasure of information on the atomic and supra-atomic level structures of molecular systems, a cherished gift for many human generations. Presently, 41,000 biomolecular structures are known and freely available to anybody, yet this treasure is mostly untapped; complementing these biological structures are millions of sequences from genomic sequencing efforts supported by several funding agencies.

The world of molecules used to be far removed from the world of technological devices. The device engineer consulted the chemist and the physicist for the materials employed, but for a long time many engineers were not much aware of the materials’ molecular building blocks, their focus being on a size scale a billion times larger. But as we know, engineered devices have shrunk rapidly over the last fifty years and have now reached the ultimate limit of the molecule and the atom. This has happened in many engineering disciplines, magnetic information storage in electronics being a typical example. Engineers also fabricate electronic nanodevices in the latest chip technology and nanoanalytical devices in sensing, both of which are molecular scale. In short, device engineering has reached the molecular level and future technology "wars" will be fought at a molecular "frontier". Incidentally, there is no further frontier, so we had better be prepared technologically for the molecular world.

Indeed, all living systems are also manifestly organized around their molecular scale. A reminder of this fact for all of us is the most common medical intervention, the pill we take against our ailments; a bunch of molecules dissolved in our stomach helps against apparently complex diseases, even mental disorders. The human body, like any biological organism, is a society of molecules. Our century has been called with good reason the century of life science and bioengineering, but inasmuch as this is true, our century is then also the century of molecular science and engineering. For example, if one wishes to design organisms, e.g., algae, to photosynthetically produce hydrogen gas cheaply and abundantly, a known route is to redesign a protein in the algae so that it functions under normal environmental conditions by rendering it tolerant against oxygen. While the bioengineer may never use chemical tools, only genetic ones, he or she needs to know chemistry well to retool an organism. Indeed, the leading scientific journals and the popular press report almost every day on the triumphs of molecular biology and molecular medicine, be it on the occasion of the yearly Nobel prizes or in a stock market report on a biotech company.

The Computational Microscope

The scientist and engineer working at the molecular level need to see the world of molecules, where the typical length scales measure 1-1000 nm and processes most often occur on ns - ms time scales. Many experimental techniques cover this world and have led to discoveries and insight, but all techniques cover the molecular world only in very limited ways, offering still pictures but not movies, resolving atoms but not large systems, seeing either fast or very slow processes. There exists only one "microscope" that sees electrons, atoms, molecules, and supramolecular assemblies, and that shows still pictures and movies, and that microscope is computational modeling. This microscope is not perfect, but every microscope has imaging artifacts; pictures and movies offered by the computational microscope are clearly much better than no image, and in fact, computational microscopy has had many successes.

Those who have already used it in cell biology, medicine, and bioengineering do not want to live without it. Computational biology has seen huge growth, with user numbers of available programs reaching a hundred thousand. In fact, the UIUC programs VMD and NAMD for biomolecular modeling together have a hundred thousand registered users; most if not all researchers in molecular science and engineering today spend a significant amount of time employing the computational tools of their field, the computer being their eyes.

Computational Molecular Science and Engineering at Various Institutions

Naturally, the center stage role of computing in molecular science and engineering has been noticed widely. Centers, often closely integrating computational, experimental, and engineering efforts, have sprung up at many institutions, in particular at so-called upcoming institutions that position themselves in new fields rather than competing in established fields. Examples are centers for Computational Molecular Science at Georgia Tech., at the U. of Queensland, Australia, and at Helsinki Technical U., the Center for Molecular Modeling at the U. of Pennsylvania, the Institute for Molecular Science in Okazaki, Japan, the UK Computational Molecular Science and Modelling Centre, and departments in various German Max Planck Institutes, for example the ones for Material Science in Stuttgart, Polymer Science in Mainz, and Biophysical Chemistry in Goettingen. Lastly, impressive computational molecular science institutes have sprung up in China.

In the US computational molecular science is thriving and is represented by faculty members at basically all leading universities. However, the research groups are mostly isolated, often spread without links across several departments (chemistry, biochemistry, chemical engineering, physics, material science, and medical departments). This worked as long as molecular science and its computational branch were small. Some fields of molecular science, in particular computational molecular science, have changed their character and now require large teams and large instrumentation for successful pursuit. The codes behind modern molecular science computer programs have millions of instructions (lines), and computing resources require multimillion-dollar investments in electronics, physical plants, and manpower. For a while the needed human and material resources could be obtained through national centers not specialized in molecular science and through smaller scale funding opportunities, but now leading edge molecular science requires its own major resources, in particular to develop the needed computer codes. Naturally, such resources can be deployed more effectively in centers than in individual research groups.

As an example we describe two industrial efforts, one by a major investment firm, D.E. Shaw, New York, and one by IBM. Both companies believe in huge opportunities in molecular science, in particular biomolecular science, and have 30-person teams of leading specialists in computational biology that develop codes for petascale computers. Both companies invest in huge computational resources: D.E. Shaw is said to have acquired its own power plant to power its future machines, and IBM bets the farm on developing the Blue Gene family of machines that are clearly geared for molecular science applications. Naturally, the likes of D.E. Shaw and IBM in the US, the Max Planck Society in Europe, the upcoming petascale Japanese computer specifically designed for molecular life science, and quickly rising centers in China are sold on the future of computational molecular science. To compete successfully, one needs to compete on their terms, and that is through well-equipped centers rather than single groups.

Goals that Should be Achieved in Computational Science and Engineering

Enable scientific discovery and revolutionary engineering solutions. Before one states any goal one must realize that the only acceptable condition for computational methods at a large scale is complete orientation towards experimental and engineering applications. While computing needs periods of unimpeded development and its own culture and infrastructure, the ultimate goal is to contribute to real scientific discovery and engineered solutions. This is the only eventual yardstick; every other measure would imply a waste of valuable resources. Computational scientists must strive for Nobel prize level science and for engineering breakthroughs that start successful companies, nothing less.

Integrate and combine modeling expertise. The interlocking scales governing molecular systems (the electronic scale requiring quantum mechanical computation, the atomic scale requiring classical molecular dynamics, the molecular scale requiring coarse-graining, the macromolecular scale requiring continuum theories, the device scale requiring various device theories, and the solvent requiring fluid dynamics and continuum electrostatics) require huge, diverse, yet tightly integrated computer codes. So far, the combination of multiple scales has usually been achieved only by pairing methodologies for adjacent scales, e.g., quantum chemistry and classical mechanics, or classical mechanics and coarse-grained mechanics, and has rarely been automated enough to constitute an approach open to researchers other than the developers themselves. Research on the molecular origins of life, an emerging area of computational molecular science well represented at UIUC, also requires the integration of codes, namely from evolution, quantum chemistry, and classical mechanics.

Capitalize on supercomputing. The best chance to gain ground in multi-scale computational molecular science is the use of multi-core and multi-processor machines. While such use, until recently, often meant the involvement of just a few or a few tens of processors, today it involves hundreds to thousands of processors, and soon will involve ten thousand to a hundred thousand processors. The codes for such uses can only be developed by teams of computer and computational scientists, and the work is never done as programs and machines continuously evolve. The main challenge will be to merge different algorithmic tasks in a parallel program and to balance the work load in order to utilize all processors of a machine effectively. This requires close cooperation between computer scientists and computational scientists and engineers; the latter need to "think parallel" and think in terms of joint solutions.
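The load-balancing challenge named above can be illustrated with a minimal sketch. It assumes that the cost of each algorithmic task is known in advance and that tasks are independent; the task names and costs are hypothetical, and the greedy "longest task first" rule used here is only one simple heuristic, not the scheme of any particular molecular dynamics program.

```python
import heapq


def balance_load(task_costs, n_processors):
    """Assign tasks to processors with a greedy longest-task-first rule.

    task_costs: dict mapping a (hypothetical) task name to its estimated cost.
    Returns (assignment, loads), both keyed by processor index.
    """
    # Min-heap of (current load, processor index): the least-loaded
    # processor is always at the top.
    heap = [(0.0, p) for p in range(n_processors)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_processors)}

    # Place the most expensive tasks first so cheap ones can fill the gaps.
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in ordered:
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + cost, p))

    loads = {p: sum(task_costs[t] for t in tasks)
             for p, tasks in assignment.items()}
    return assignment, loads


# Hypothetical per-step costs (arbitrary units) of different algorithmic tasks.
task_costs = {
    "quantum_region": 6.0,
    "nonbonded_patch_A": 5.0,
    "nonbonded_patch_B": 4.0,
    "long_range_electrostatics": 3.0,
    "bonded_forces": 2.0,
    "coarse_grained_region": 1.0,
}

assignment, loads = balance_load(task_costs, n_processors=3)
for p in sorted(assignment):
    print(f"processor {p}: load {loads[p]:.1f} <- {assignment[p]}")
```

In practice the costs change as a simulation evolves, so a production code would have to measure them and rebalance periodically; the sketch only shows the static assignment step.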

Utilize emerging hardware. A second technical route to the huge computer power needed in molecular science is the use of graphics processing units (GPUs). GPUs added to central processing units (CPUs) are widely employed, which has offered an economy of scale for their development. Until recently, the computer power of GPUs could not be tapped since they were designed for narrow graphics purposes and were hard to program, but this is changing. In fact, 2007 is a landmark year for the programming of GPUs, with a major manufacturer (nVidia) offering compilers for a C-like language in single and double precision. It is estimated that a GPU completes molecular modeling calculations ten times faster than the fastest present CPUs. Molecular scientists and engineers can take advantage of the GPU option only when they work closely with computer scientists and electrical engineers.

Automate modeling tasks through cyberinfrastructure computing. Computational molecular science is today becoming more and more similar to the experimental and engineering sides of the field, scanning many related systems or sampling, in a statistical mechanical sense, many identical systems. This confronts researchers and engineers with the need to run many independent calculations, each one being challenging in terms of set-up, monitoring, and analysis. Fortunately, computational technologies exist to automate the tasks described and thereby make large scale scanning and sampling possible. First, extremely promising steps have been taken, leading to so-called cyberinfrastructure software that is already widely distributed. Extending this software further will open entirely new opportunities in molecular science and engineering.
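The pattern of set-up, monitoring, and analysis for many independent calculations can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any existing cyberinfrastructure software: `run_replica` is a hypothetical stand-in (a one-dimensional random walk) for a real simulation, and a local thread pool stands in for remote computing resources.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_replica(seed, n_steps=1000):
    """Hypothetical stand-in for one independent simulation.

    Performs a 1D random walk and returns the squared end-to-end distance.
    """
    rng = random.Random(seed)  # set-up: each replica gets its own seed
    x = 0
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
    return x * x


def run_ensemble(seeds, max_workers=4):
    """Farm out independent replicas and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_replica, seed): seed for seed in seeds}
        # Monitoring: handle each replica as soon as it completes or fails.
        for future in as_completed(futures):
            seed = futures[future]
            try:
                results[seed] = future.result()
            except Exception as exc:
                results[seed] = None
                print(f"replica {seed} failed: {exc}")
    return results


results = run_ensemble(range(32))

# Analysis: average over all replicas that completed successfully.
completed = [value for value in results.values() if value is not None]
mean_squared_distance = statistics.mean(completed)
print(f"{len(completed)} replicas, "
      f"mean squared distance {mean_squared_distance:.1f}")
```

The design point is that the researcher specifies only the list of systems (here, seeds); submission, failure handling, and collection of results are done by the script rather than by hand.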

Develop multi-modal approaches to molecular simulations. As stated above, experimental methodologies do not cover the size and time scale ranges that molecular science and engineering require; they also offer only glimpses of molecular scale properties. For example, crystallography offers atomic resolution static views of single molecules, while electron microscopy offers lower resolution views of molecular assemblies. Computing is ideally suited, and in fact is already being used by leading molecular scientists, to derive a unified structural model from such multi-modal data, here crystallography and electron microscopy. Multi-modal measurements that can benefit are common in molecular science and engineering, the respective data stemming also from NMR spectroscopy, optical spectroscopy, and atomic force microscopy, among others. The latter methods measure specific properties, e.g., at which force a molecule breaks apart, but they do not show how the molecule falls apart. Here is another opportunity for computing: to reproduce measurement conditions, calculate the measured properties and check whether they agree with observation, and, when successful, offer the full picture of the molecular system.

What Steps Should be Taken to Achieve Great Impact

The goals formulated above are well suited as a strategy for the field. Opportunities definitely exist to develop the vision outlined above successfully, but accomplishing the stated goals will require the collaboration of first-rate computational molecular scientists and engineers across many institutions, such as those mentioned above.

Consolidate software development efforts. The biomedical community already has notable successes with computational science software. One example is VMD, a hugely popular software for molecular graphics, structure building and analysis, and gene sequence analysis. A second example is NAMD, an extremely widely employed molecular modeling software consuming a significant part of US academic computing resources in the hands of thousands of users. A third, more recent example is NAMD-G, a new cyberinfrastructure-based NAMD extension for automating the farming out of closely related computations. A fourth example is BioCoRE, a collaboration and communication tool for molecular scientists and others. Further software offerings are BioMocca, an electrostatic solver that can be used for solvent mediated ion transport with complex solid-solvent interfaces, and lastly software stemming from a major new initiative that integrates the popular and freely available GAMESS quantum chemistry program with molecular dynamics. The software codes mentioned, and possibly other software of which we are not aware, offer a unique and convincing opportunity for the development of a combined computational molecular science and engineering approach that will offer computational molecular science a promising pole position in the race for leadership in the field among stiff competition. We emphasize here the opportunity that lies in extending a classical molecular dynamics based approach (VMD, NAMD, NAMD-G, BioMocca) with a quantum chemical approach (e.g., built on GAMESS and on Quantum Monte Carlo approaches); classical Monte Carlo methodologies would also be welcome. Key to progress will be to largely automate the respective calculations to make them accessible beyond leading experts.

Push for industry-grade implementation of the software. While developing and distributing software is a powerful means to gain leadership, it is by its nature a difficult and unusual proposition for conventional academic research. A great fraction of effort goes into optimizing, upkeep, porting, and solidifying existing solutions without generating new capabilities. Such effort, while hugely successful in the end, requires a tremendous degree of patience and altruism in the interim that is neither well rewarded nor, in an early phase, well funded at most universities and research centers. To sustain such efforts, a new culture of combining pure and applied scientific and technical pursuits must be built, a reward system, notably also for staff at participating institutions, must be put into place, and rather comprehensive interim funding must be made available at institutions pursuing the aforementioned goals.

Organize conferences/workshops. The research theme should measure itself against other national and international programs through a regular conference and/or workshops. Such conferences offer colleagues in molecular science and engineering an opportunity to discuss how best to achieve the stated goals; for example, one can easily imagine a conference with sessions devoted to exploring new hardware developments, new methods in coarse graining, refinements in computer codes, and so on.

Develop core curriculum in computational molecular sciences. Computational molecular science and engineering poses another fundamental challenge to conventional academic pursuits, namely one at the very core of the academic effort, teaching. The question is simply how one can teach the next generation as well as the present generation of scientists, engineers, and other professionals in the discipline. The problem is that teaching of computational methods needs to be done through training by doing, rather than through conventional classroom lectures alone. While the latter are needed to teach the concepts and algorithms involved in molecular science computing, computing as a research methodology needs to be taught in a practical manner. This requires a careful analysis of training goals and a huge investment in the development of tutorials, case studies, and training software, with cooperation and coordination across participating institutions.
