Andronico, Alessio; Randall, Arlo; Benz, Ryan W.; Baldi, Pierre
Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 51:760-776, APR 2011

Accurate prediction of the 3-D structure of small molecules is essential in order to understand their physical, chemical, and biological properties, including how they interact with other molecules. Here, we survey the field of high-throughput methods for 3-D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system's performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to state-of-the-art prediction methods, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD (99.6% organic, 94.6% metal-organic), whereas the widely used commercial method CORINA predicts structures for 68.5% (98.5% organic, 51.6% metal-organic). On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average speed per molecule of 0.15 s (0.10 s organic, 0.21 s metal-organic) and an average rmsd of 1.57 angstrom (1.26 angstrom organic, 1.90 angstrom metal-organic), and CORINA makes predictions with an average speed per molecule of 0.13s (0.18s organic, 0.08s metal-organic) and an average rmsd of 1.60 angstrom (1.13 angstrom organic, 2.11 angstrom metal-organic). COSMOS is available through the ChemDB chemoinformatics Web portal at http://cdb.ics.uci.edu/.

DOI:10.1021/ci100223t

Find full text with Google Scholar.