From: John Stone (johns_at_ks.uiuc.edu)
Date: Mon Feb 16 2009 - 21:56:04 CST

Hi,
  The GFLOPS numbers indicate the rate of floating-point calculations,
not the runtime. The speedup is based on the difference in runtime,
not on GFLOPS. The algorithm used on the GPU differs in various ways
from the algorithm used on the CPU, and the CPU version does less
floating-point work to produce the same result, so a ratio of GFLOPS
overstates the real speedup. I think if you divide the runtimes out,
you'll get a speedup closer to the range we see, assuming you run a
build done with the Intel compiler and not just GCC. GCC doesn't
generate very good code for this particular kernel, and just using
the Intel compilers will improve the CPU performance by a large factor
vs. GCC.
Let me know if you have further questions.
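A small arithmetic sketch of why the two ratios differ, assuming the
GPU algorithm performs more floating-point operations than the CPU one
for the same job. The function names and every FLOP count and runtime
below are hypothetical illustration values, not measurements of the
VMD Coulomb potential kernels.

```python
# Minimal sketch: why a GFLOPS ratio is not a runtime speedup.
# All FLOP counts and runtimes are hypothetical illustration values,
# not measurements of the VMD Coulomb potential kernels.


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Speedup is the ratio of runtimes for the same job."""
    return cpu_seconds / gpu_seconds


def gflops(flop_count: float, seconds: float) -> float:
    """Floating-point rate in billions of operations per second."""
    return flop_count / seconds / 1e9


def gflops_ratio(gpu_flops: float, gpu_seconds: float,
                 cpu_flops: float, cpu_seconds: float) -> float:
    """Ratio of GPU to CPU floating-point rates.

    Algebraically this equals speedup * (gpu_flops / cpu_flops), so it
    overstates the speedup whenever the GPU algorithm does more work.
    """
    return gflops(gpu_flops, gpu_seconds) / gflops(cpu_flops, cpu_seconds)


if __name__ == "__main__":
    # Hypothetical case: the GPU algorithm does 4x the arithmetic.
    cpu_flops, cpu_seconds = 2e12, 40.0
    gpu_flops, gpu_seconds = 8e12, 1.0
    print("runtime speedup:", speedup(cpu_seconds, gpu_seconds))
    print("GFLOPS ratio:",
          gflops_ratio(gpu_flops, gpu_seconds, cpu_flops, cpu_seconds))
```

With these made-up numbers the runtime speedup is 40 while the GFLOPS
ratio is 160. The GFLOPS figures in the question give a ratio of about
534 (265.45 / 0.497122), which by itself says nothing about how much
faster the job actually finishes.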

Cheers,
  John Stone
  vmd_at_ks.uiuc.edu

On Mon, Feb 16, 2009 at 10:47:16PM -0500, Roman Petrenko wrote:
> Hi all,
> I ran the time-averaged Coulomb potential evaluations (downloaded from
> the VMD CUDA website) and got a speedup of more than 500x on an NVIDIA
> 9800 GPU vs. the CPU.
> GPU speed: 265.45 GFLOPS
> CPU (4 threads) speed: 0.497122 GFLOPS
>
> How is that possible? According to John Stone's presentations, the
> speedup is expected to be around 30 or 100.

-- 
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078