From: John Stone (johns_at_ks.uiuc.edu)
Date: Mon Feb 16 2009 - 22:23:18 CST

Hi,
  Judging by the speedup, it looks like you're running one of the
GCC-based builds. I've seen a GPU speedup as high as 700x when comparing
against GCC builds, so that's the source of your problem. Try comparing against
the Intel C/C++ 9.0 test build posted toward the bottom of this page
(VMD alpha version 1.8.7a3):
  http://www.ks.uiuc.edu/Research/vmd/cuda/
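
As a rough check on the numbers quoted below, here is a small Python
sketch that computes the speedup from the two reported runtimes and, for
contrast, the ratio of the two reported GFLOPS figures. The variable names
are illustrative, and the runtime and GFLOPS values may come from
different runs, so treat the comparison as approximate.

```python
# Values copied from the messages quoted below in this thread.
cpu_runtime_sec = 101489.0   # reported CPU runtime
gpu_runtime_sec = 234.0      # reported GPU runtime
gpu_gflops = 265.45          # reported GPU rate
cpu_gflops = 0.497122        # reported CPU rate (4 threads)

# Speedup is defined by runtime: how many times longer the CPU run took.
runtime_speedup = cpu_runtime_sec / gpu_runtime_sec

# Ratio of the GFLOPS figures. This need not match the runtime speedup,
# because the GPU and CPU code paths use different algorithms.
gflops_ratio = gpu_gflops / cpu_gflops

print(f"runtime speedup: {runtime_speedup:.1f}")
print(f"GFLOPS ratio:    {gflops_ratio:.1f}")
```

The runtime-based figure is the one to compare against published speedups.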

Regarding the number of threads: VMD doesn't use MPI, so it's
likely completely ignoring your queueing system settings and
attempting to use all of the CPU cores it found on your
test machine. You can override this with the recent builds
of VMD by setting the environment variable VMDFORCECPUCOUNT to
1, 2, 4, etc.
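
A minimal sketch of that override, written in Python for illustration:
it builds an environment with VMDFORCECPUCOUNT set and leaves the actual
launch commented out. The helper name vmd_env is hypothetical, and the
launch line assumes a vmd executable on your PATH.

```python
import os

def vmd_env(cpu_count, base_env=None):
    """Return a copy of the environment with VMDFORCECPUCOUNT set.

    vmd_env is a hypothetical helper for this example, not part of VMD.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env["VMDFORCECPUCOUNT"] = str(cpu_count)
    return env

# Match a single-core batch allocation (ppn=1).
env = vmd_env(1)

# To start VMD with this environment (assumes `vmd` is on PATH):
# import subprocess
# subprocess.run(["vmd"], env=env)
```

In a shell job script, the same effect comes from exporting the variable
before starting VMD.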

Cheers,
  John Stone
  vmd_at_ks.uiuc.edu

On Mon, Feb 16, 2009 at 11:07:30PM -0500, Roman Petrenko wrote:
> sorry, the subject line was meant to be "vmd with cuda1.1"
> runtime cpu 101489 sec
> runtime gpu 234 seconds
> still similar speedup 433.7
>
> could it be due to some thread interference?
> Using 4 CPUs
> thread 0 started...
> thread 3 started...
> thread 2 started...
> thread 1 started...
>
> why is it using 4 threads? i submitted a job with -lnodes=1:ppn=1 option.
>
> i'll update later on what compiler was used.
>
> On Mon, Feb 16, 2009 at 10:56 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
> >
> > Hi,
> > The GFLOPS numbers indicate the number of floating point
> > calculations, but not the runtime. The speedup is based on the
> > difference in runtime, not on GFLOPS. There are various differences
> > in the algorithm used on the GPU vs. the algorithm used on the CPU,
> > making the CPU more efficient in GFLOPS/runtime than the GPU is.
> > I think if you divide the runtimes out, you'll get a speedup closer
> > to the range we see, assuming you run a build done with the Intel
> > compiler and not just GCC. GCC doesn't generate very good code for
> > this particular kernel, and just using the Intel compilers will
> > improve the CPU performance by a large factor vs. GCC.
> > Let me know if you have further questions.
> >
> > Cheers,
> > John Stone
> > vmd_at_ks.uiuc.edu
> >
> > On Mon, Feb 16, 2009 at 10:47:16PM -0500, Roman Petrenko wrote:
> >> Hi all,
> >> i ran time-averaged coulomb potential evaluations (downloaded from vmd
> >> cuda website) and got more than 500 speedup on nvidia 9800 gpu vs cpu.
> >> gpu speed 265.45 GFLOPS
> >> cpu(4 threads) speed 0.497122 GFLOPS
> >>
> >> how is it possible? According to John Stone's presentations the speedup
> >> is expected to be at around 30 or 100.
> >
> > --
> > NIH Resource for Macromolecular Modeling and Bioinformatics
> > Beckman Institute for Advanced Science and Technology
> > University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> > Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
> > WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078
> >
