From: Roman Petrenko (rpetrenko_at_gmail.com)
Date: Mon Feb 16 2009 - 22:25:44 CST

Thanks, that was very helpful.

On Mon, Feb 16, 2009 at 11:23 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
>
> Hi,
> Judging by the speedup, it looks like you're running one of the
> GCC-based builds. I've seen GPU speedups as high as 700 when compared
> with GCC, so that's the source of your problem. Try comparing against
> the Intel C/C++ 9.0 test build posted toward the bottom of this page
> (VMD alpha version 1.8.7a3):
> http://www.ks.uiuc.edu/Research/vmd/cuda/
>
> Regarding the number of threads: VMD doesn't use MPI, so it's
> likely completely ignoring your queueing system settings and
> attempting to use all of the CPU cores it found on your
> test machine. You can override this with the recent builds
> of VMD by setting the environment variable VMDFORCECPUCOUNT to
> 1, 2, 4, etc.
>
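
A minimal sketch of that override, assuming VMD is launched from a Python
wrapper inside the batch job. Only the VMDFORCECPUCOUNT variable comes from
the message above; the helper names vmd_env and run_vmd, the script name,
and the assumption that a "vmd" binary is on PATH are illustrative and not
part of VMD itself.

```python
import os
import subprocess


def vmd_env(cpu_count, base_env=None):
    """Return a copy of the environment with VMDFORCECPUCOUNT set.

    vmd_env is a hypothetical helper name. base_env defaults to the
    current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["VMDFORCECPUCOUNT"] = str(cpu_count)
    return env


def run_vmd(script_path, cpu_count=1, vmd_binary="vmd"):
    """Run VMD in text mode on a Tcl script with a forced CPU count.

    Assumptions: vmd_binary is on PATH and script_path is a Tcl script.
    """
    cmd = [vmd_binary, "-dispdev", "text", "-e", script_path]
    return subprocess.run(cmd, env=vmd_env(cpu_count), check=True)


# Example call (the script name is hypothetical):
# run_vmd("coulomb_test.tcl", cpu_count=1)
```

With cpu_count=1 this should match a job submitted with ppn=1; the same
effect can be had by exporting the variable in the job script before
starting VMD.
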
> Cheers,
> John Stone
> vmd_at_ks.uiuc.edu
>
>
> On Mon, Feb 16, 2009 at 11:07:30PM -0500, Roman Petrenko wrote:
>> sorry, the subject line was meant to be "vmd with cuda1.1"
>> runtime cpu 101489 sec
>> runtime gpu 234 seconds
>> still similar speedup 433.7
>>
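
As a quick check of the two ratios discussed in this thread, here is a short
sketch that uses only the figures quoted in these messages (the runtimes
above and the GFLOPS numbers from the original message further down). The
variable names are mine.

```python
# Figures quoted in this thread.
cpu_runtime_s = 101489.0  # "runtime cpu 101489 sec"
gpu_runtime_s = 234.0     # "runtime gpu 234 seconds"
gpu_gflops = 265.45       # "gpu speed 265.45 GFLOPS"
cpu_gflops = 0.497122     # "cpu(4 threads) speed 0.497122 GFLOPS"

# Speedup in the sense used in the replies: ratio of wall-clock runtimes.
runtime_speedup = cpu_runtime_s / gpu_runtime_s

# Ratio of reported GFLOPS, which is what the first message compared.
gflops_ratio = gpu_gflops / cpu_gflops

print(f"runtime speedup: {runtime_speedup:.1f}")
print(f"GFLOPS ratio:    {gflops_ratio:.1f}")
```

The runtime ratio reproduces the 433.7 quoted above, while the GFLOPS ratio
comes out near 534, which is where the "more than 500" figure came from.
The two differ because the GPU and CPU code paths do not perform the same
number of floating point operations.
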
>> could it be due to some thread interference?
>> Using 4 CPUs
>> thread 0 started...
>> thread 3 started...
>> thread 2 started...
>> thread 1 started...
>>
>> why is it using 4 threads? i submitted the job with the -lnodes=1:ppn=1 option.
>>
>> i'll update later on what compiler was used.
>>
>> On Mon, Feb 16, 2009 at 10:56 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
>> >
>> > Hi,
>> > The GFLOPS numbers indicate the number of floating point
>> > calculations, but not the runtime. The speedup is based on the
>> > difference in runtime, not on GFLOPS. There are various differences
>> > in the algorithm used on the GPU vs. the algorithm used on the CPU,
>> > making the CPU more efficient in GFLOPS/runtime than the GPU is.
>> > I think if you divide the runtimes out, you'll get a speedup closer
>> > to the range we see, assuming you run a build done with the Intel
>> > compiler and not just GCC. GCC doesn't generate very good code for
>> > this particular kernel, and just using the Intel compilers will
>> > improve the CPU performance by a large factor vs. GCC.
>> > Let me know if you have further questions.
>> >
>> > Cheers,
>> > John Stone
>> > vmd_at_ks.uiuc.edu
>> >
>> > On Mon, Feb 16, 2009 at 10:47:16PM -0500, Roman Petrenko wrote:
>> >> Hi all,
>> >> i ran the time-averaged coulomb potential evaluations (downloaded from the
>> >> vmd cuda website) and got a speedup of more than 500 on an nvidia 9800 gpu vs cpu.
>> >> gpu speed 265.45 GFLOPS
>> >> cpu(4 threads) speed 0.497122 GFLOPS
>> >>
>> >> how is it possible? According to John Stone's presentations, the speedup
>> >> is expected to be around 30 or 100.
>> >
>> > --
>> > NIH Resource for Macromolecular Modeling and Bioinformatics
>> > Beckman Institute for Advanced Science and Technology
>> > University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
>> > Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
>> > WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078
>> >
>