Re: tesla 2050 benchmark

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Jul 12 2011 - 14:04:00 CDT

On Tue, Jul 12, 2011 at 2:56 PM, Burgess, Don E <deburgess_at_uky.edu> wrote:
> Why is the optimal number of processes limited to the number of cpu cores, when I run a job on a gpu tesla 2050?

because there is still a significant chunk of work being done on the CPU.

> I invoke the job with the command:
>
> /home/deburg0/Downloads/NAMD_2.8_Linux-x86_64-CUDA/charmrun ++local +p8 /home/deburg0/Downloads/NAMD_2.8_Linux-x86_64-CUDA/namd2 +idlepoll +devices 0,1 kcsa_T85A_popcwieq-09.conf > kcsa_T85A_popcwieq-09.log
>
> When I try +pN where N>8, my performance gets worse.

with 2 GPU devices and 8 CPU tasks (4 tasks sharing each GPU), you
have already reached the point where there is no more gain from
oversubscribing the GPUs. so why should there be additional speedup?
if you ran a CPU-only version, you would not expect a speedup
from oversubscribing the CPUs, would you?
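
one way to check this on a given machine is to sweep the +p value and
compare the timings reported in each log. the following is only a
dry-run sketch: it prints the command lines instead of running them.
the flags and paths are the ones from the command quoted above; the
NAMD_DIR and CONF variables, the list of process counts, and the
bench-pN.log file names are assumptions made up for illustration.

```sh
#!/bin/sh
# Dry-run sketch: print one charmrun command line per process count.
# NAMD_DIR, CONF and the bench-pN.log names are illustrative assumptions;
# the flags are copied from the command quoted in this thread.
NAMD_DIR="${NAMD_DIR:-/home/deburg0/Downloads/NAMD_2.8_Linux-x86_64-CUDA}"
CONF="${CONF:-kcsa_T85A_popcwieq-09.conf}"

# Build the command line for N processes (N is the first argument).
namd_cmd() {
    echo "$NAMD_DIR/charmrun ++local +p$1 $NAMD_DIR/namd2 +idlepoll +devices 0,1 $CONF"
}

# Sweep from fewer processes than CPU cores up to an oversubscribed count.
for n in 2 4 8 12 16; do
    echo "$(namd_cmd "$n") > bench-p$n.log"
done
```

piping the printed lines to sh would launch the runs one after another;
if the explanation above holds, the logs for +p12 and +p16 should show
no improvement over +p8 on an 8-core machine.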

cheers,
   axel.

>
> Please refer to the attached log file.
>
> thank you very much for your help.

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
