Re: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Mon Jun 06 2011 - 05:50:24 CDT

On Mon, Jun 6, 2011 at 5:51 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Hi Francesco,
>
> As your output shows, both GTX cards were in use.
>
> Pe 1 sharing CUDA device 1 -> this is GTX 1
> Pe 0 sharing CUDA device 0 -> this is GTX 2
>
> I think the driver you get from NVIDIA and the one from your OS are the same. The NVIDIA driver must be compiled for your platform; the OS driver already is.
>
> Whether more GPUs bring better performance depends heavily on your hardware and system size. Just test whether 6 CPUs sharing one GTX is slower than, or the same as, 6 CPUs sharing 2 GTX cards. I think the benefit of oversubscribing such a GTX card runs out very quickly, and you should get better performance using both cards.
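A quick way to confirm that both cards are in use is to tally the "binding to CUDA device" lines in the startup log. Here is a minimal Python sketch, assuming the log format quoted in this thread; pes_per_device is a hypothetical helper, not part of NAMD. For the one-card versus two-card test, the log's own hint ("Did not find +devices i,j,k,... argument, using all") suggests restricting the run with, e.g., +devices 0 versus +devices 0,1 and then comparing timings.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches NAMD startup lines of the form quoted in this thread, e.g.
#   "Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX 470' ..."
BIND_RE = re.compile(r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+)")


def pes_per_device(log_text: str) -> dict[int, list[int]]:
    """Return {device_id: sorted list of PEs bound to that device}."""
    mapping: dict[int, list[int]] = defaultdict(list)
    for pe, dev in BIND_RE.findall(log_text):
        mapping[int(dev)].append(int(pe))
    # Sort devices and PEs so the result is easy to read and compare.
    return {dev: sorted(pes) for dev, pes in sorted(mapping.items())}


if __name__ == "__main__":
    sample = """\
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
"""
    print(pes_per_device(sample))
```

On the log quoted here it gives PEs 0, 2, 4 on device 0 and PEs 1, 3, 5 on device 1, i.e. three PEs per card.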

of course, oversubscribing GPUs can only help up to a point. it doesn't
create more GPUs, it only allows you to use them more efficiently. think
of it like hyperthreading. that is also a trick to improve utilization of
the different units on the CPU, but it cannot replace a full processor
core, and its efficiency is limited by how much the different units of
the CPU are occupied and by the available memory bandwidth.
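The "only up to a point" argument can be made concrete with a toy saturation model. This is an illustrative assumption, not how NAMD actually schedules GPU work: each PE is assumed to keep its GPU busy for a fixed fraction of the time when it has the card to itself, PEs are spread evenly over the cards, and sharing simply adds those busy fractions until a card is saturated. gpu_work_rate and busy_fraction are hypothetical names.

```python
def gpu_work_rate(n_pes: int, n_gpus: int, busy_fraction: float) -> float:
    """Toy model of total GPU throughput, in units of 'one fully busy GPU'.

    Assumptions (illustrative only): PEs are spread evenly over the GPUs,
    each PE alone keeps its GPU busy for `busy_fraction` of the time, and
    sharing adds those fractions until a GPU saturates at 1.0.
    """
    if n_pes < 1 or n_gpus < 1 or not 0.0 < busy_fraction <= 1.0:
        raise ValueError("need n_pes >= 1, n_gpus >= 1, 0 < busy_fraction <= 1")
    pes_per_gpu = n_pes / n_gpus
    # Oversubscription fills idle gaps, but one GPU can never exceed 100% busy.
    return n_gpus * min(1.0, pes_per_gpu * busy_fraction)


if __name__ == "__main__":
    for n_gpus in (1, 2):
        for n_pes in (2, 4, 6):
            rate = gpu_work_rate(n_pes, n_gpus, busy_fraction=0.25)
            print(f"{n_pes} PEs on {n_gpus} GPU(s): {rate:.2f}")
```

With a busy fraction of 0.25, six PEs on one card already saturate it (rate 1.0), while the same six PEs on two cards give 1.5. In this model, extra PEs per card stop helping once the card is full, which is the hyperthreading analogy in numbers.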

> Not so if using a Tesla C2050. This card can be shared by more than 6 cores without running into a bottleneck if plugged into a PCIe 2.0 x16 slot.

this is nonsense. as far as the CUDA code in NAMD is concerned, there is
not much of a difference between a Tesla and a GeForce card. in fact, the
high-end GeForce cards are often faster due to their higher memory and
processor clocks. there is, however, the bottleneck of having sufficient
PCI-e bus bandwidth available, but that affects both types of cards.
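To put the PCI-e point in numbers: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, i.e. about 500 MB/s per lane in each direction before protocol overhead, so an x16 slot gives about 8 GB/s and an x8 slot half that. Below is a back-of-envelope Python sketch, assuming 16 bytes per atom per transfer (a hypothetical 4-float record; NAMD's actual buffer layout is not stated in this thread). The helper names are likewise hypothetical.

```python
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, which leaves
# 4 Gbit/s, i.e. about 500 MB/s per lane in each direction before overhead.
PCIE2_MB_PER_S_PER_LANE = 500


def pcie2_bandwidth_mb_s(lanes: int) -> int:
    """Nominal one-direction PCIe 2.0 bandwidth in MB/s for a given link width."""
    return lanes * PCIE2_MB_PER_S_PER_LANE


def transfer_ms(n_atoms: int, lanes: int, bytes_per_atom: int = 16) -> float:
    """Lower bound, in ms, for one host<->GPU copy of per-atom data.

    bytes_per_atom=16 is an assumption (one 4-float record per atom);
    real transfers add latency and protocol overhead on top of this.
    """
    megabytes = n_atoms * bytes_per_atom / 1e6
    return megabytes / pcie2_bandwidth_mb_s(lanes) * 1e3


if __name__ == "__main__":
    for lanes in (16, 8):
        print(f"x{lanes}: {pcie2_bandwidth_mb_s(lanes)} MB/s, "
              f"{transfer_ms(200_000, lanes):.2f} ms per copy")
```

For a ~200,000-atom system this gives roughly 0.4 ms per one-way copy on an x16 link and about 0.8 ms on x8. The arithmetic is identical for a Tesla and a GeForce card in the same slot.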

axel.

> Best regards.
>
> Norman Geist.
>
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Francesco Pietra
> Sent: Monday, June 6, 2011 11:16
> To: NAMD
> Subject: Fwd: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance
>
> I forgot to show the output log:
>
> Charm++> scheduler running in netpoll mode.
> Charm++> Running on 1 unique compute nodes (6-way SMP).
> Charm++> cpu topology info is gathered in 0.000 seconds.
> Info: NAMD CVS-2011-06-04 for Linux-x86_64-CUDA
>
> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.00653386 s
> Pe 3 sharing CUDA device 1 first 1 next 5
> Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Pe 1 sharing CUDA device 1 first 1 next 3
> Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Pe 5 sharing CUDA device 1 first 1 next 1
> Did not find +devices i,j,k,... argument, using all
> Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Pe 0 sharing CUDA device 0 first 0 next 2
> Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Pe 2 sharing CUDA device 0 first 0 next 4
> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Pe 4 sharing CUDA device 0 first 0 next 0
> Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX 470'  Mem: 1279MB  Rev: 2.0
> Info: 1.64104 MB of memory in use based on CmiMemoryUsage
>
>
>
>
> ---------- Forwarded message ----------
> From: Francesco Pietra <chiendarret_at_gmail.com>
> Date: Mon, Jun 6, 2011 at 9:54 AM
> Subject: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance
> To: NAMD <namd-l_at_ks.uiuc.edu>
>
>
> Hello:
>
> I have assembled a gaming machine with
>
> Gigabyte GA890FXA-UD5
> AMD PhenomII 1075T (3.0 GHz)
> 2xGTX-470
> AMD Edition 1280MB GDDR5 DX11 DUAL DVI / MINI HDMI SLI ATX
> 2x 1TB HD software RAID1
> 16 GB RAM DDR3 1600 MHz
> Debian amd64 wheezy
> NAMD_CVS-2011-06-04_Linux-x86_64-CUDA.tar.gz
> No X server (ssh to machines with X server)
>
> In my .bashrc:
>
> NAMD_HOME=/usr/local/namd-cuda_4Jun2010nb
> # PATH entries must be directories: add $NAMD_HOME/bin, not .../bin/namd2
> PATH="$NAMD_HOME/bin:$PATH"; export NAMD_HOME PATH
>
> # Append to LD_LIBRARY_PATH if it is already set, otherwise create it
> if [ -n "$LD_LIBRARY_PATH" ]; then
>    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/namd-cuda_4Jun2010nb"
> else
>    export LD_LIBRARY_PATH="/usr/local/namd-cuda_4Jun2010nb"
> fi
>
>
> I launched a RAMD run on a >200,000-atom system with
>
> charmrun $NAMD_HOME/bin/namd2 ++local +p6 +idlepoll ++verbose filename.conf 2>&1 | tee filename.log
>
> It runs fine: judging from the "The last velocity output" messages
> written every ten steps, it is approximately ten times faster than an
> 8-CPU, dual-Opteron 2.2 GHz shared-memory machine.
>
> I did nothing to specify which GTX 470 to use. Can both be used?
> Is performance the same with the NVIDIA-provided CUDA driver and with
> the one available from the OS (Debian)? Sorry for these last two naive
> questions, perhaps resulting from the stress of the enterprise. I assume
> that "nvidia-smi" is of no use for these graphics cards.
>
> Thanks a lot for any advice
>
> francesco pietra
>
>
>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:57:14 CST