Re: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Jun 06 2011 - 04:51:31 CDT

Hi Francesco,

As your output shows, both gtx cards were in use.

Pe 1 sharing CUDA device 1 -> This is gtx 1
Pe 0 sharing CUDA device 0 -> This is gtx 2

The driver you get from NVIDIA and the one shipped with your OS should be the same, I think. The NVIDIA driver must be compiled for your platform; the OS package already is.

Whether more GPUs bring better performance depends heavily on your hardware and system size. Just try whether 6 CPUs sharing one GTX is slower than, or the same as, 6 CPUs sharing 2 GTX cards. I think such a GTX gets oversubscribed very quickly, so you should get better performance using both cards. Not so with a Tesla C2050: that card can be shared by more than 6 cores without running into a bottleneck, if plugged into a PCIe 2.0 x16 slot.
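A sketch of such a comparison, assuming the `+devices` option of the NAMD CUDA builds for pinning the run to specific GPUs (log file names here are placeholders):

```shell
# Run the same system once restricted to one GPU and once on both,
# then compare the "Info: Benchmark time:" lines NAMD prints
# (s/step and days/ns) to see whether the second card pays off.
charmrun "$NAMD_HOME/bin/namd2" ++local +p6 +idlepoll +devices 0   filename.conf > one_gpu.log
charmrun "$NAMD_HOME/bin/namd2" ++local +p6 +idlepoll +devices 0,1 filename.conf > two_gpu.log
grep "Benchmark time" one_gpu.log two_gpu.log
```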

Best regards.

Norman Geist.

-----Original Message-----
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Francesco Pietra
Sent: Monday, June 6, 2011 11:16
To: NAMD
Subject: Fwd: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance

I forgot to show the output log:

Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (6-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Info: NAMD CVS-2011-06-04 for Linux-x86_64-CUDA

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00653386 s
Pe 3 sharing CUDA device 1 first 1 next 5
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Pe 5 sharing CUDA device 1 first 1 next 1
Did not find +devices i,j,k,... argument, using all
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Pe 2 sharing CUDA device 0 first 0 next 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Pe 4 sharing CUDA device 0 first 0 next 0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX 470' Mem: 1279MB Rev: 2.0
Info: 1.64104 MB of memory in use based on CmiMemoryUsage

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Mon, Jun 6, 2011 at 9:54 AM
Subject: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance
To: NAMD <namd-l_at_ks.uiuc.edu>

Hello:

I have assembled a gaming machine with

Gigabyte GA890FXA-UD5
AMD PhenomII 1075T (3.0 GHz)
2xGTX-470
AMD Edition 1280MB GDDRV DX11 DUAL DVI / MINI HDMI SLI ATX
2x 1TB HD software RAID1
16 GB RAM DDR3 1600 MHz
Debian amd64 wheezy
NAMD_CVS-2011-06-04_Linux-x86_64-CUDA.tar.gz
No X server (ssh to machines with X server)

In my .bashrc:

NAMD_HOME=/usr/local/namd-cuda_4Jun2010nb
PATH="$NAMD_HOME/bin:$PATH"; export NAMD_HOME PATH

if [ -n "$LD_LIBRARY_PATH" ]; then
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/namd-cuda_4Jun2010nb"
else
    export LD_LIBRARY_PATH="/usr/local/namd-cuda_4Jun2010nb"
fi

I launched a RAMD run on a >200,000-atom system with

charmrun $NAMD_HOME/bin/namd2 ++local +p6 +idlepoll ++verbose filename.conf 2>&1 | tee filename.log

It runs fine, approximately ten times faster (judging by the "last velocity output" written every ten steps) than an 8-CPU shared-memory machine with dual 2.2 GHz Opterons.

I did nothing to indicate which GTX-470 to use. Can both be used?
Is performance the same with the NVIDIA-provided CUDA driver as with
the one available from the OS (Debian)? Sorry for the last two naive
questions, which perhaps result from the stress of the enterprise. I
assume that "nvidia-smi" is of no use for these graphics cards.

Thanks a lot for advice

francesco pietra

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:20:23 CST