From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Mon Jun 06 2011 - 04:30:46 CDT
I also forgot to show that the speed-up is not tenfold but only about
threefold, as far as I can understand:
On the 8-CPUs machine:
TIMING: 1600 CPU: 1119.3, 0.64616/step Wall: 1204.51, 0.706384/step,
97.795 hours remaining, 1063.727432 MB of memory in use.
On the CUDA machine:
TIMING: 1600 CPU: 383.972, 0.240135/step Wall: 386.054,
0.24097/step, 33.3609 hours remaining, 200.612610 MB of memory in use.
I hope the CUDA machine can be better exploited. At this point it is not
even clear to me whether the GTX-470s are being used at all.
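
For reference, the ratio can be worked out directly from the two TIMING lines quoted above. This is only a minimal sketch: `wall_per_step` is a hypothetical helper name, and it assumes each TIMING record sits on a single line (the mail client wrapped them here).

```shell
#!/bin/sh
# Extract the wall-clock seconds per step from one NAMD "TIMING:" line.
# Assumes the layout "... Wall: <total>, <per-step>/step ..." seen in the logs above.
wall_per_step() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "Wall:") {
                v = $(i + 2)            # second field after "Wall:" is the per-step value
                sub(/\/step.*/, "", v)  # drop the "/step" suffix and any trailing comma
                print v
                exit
            }
    }'
}

# The two records from this message, unwrapped onto single lines.
cpu_line='TIMING: 1600 CPU: 1119.3, 0.64616/step Wall: 1204.51, 0.706384/step,'
gpu_line='TIMING: 1600 CPU: 383.972, 0.240135/step Wall: 386.054, 0.24097/step,'

cpu=$(wall_per_step "$cpu_line")
gpu=$(wall_per_step "$gpu_line")

# Speed-up = wall time per step on the 8-CPU machine / wall time per step on the CUDA machine.
awk -v c="$cpu" -v g="$gpu" 'BEGIN { printf "speed-up: %.2fx\n", c / g }'
```

With the numbers above this prints `speed-up: 2.93x`, which matches the ratio of the "hours remaining" figures (97.795 / 33.3609).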
francesco
---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Mon, Jun 6, 2011 at 11:15 AM
Subject: Fwd: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance
To: NAMD <namd-l_at_ks.uiuc.edu>
I forgot to show the output log:
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (6-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
Info: NAMD CVS-2011-06-04 for Linux-x86_64-CUDA
Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD CVS-2011-06-04 Linux-x86_64-CUDA 6 gig64 francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00653386 s
Pe 3 sharing CUDA device 1 first 1 next 5
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 5 sharing CUDA device 1 first 1 next 1
Did not find +devices i,j,k,... argument, using all
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 2 sharing CUDA device 0 first 0 next 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Pe 4 sharing CUDA device 0 first 0 next 0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
470' Mem: 1279MB Rev: 2.0
Info: 1.64104 MB of memory in use based on CmiMemoryUsage
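
The "binding to CUDA device" lines above can be tallied to see whether both cards are picked up at startup. A minimal sketch, where `count_cuda_bindings` is a hypothetical helper name; note that it only reports which device each Pe bound to, not how busy the cards actually are.

```shell
#!/bin/sh
# Count how many Pes bound to each CUDA device in a NAMD startup log read from stdin.
count_cuda_bindings() {
    awk '/binding to CUDA device/ {
        for (i = 1; i <= NF; i++)
            if ($i == "device") { n[$(i + 1)]++; break }  # field after "device" is the index
    }
    END { for (d in n) printf "device %s: %d Pe(s)\n", d, n[d] }' | sort
}

# Small demo using two of the binding lines from the log above.
printf '%s\n' \
    "Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX 470'" \
    "Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX 470'" \
    | count_cuda_bindings
```

On a saved log it would be used as `count_cuda_bindings < filename.log`. For the full log shown above, the six binding lines split three to device 0 and three to device 1.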
---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Mon, Jun 6, 2011 at 9:54 AM
Subject: namd-l: AMD-PhenomII-1075_GTX470 NAMD-CUDA performance
To: NAMD <namd-l_at_ks.uiuc.edu>
Hello:
I have assembled a gaming machine with
Gigabyte GA890FXA-UD5
AMD PhenomII 1075T (3.0 GHz)
2xGTX-470
AMD Edition 1280MB GDDRV DX11 DUAL DVI / MINI HDMI SLI ATX
2x 1TB HD software RAID1
16 GB RAM DDR3 1600 MHz
Debian amd64 wheezy
NAMD_CVS-2011-06-04_Linux-x86_64-CUDA.tar.gz
No X server (ssh to machines with X server)
In my .bashrc:
NAMD_HOME=/usr/local/namd-cuda_4Jun2010nb
# Put the bin directory (not the namd2 binary itself) on PATH
PATH="$NAMD_HOME/bin:$PATH"; export NAMD_HOME PATH
# Append to LD_LIBRARY_PATH only if it is already set and non-empty
if [ -n "$LD_LIBRARY_PATH" ]; then
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NAMD_HOME"
else
export LD_LIBRARY_PATH="$NAMD_HOME"
fi
I launched a RAMD run on a >200,000-atom system with
charmrun $NAMD_HOME/bin/namd2 ++local +p6 +idlepoll ++verbose
filename.conf 2>&1 | tee filename.log
It runs fine, approximately ten times faster (judging from the "The last
velocity output" lines written every ten steps) than an 8-CPU
shared-memory machine with dual-Opteron 2.2 GHz.
I did nothing to indicate which GTX-470 to use. Can both be used?
Is the performance the same with the NVIDIA-provided CUDA driver as
with the one available from the OS (Debian)? Sorry for the last two
naive questions, perhaps resulting from the stress of the enterprise.
I assume that "nvidia-smi" is of no use for these graphics cards.
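
For reference, the startup log message "Did not find +devices i,j,k,... argument, using all" indicates that the cards can also be listed explicitly. A minimal sketch of such a launch line, assuming the device indices 0 and 1 reported in the log and the +idlepoll spelling used in the NAMD release notes; the command is only built and printed here so that it can be inspected before running.

```shell
#!/bin/sh
# Build the launch command with an explicit +devices list.
# Assumptions: device indices 0 and 1 (as reported in the startup log),
# and the same installation path as in the .bashrc above.
NAMD_HOME=${NAMD_HOME:-/usr/local/namd-cuda_4Jun2010nb}
devices="0,1"
cmd="charmrun $NAMD_HOME/bin/namd2 ++local +p6 +idlepoll +devices $devices ++verbose filename.conf"

# Print the command rather than running it.
echo "$cmd"
```

To actually launch, the printed line would be run with the same `2>&1 | tee filename.log` redirection as before. Since the log says "using all" when the argument is absent, listing both devices should give the same assignment as the default; the explicit form is mainly useful for restricting a run to one card.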
Thanks a lot for advice
francesco pietra