NAMD CUDA performance degradation

From: Thomas C. Bishop (bishop_at_tulane.edu)
Date: Thu Jul 23 2009 - 14:31:10 CDT

OK, so I downloaded NAMD/FFTW/Tcl/Charm++ and did the compile thing for our GPU
system (2x Tesla C1060, both of which do show up in the NAMD output as binding).

The upshot: compiling as per the instructions is a no-brainer (if I can do it...)

The downside: with CUDA it runs slower than without (hmm... maybe I shouldn't
be the one doing this).

Below are benchmark lines from the two systems.
The hardware is 16 cores/node with 2 GPUs. I'm running on 1 node to avoid
network issues.

THE NAMD LOGS HAVE THESE MESSAGES:

Bad result from CmiGetPesOnPhysicalNode!
pe 10 physnode rank 0 of 1 is 0

AND

Charm warning> Randomization of stack pointer is turned on in Kernel,
run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it.
Thread migration may not work!

Suggestions?

Tom

SYSTEM 1: 134335 ATOMS protein-DNA complex
***************************

cuda.16.out:Info: Benchmark time: 16 CPUs 0.23318 s/step 1.34942 days/ns 76.3906 MB memory
cuda.16.out:Info: Benchmark time: 16 CPUs 0.233269 s/step 1.34994 days/ns 76.4316 MB memory

nocuda.16.out:Info: Benchmark time: 16 CPUs 0.213155 s/step 1.23353 days/ns 82.3376 MB memory
nocuda.16.out:Info: Benchmark time: 16 CPUs 0.213293 s/step 1.23433 days/ns 82.5941 MB memory

SYSTEM 2: the APO system ~100,000 atoms
*******************
apo.cuda.16.out:Info: Benchmark time: 16 CPUs 0.178929 s/step 2.07094 days/ns 55.6983 MB memory
apo.cuda.16.out:Info: Benchmark time: 16 CPUs 0.178775 s/step 2.06915 days/ns 55.7847 MB memory

apo.nocuda.16.out:Info: Benchmark time: 16 CPUs 0.171169 s/step 1.98112 days/ns 61.6952 MB memory
apo.nocuda.16.out:Info: Benchmark time: 16 CPUs 0.17114 s/step 1.98078 days/ns 61.3251 MB memory
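
To put a number on the slowdown, here is a minimal Python sketch that pulls
the s/step figure out of grep-style benchmark lines like the ones above and
compares the CUDA and non-CUDA runs. The helper names (seconds_per_step,
slowdown) are made up for illustration and are not part of NAMD; the sketch
assumes each benchmark line sits on a single line, prefixed with its log file
name. Only the first benchmark line of each run is included to keep it short.

```python
import re

# First benchmark line of each run, copied from the logs above and
# joined onto a single line (assumption: "file:Info: Benchmark time: ..." layout).
LOG_LINES = [
    "cuda.16.out:Info: Benchmark time: 16 CPUs 0.23318 s/step 1.34942 days/ns 76.3906 MB memory",
    "nocuda.16.out:Info: Benchmark time: 16 CPUs 0.213155 s/step 1.23353 days/ns 82.3376 MB memory",
    "apo.cuda.16.out:Info: Benchmark time: 16 CPUs 0.178929 s/step 2.07094 days/ns 55.6983 MB memory",
    "apo.nocuda.16.out:Info: Benchmark time: 16 CPUs 0.171169 s/step 1.98112 days/ns 61.6952 MB memory",
]

# Captures the log file name and the seconds-per-step value.
PATTERN = re.compile(
    r"^(?P<file>[^:]+):Info: Benchmark time: \d+ CPUs (?P<sps>[\d.]+) s/step"
)


def seconds_per_step(lines):
    """Map each log file name to the mean s/step over its benchmark lines."""
    samples = {}
    for line in lines:
        match = PATTERN.match(line)
        if match:
            samples.setdefault(match.group("file"), []).append(float(match.group("sps")))
    return {name: sum(values) / len(values) for name, values in samples.items()}


def slowdown(cuda_sps, nocuda_sps):
    """Ratio of CUDA to non-CUDA time per step; above 1.0 means CUDA is slower."""
    return cuda_sps / nocuda_sps


times = seconds_per_step(LOG_LINES)
sys1 = slowdown(times["cuda.16.out"], times["nocuda.16.out"])
sys2 = slowdown(times["apo.cuda.16.out"], times["apo.nocuda.16.out"])

print(f"System 1: CUDA takes {sys1:.3f}x the non-CUDA time per step")
print(f"System 2: CUDA takes {sys2:.3f}x the non-CUDA time per step")
```

On these numbers the CUDA build comes out roughly 9% slower per step on
system 1 and roughly 5% slower on system 2.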

-- 
**********************
Thomas C. Bishop     *
Office: 504-862-3370 *
Fax:    504-862-8392 *
**********************
