namd-l: Re: performance question

No, utilisation is usually higher, but that doesn’t matter as long as the speedup is satisfactory.

So you should benchmark at various node counts and look at the speedup relative to one node.
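
As a rough sketch of that comparison, speedup and parallel efficiency can be computed from the s/step values measured at each node count. The timings below are made-up placeholders, not measurements from this system, and the function name is only illustrative.

```python
def speedup_table(sec_per_step):
    """Return (nodes, speedup, efficiency) rows relative to the smallest node count."""
    base_nodes = min(sec_per_step)
    base_time = sec_per_step[base_nodes]
    rows = []
    for nodes in sorted(sec_per_step):
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / sec_per_step[nodes]
        # Efficiency: speedup divided by the factor of extra nodes used.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows

# Placeholder timings in s/step (hypothetical, NOT measured on this cluster).
timings = {1: 0.100, 2: 0.055, 4: 0.032, 8: 0.022}

for nodes, speedup, eff in speedup_table(timings):
    print(f"{nodes:2d} nodes  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

If efficiency drops sharply beyond some node count, adding more nodes past that point is unlikely to pay off for this system size.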

I’ll give some hints on what to try:

(forget about +ppn +pemap +commap for now)

1. Do not pass +devices; pass +ignoresharing instead.

2. Try adding “twoawayx yes” to your NAMD script.

If that improves performance, also try adding “twoawayy yes”.

If that improves performance again, also try adding “twoawayz yes”.

Norman Geist.

From: owner-namd-l@ks.uiuc.edu [mailto:owner-namd-l@ks.uiuc.edu] On Behalf Of Thomas C. Bishop
Sent: Monday, April 27, 2015 10:54 PM
To: namd-l@ks.uiuc.edu
Subject: namd-l: performance question

Dear NAMD,

Is it typical to have ~30% CPU usage (reported by say uptime/top) and ~20% GPU usage
reported by nvidia-smi for NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/ runs ?

I'm used to seeing the CPUs pegged at 100% for non-gpu runs.

Any suggestions/feedback greatly appreciated.
Tom

Details
*****************
I have a system with 266038 atoms
and I'm trying to optimize the run-time performance on 200 cores (10 nodes) of a machine where each node has

Two 10-core 2.8 GHz E5-2680v2 Xeon processors
Two NVIDIA Tesla K20x GPU's
56 Gb/sec (FDR) InfiniBand (2:1 oversubscribed mesh)

I get the best performance when I leave one or two cores per node for communication

~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 180 ++ppn 18 ++nodelist $nodefile ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2 +pemap 0-8,10-18 +commap 9,19 +devices 0,1,0,1 dyn10.conf

OR more simply

~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 180 ++ppn 18 ++nodelist $nodefile ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2 dyn10.conf

BUT the utilization of the cores is only 30% (+/-10)
and nvidia-smi reports < 20% utilization

see below

typical node usage
*********************************************
Tasks: 707 total,   1 running, 706 sleeping,   0 stopped,   0 zombie
Cpu0 : 28.9%us, 24.2%sy, 0.0%ni, 46.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 24.7%us, 23.1%sy, 0.0%ni, 52.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 31.2%us, 22.4%sy, 0.0%ni, 46.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 13.8%us, 12.4%sy, 0.0%ni, 73.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 28.2%us, 23.8%sy, 0.0%ni, 48.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 33.4%us, 22.9%sy, 0.0%ni, 43.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 30.6%us, 19.9%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 21.4%us, 11.4%sy, 0.0%ni, 67.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 31.5%us, 21.9%sy, 0.0%ni, 46.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 33.1%us, 21.2%sy, 0.0%ni, 45.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 27.2%us, 24.1%sy, 0.0%ni, 48.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 28.1%us, 24.1%sy, 0.0%ni, 47.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu12 : 28.6%us, 22.6%sy, 0.0%ni, 48.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 26.5%us, 24.1%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 20.3%us, 28.4%sy, 0.0%ni, 51.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 32.0%us, 22.6%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 30.5%us, 21.9%sy, 0.0%ni, 47.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 28.8%us, 22.4%sy, 0.0%ni, 48.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 33.0%us, 21.1%sy, 0.0%ni, 45.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 30.6%us, 20.4%sy, 0.0%ni, 49.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65877348k total, 3202044k used, 62675304k free,   155228k buffers
Swap: 134217720k total,     8680k used, 134209040k free,   701920k cached
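
For comparing runs, the per-core figures above can be reduced to a single number. The following is a minimal Python sketch, assuming top's per-core line format as shown in this listing; the function names are illustrative, and "busy" is taken here as 100 minus the idle percentage, so it includes system time as well as user time.

```python
import re

# Matches per-core lines such as "Cpu0 : 28.9%us, 24.2%sy, ..., 46.6%id, ..."
CPU_LINE = re.compile(r"Cpu(\d+)\s*:\s*([\d.]+)%us,\s*([\d.]+)%sy,.*?([\d.]+)%id")

def per_core_usage(top_text):
    """Return {core: (user%, system%, idle%)} parsed from top's per-core lines."""
    stats = {}
    for line in top_text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            core = int(match.group(1))
            stats[core] = tuple(float(match.group(i)) for i in (2, 3, 4))
    return stats

def mean_busy(stats):
    """Average busy percentage, taking busy as 100 minus idle."""
    return sum(100.0 - idle for _, _, idle in stats.values()) / len(stats)

# Two lines copied from the listing above; paste the full output in practice.
sample = """\
Cpu0 : 28.9%us, 24.2%sy, 0.0%ni, 46.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 24.7%us, 23.1%sy, 0.0%ni, 52.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
"""
stats = per_core_usage(sample)
print(f"mean busy over {len(stats)} cores: {mean_busy(stats):.1f}%")
```

Keeping user and system percentages separate in the parsed result makes it easy to see how much of the busy time is spent in the kernel rather than in NAMD itself.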

typical GPU usage
********************************************

Mon Apr 27 15:47:19 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.32     Driver Version: 340.32         |
|-------------------------------+----------------------+----------------------+
| GPU Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap|         Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
|   0 Tesla K20Xm         On   | 0000:03:00.0     Off |                    0 |
| N/A   27C    P0    65W / 235W |    114MiB / 5759MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   1 Tesla K20Xm         On   | 0000:83:00.0     Off |                    0 |
| N/A   27C    P0    70W / 235W |    113MiB / 5759MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
| GPU       PID Process name                                     Usage      |
|=============================================================================|
|    0    104307 ...n/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2    97MiB |
|    1    104307 ...n/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2    97MiB |
+-----------------------------------------------------------------------------+
[bishop@qb091 ~]$

--

*******************************

   Thomas C. Bishop

    Tel: 318-257-5209

    Fax: 318-257-3823

   www.latech.edu/~bishop

********************************