RE: performance question

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Apr 28 2015 - 02:51:06 CDT

No, usually the utilisation is higher, but this doesn’t matter if the speedup is satisfactory.

So you should benchmark for various node counts and look at the speedup relative to one node.
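
To make the comparison concrete, here is a minimal Python sketch of the speedup calculation. It assumes you have collected one time-per-step value for each node count, for example from the "Benchmark time" lines in the NAMD log. The function name speedup_table and the timings in the example are made up for illustration; they are not measurements from your machine.

```python
def speedup_table(seconds_per_step):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run.

    seconds_per_step maps node count -> measured seconds per step and
    must contain an entry for 1 node, which serves as the baseline.
    """
    base = seconds_per_step[1]
    table = {}
    for nodes in sorted(seconds_per_step):
        speedup = base / seconds_per_step[nodes]
        # Efficiency is the fraction of ideal (linear) scaling achieved.
        table[nodes] = (speedup, speedup / nodes)
    return table


# Placeholder timings (NOT real data), only to show the output shape.
example = {1: 0.40, 2: 0.22, 4: 0.125, 8: 0.08}
for nodes, (speedup, efficiency) in speedup_table(example).items():
    print(f"{nodes:2d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

If the efficiency is still acceptable at the node count you want to run on, the low utilisation figures are probably not worth worrying about.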

 

I’ll give some hints on what to try:

 (forget about +ppn +pemap +commap for now)

 

1. Do not pass +devices; pass +ignoresharing instead.

2. Try adding “twoAwayX yes” to your NAMD configuration file.

If that improves performance, also try adding “twoAwayY yes”.

If that improves it further, also try adding “twoAwayZ yes”.
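
The step-by-step procedure in hint 2 can be sketched as a small greedy search in Python. This is only an illustration: measure is a hypothetical callable that you would have to write yourself (run a short benchmark with the given options set to "yes" in the configuration file and return the seconds per step), and the timings in the example are invented.

```python
from typing import Callable, List


def pick_twoaway(measure: Callable[[List[str]], float]) -> List[str]:
    """Enable twoAwayX, then Y, then Z, as long as each step helps.

    measure(options) is assumed to return seconds per step for a run
    with the listed options enabled; lower is better.
    """
    enabled: List[str] = []
    best = measure(enabled)  # baseline without any twoAway option
    for option in ("twoAwayX", "twoAwayY", "twoAwayZ"):
        trial = measure(enabled + [option])
        if trial >= best:  # no improvement: stop and keep what we have
            break
        enabled.append(option)
        best = trial
    return enabled


# Invented timings (NOT real data) standing in for actual benchmark runs.
fake_timings = {
    (): 0.050,
    ("twoAwayX",): 0.040,
    ("twoAwayX", "twoAwayY"): 0.045,
}
print(pick_twoaway(lambda options: fake_timings[tuple(options)]))
```

With these invented numbers the search keeps twoAwayX and stops at twoAwayY, because adding it made the run slower.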

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Thomas C. Bishop
Sent: Monday, April 27, 2015 10:54 PM
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: performance question

 

Dear NAMD,

Is it typical to have ~30% CPU usage (reported by say uptime/top) and ~20% GPU usage
reported by nvidia-smi for NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/ runs ?

I'm used to seeing the CPUs pegged at 100% for non-gpu runs.

Any suggestions/feedback greatly appreciated.
Tom

Details
*****************
I have a system w/ 266038 atoms
and I'm trying to optimize the run-time performance on 200 cores (10 nodes) of a machine where each node has

Two 10-core 2.8 GHz E5-2680v2 Xeon processors
Two NVIDIA Tesla K20x GPU's
56 Gb/sec (FDR) InfiniBand (2:1 oversubscribed mesh)

I get the best performance when I leave one or two cores per node for communication

 ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 180 ++ppn 18 ++nodelist $nodefile ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2 +pemap 0-8,10-18 +commap 9,19 +devices 0,1,0,1 dyn10.conf

OR more simply

 ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 180 ++ppn 18 ++nodelist $nodefile ~/bin/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2 dyn10.conf

BUT the utilization of the cores is only 30% (+/-10)
and nvidia-smi reports < 20% utilization

see below

typical node usage
*********************************************
Tasks: 707 total, 1 running, 706 sleeping, 0 stopped, 0 zombie
Cpu0 : 28.9%us, 24.2%sy, 0.0%ni, 46.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 24.7%us, 23.1%sy, 0.0%ni, 52.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 31.2%us, 22.4%sy, 0.0%ni, 46.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 13.8%us, 12.4%sy, 0.0%ni, 73.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 28.2%us, 23.8%sy, 0.0%ni, 48.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 33.4%us, 22.9%sy, 0.0%ni, 43.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 30.6%us, 19.9%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 21.4%us, 11.4%sy, 0.0%ni, 67.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 31.5%us, 21.9%sy, 0.0%ni, 46.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 33.1%us, 21.2%sy, 0.0%ni, 45.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 27.2%us, 24.1%sy, 0.0%ni, 48.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 28.1%us, 24.1%sy, 0.0%ni, 47.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu12 : 28.6%us, 22.6%sy, 0.0%ni, 48.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 26.5%us, 24.1%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 20.3%us, 28.4%sy, 0.0%ni, 51.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 32.0%us, 22.6%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 30.5%us, 21.9%sy, 0.0%ni, 47.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 28.8%us, 22.4%sy, 0.0%ni, 48.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 33.0%us, 21.1%sy, 0.0%ni, 45.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 30.6%us, 20.4%sy, 0.0%ni, 49.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65877348k total, 3202044k used, 62675304k free, 155228k buffers
Swap: 134217720k total, 8680k used, 134209040k free, 701920k cached

typical GPU usage
********************************************

Mon Apr 27 15:47:19 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.32     Driver Version: 340.32         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20Xm         On   | 0000:03:00.0     Off |                    0 |
| N/A   27C    P0    65W / 235W |    114MiB /  5759MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         On   | 0000:83:00.0     Off |                    0 |
| N/A   27C    P0    70W / 235W |    113MiB /  5759MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0    104307  ...n/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2    97MiB |
|    1    104307  ...n/NAMD_2.10_Linux-x86_64-ibverbs-smp-CUDA/namd2    97MiB |
+-----------------------------------------------------------------------------+
[bishop_at_qb091 ~]$

-- 
*******************************
   Thomas C. Bishop
    Tel: 318-257-5209
    Fax: 318-257-3823
   www.latech.edu/~bishop
******************************** 
