From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Sep 01 2015 - 12:47:56 CDT
It sometimes helps to artificially increase the number of patches by adding
the twoawayx option to the jobscript. If that brings an improvement, one can additionally try twoawayy and twoawayz.
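As a rough sketch, the patch-splitting options would look like this. I am assuming here that they are set in the NAMD configuration file (the simulation input), not in the scheduler script itself. Start with the x direction only and enable the other two only if that already helps:

```
# Sketch only: halve the patch size along x, which roughly doubles the number of patches.
twoAwayX   yes
# Optional, try these only if twoAwayX alone improves the scaling:
# twoAwayY   yes
# twoAwayZ   yes
```

Each extra direction again roughly doubles the patch count, so it is worth re-running the benchmark after each change rather than enabling all three at once.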
There might be two major problems here:
1. More computing power for a system of the same size might mean the run has outscaled, i.e. there is too little work left per GPU.
2. PCI-E bandwidth is saturated.
I suspect your system might be too small. If the above solution with twoawayx helps, that would already point in this direction.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Maxime Boissonneault
Sent: Tuesday, September 1, 2015 16:48
To: namd-l <namd-l_at_ks.uiuc.edu>
Subject: namd-l: Reducing the amount of work being done on CPU
We have received very fat GPU nodes, which have 16 GPUs (8 x K80), and 2
12-core sockets (24 CPU cores).
While I got a rather good scaling with the ApoA1 benchmark on nodes with
8 x K20 + 2 x 10-core sockets (scaling was almost perfect between 1 GPU
+ 2 cores and 8 GPUs + 20 cores), the scaling is not nearly as
impressive on our very fat nodes.
I suspect the reason is that the low number of CPU cores per GPU is
becoming a bottleneck.
Is there any setting I should change in the benchmark to bias more
of the workload toward the GPUs rather than the CPUs?
--
Maxime Boissonneault
Computing Analyst - Calcul Québec, Université Laval
Software Carpentry Instructor
Chair - Calcul Québec Research Support Coordination Committee
Ph.D. in Physics