From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Sat Dec 06 2008 - 03:18:29 CST
On 12/5/08, Zailo Leite <zleite_at_caltech.edu> wrote:
> We're running it on a 64-node cluster, 8 cores per node, infiniband
> Using the ApoA1 benchmark, things go pretty well until we go over 104
> cores; then not only does it degrade significantly, but the scaling becomes
> very chaotic. Any suggestions? Something related to the matrix
> decomposition, perhaps?
more likely you have reached the scaling limit with 8 cores/node.
please try 7 cores/node or even 6. you have to deal with two problems:
- communication congestion on the infiniband interface (all communication
  from the 8 cores on a node has to go through the same adapter)
- OS jitter (or OS noise): the kernel has to serve other processes, which adds
  latency (and at the limit of scaling for a classical MD code, latency is
  all that matters). if you keep one processor core "empty", it interferes
  less with the ongoing communication.
with more bandwidth-intensive applications, you may have to go down to 4 cores/node
to get the maximum performance out of a machine. particularly with current
intel quad-core cpus, you are frequently better off considering them dual-core
cpus with twice the cache...
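a minimal sketch of what undersubscribing the nodes could look like with an
Open MPI launcher; the exact flags depend on your MPI (and on whether NAMD was
built for charmrun instead), and the binary/input names below just follow the
standard ApoA1 benchmark setup:

```shell
# place 7 ranks per node instead of 8, leaving one core free per node
# for the OS and the interconnect driver; 64 nodes x 7 cores = 448 ranks
mpirun --map-by ppr:7:node -np 448 namd2 apoa1.namd > apoa1-7ppn.log

# older Open MPI releases use -npernode for the same placement:
# mpirun -npernode 7 -np 448 namd2 apoa1.namd
```

comparing the days/ns (or s/step) reported in the log for 8, 7, and 6
cores/node at the same total node count will show where the congestion sets in.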
> Zailo Leite
> Sysadmin, IMSS -
> Academic Unix Solutions /
> High Performance Computing Group
> California Institute of Technology
> Phone (626)395-3507 - Cell (626)394-6989
--
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:12 CST