Re: Re: NAMD crashes with OpenMPI/OpenMX

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed Jun 15 2011 - 01:01:25 CDT

Hi Thomas,

well, at first glance the scaling doesn't look that bad. It may simply be
that your system isn't big enough to scale well. Which TCP congestion
control algorithm do you use (reno or cubic)? Try highspeed (check with
sysctl -a, set with sysctl -w). Also, what kind of gigabit network adapter
do you use: is it onboard, and if so, which bus is it attached to (PCI or
PCIe), or is it a plugged-in PCIe NIC? These things can make a big
difference: the standard PCI bus has only about 1.1 Gbit/s for the whole
bus, shared between sending and receiving, while a PCIe x1 NIC already has
2 Gbit/s in each direction simultaneously. Mainboard manufacturers often
hang even two onboard gigabit NICs off such a slow PCI bus, which can never
keep up with the traffic and cannot deliver the full bandwidth. Another
thing is the capacity of the switch: have you tried running two such jobs
simultaneously, on 12 cores for example? If both jobs then get slower, the
switch capacity may not be enough.
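For reference, checking and switching the congestion control would look
roughly like this (the exact sysctl key names can differ slightly between
kernels, and the highspeed module may need to be loaded first):

    # show the algorithm currently in use
    sysctl net.ipv4.tcp_congestion_control
    # list the algorithms already available
    sysctl net.ipv4.tcp_available_congestion_control
    # load the highspeed module and switch to it
    modprobe tcp_highspeed
    sysctl -w net.ipv4.tcp_congestion_control=highspeed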

Good luck.

Norman Geist.

-----Original Message-----
From: Thomas Albers [mailto:talbers_at_binghamton.edu]
Sent: Tuesday, June 14, 2011 18:27
To: Norman Geist
Subject: Re: Re: namd-l: NAMD crashes with OpenMPI/OpenMX

Hello!

> I don't think that the pre-built binaries scale poorly; it's maybe more a
> matter of your network configuration, and you will get faced with that even
> if you get the openmpi solution to work. If you need help, feel free to ask.

The computers are on their own network and are connected via gigabit
ethernet.

This is what the data look like:

        n(CPU)      Wall Clock   n(CPU) * Wall Clock
        18              63.21       1137
        16              86.72       1387
        12              84.79       1017
         8             115.51        924
         4             214.89        859
         4 (local)     209.20        836

With a low number of cores the network version scales quite well, but the
more cores are used, the greater the penalty gets, approaching 30 % with all
18 cores. I suppose the latency that comes with TCP/IP is visible here. The
hope is that with the Open-MX stack the scaling will be better. Axel
mentioned Open-MX/Open-MPI a few months ago, but he hadn't tried it himself.
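(Rough arithmetic behind that figure, taking the 4-core local run from the
table above as the baseline:

    CPU-time ratio:        1137 / 836  ~ 1.36
    parallel efficiency:    836 / 1137 ~ 0.74

so roughly a quarter to a third of the aggregate CPU time at 18 cores goes
into communication overhead rather than computation.)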

Thomas
