From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Apr 15 2011 - 07:46:00 CDT
Hi Robert,
maybe I have a solution for your problem, or at least a way to improve your
scaling. I administer a 1 Gbit/s Ethernet Linux cluster and have also tried
a lot to improve scaling. Many things can cause bad scaling, e.g. a wrong
driver configuration of the NIC (is it really running at 1000 Mbit/s?). But
what really worked for me was changing the TCP congestion control algorithm
from the default reno or cubic to highspeed. This gave me a big improvement
on parallel runs across multiple machines. Looking at the scaling behavior,
one could think this is a small-packet traffic problem, which also often
comes up in this context, but I think it's because MPI uses many, many short
communications and the problem is the time it takes to establish the
connection at the TCP layer. The default configuration is geared more toward
big data transfers than toward fast, short communications. So it's not a
problem at the application layer. You won't see any improvement across two
machines, but where you now see performance degrade on more than two nodes,
you should see a big improvement in further scaling.
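Before changing anything, it may help to confirm the two things mentioned
above: the negotiated link speed and the congestion control algorithm that is
currently active. The following is only a rough sketch; the interface name
eth0 and the helper names (link_speed, is_gigabit, current_algo, report) are
assumptions for illustration, not something from the cluster in question.

```shell
#!/bin/sh
# Sketch: check the negotiated link speed and the active TCP congestion control.
# Assumption: the cluster interface is called eth0; adjust to your setup.

# Print the negotiated speed of interface $1 in Mbit/s (empty if unreadable).
link_speed() {
    cat "/sys/class/net/$1/speed" 2>/dev/null
}

# Succeed if $1 is a number of at least 1000 (Mbit/s).
is_gigabit() {
    [ -n "$1" ] && [ "$1" -ge 1000 ] 2>/dev/null
}

# Print the congestion control algorithm currently in use.
current_algo() {
    cat /proc/sys/net/ipv4/tcp_congestion_control 2>/dev/null
}

# Print a short report for interface $1.
report() {
    speed=$(link_speed "$1")
    if is_gigabit "$speed"; then
        echo "$1: link speed ${speed} Mbit/s"
    else
        echo "$1: link speed '${speed:-unknown}', not gigabit?"
    fi
    echo "congestion control: $(current_algo)"
}
```

Calling `report eth0` on each node should show quickly whether a NIC has
negotiated less than 1000 Mbit/s and which algorithm the node is using.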
Try this command as ROOT on all nodes and test scaling again:
sysctl -w net.ipv4.tcp_congestion_control=highspeed
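If the command is rejected, the highspeed algorithm may simply not be
available in the running kernel yet. A slightly more careful variant could
look like the sketch below. It assumes that highspeed is built as the
tcp_highspeed kernel module on your distribution; the function names are made
up for this example.

```shell
#!/bin/sh
# Sketch: switch TCP congestion control to "highspeed" only if the kernel offers it.

# Succeed if algorithm $2 appears in the space-separated list $1.
has_algo() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the algorithms the running kernel currently offers (empty if unreadable).
available_algos() {
    cat /proc/sys/net/ipv4/tcp_available_congestion_control 2>/dev/null
}

# Load the module if needed, then switch. Must be run as root.
enable_highspeed() {
    if ! has_algo "$(available_algos)" highspeed; then
        # Assumption: the algorithm is provided by the tcp_highspeed module.
        modprobe tcp_highspeed || return 1
    fi
    sysctl -w net.ipv4.tcp_congestion_control=highspeed
}
```

Run `enable_highspeed` as root on every node. Note that `sysctl -w` does not
survive a reboot; to keep the setting, a line such as
`net.ipv4.tcp_congestion_control = highspeed` can be added to
/etc/sysctl.conf.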
Hope this helps.
Best regards
Norman Geist.