Re: 50% system CPU usage when parallel running NAMD on Rocks cluster

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Dec 16 2013 - 01:07:28 CST

Please send the output of:

 

ifconfig                 # interface state plus RX/TX error and drop counters

ethtool eth0             # link speed and duplex (replace eth0 with your interface, e.g. eth1)

ethtool -k eth0          # offload settings (TSO, GSO, GRO, checksum offload)

ethtool -c eth0          # interrupt coalescing settings

sysctl -a | grep tcp     # kernel TCP tuning parameters

 

Norman Geist.

 

From: 周昵昀 [mailto:malrot13_at_gmail.com]
Sent: Saturday, December 14, 2013 14:56
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: 50% system CPU usage when parallel running NAMD on Rocks cluster

 

I have changed my switch from a 3Com Switch 2824 to an IP-Com G1024 (a low-end gigabit switch borrowed from a reseller). To my surprise, there is neither a performance improvement nor a deterioration; the benchmark results are the same as before within a reasonable error margin. I guess it's not just an "old switch" problem.

Any help would be appreciated!

 

Neil

 

2013/12/10 Norman Geist <norman.geist_at_uni-greifswald.de>

Yeah, something like that. I guess a relative comparison is the best choice in your case; that's why I gave you a reference model. Otherwise, look for support from resellers.

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of 周昵昀

Sent: Tuesday, December 10, 2013 13:52

To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: 50% system CPU usage when parallel running NAMD on Rocks cluster

 

Your suggestion is very helpful. We are looking for a "better scaling" switch. But here comes another question: which specification represents the switching latency or switching capacity of a switch? Is it the "packet forwarding rate" or something else? Thanks again!

 

Neil

 

2013/12/9 Norman Geist <norman.geist_at_uni-greifswald.de>

Maybe a little. There's a lot you can try on the software side of the problem, but all of it will only circumvent the real problem or lessen its impact. The most comfortable and most likely successful solution is buying another switch, so the keywords are switching latency and switching capacity. Take the model I posted as a reference, but note that 16 cores per node is really heavy for 1 Gbit/s Ethernet, and you might want to consider spending some money on an HPC network like InfiniBand, or at least 10 Gbit/s Ethernet.
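
One way to put numbers on the latency through a given switch is to measure it directly between two nodes, for example with qperf (a sketch; qperf must be installed on both nodes, and "node2" is a placeholder for one of your compute hostnames):

qperf                          # run this first on node2 with no arguments to start the server side
qperf node2 tcp_lat tcp_bw     # then on node1: reports TCP latency and bandwidth through the switch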

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of 周昵昀

Sent: Sunday, December 8, 2013 16:12
To: Norman Geist
Cc: Namd Mailing List

Subject: Re: namd-l: 50% system CPU usage when parallel running NAMD on Rocks cluster

 

Thanks for your reply! The 16 cores per node are physical; HT was disabled before NAMD was tested. I'll consider buying a new switch.

 

BTW, will it scale better if I compile a UDP version of NAMD?
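
For context, the Charm++ "net" network layer that NAMD runs on uses UDP datagrams by default, with TCP selected as a build option, so the comparison would be between builds along these lines (a sketch; net-linux-x86_64 is an assumed platform name, adjust to your architecture):

./build charm++ net-linux-x86_64 --with-production        # default net build, UDP datagrams
./build charm++ net-linux-x86_64 tcp --with-production    # TCP variant for comparison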

 

Neil Zhou

2013/12/3 Norman Geist <norman.geist_at_uni-greifswald.de>

Your switch is too slow at switching. Try something like the Netgear GS748T; it is not that expensive and scales "ok". You can temporarily improve the situation by trying the TCP congestion control algorithm "highspeed". Set it via sysctl on all the nodes.
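
A minimal sketch of that change on a Linux node (assuming the tcp_highspeed module is available in your kernel):

modprobe tcp_highspeed                                # load the algorithm if it is not built in
sysctl net.ipv4.tcp_available_congestion_control      # verify "highspeed" is now listed
sysctl -w net.ipv4.tcp_congestion_control=highspeed   # switch the active algorithm

To make the setting survive reboots, add the line "net.ipv4.tcp_congestion_control = highspeed" to /etc/sysctl.conf on every node.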

 

Additionally, are these 16 cores per node physical or logical (HT)? If they are HT, leave them out: no speed gain, only more network load.
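
Whether HT is active can be checked quickly on each node (a sketch using standard Linux tools):

lscpu | grep -E 'Thread|Core|Socket'   # "Thread(s) per core" greater than 1 means logical (HT) cores are exposed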

 

Norman Geist.

 

 

 

 


This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:00 CST