From: Rene Salmon (rsalmon_at_tulane.edu)
Date: Wed May 11 2005 - 13:07:25 CDT
Hi List,
We are having some strange network problems with CHARM/NAMD on an AMD64
clustermatic 5 cluster. 
On nodes that are not running NAMD jobs we see this for the network stats:
># bpsh 10 netstat -i
Kernel Interface table
Iface   MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 221380022      0      0      0 238341193      0      0      0 BMRU
eth1   1500   0         0      0      0      0         0      0      0      0 BMU
lo    16436   0        18      0      0      0        18      0      0      0 LRU
As you can see, there are no errors or dropped packets.  But on nodes that
are running NAMD jobs we get this:
># bpsh 4 netstat -i
Kernel Interface table
Iface   MTU Met       RX-OK    RX-ERR RX-DRP RX-OVR       TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 23182578986 637615860      0      0 23839814024      1      0      0 BMRU
eth1   1500   0           0         0      0      0           0      0      0      0 BMU
lo    16436   0  2271084541         0      0      0  2271084541      0      0      0 LRU
This shows lots of errors and dropped packets.  Is this normal?  Somehow
this is slowing the network down and causing NFS problems.
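To put the eth0 numbers from node 4 in proportion, here is a rough sketch
of the receive error ratio. It assumes the run-together netstat columns
split as Met=0 and RX-OK=23182578986, and that RX-OK counts only good
frames (this can vary by driver). The function name rx_error_ratio is just
made up for illustration.

```python
# Counter values copied from the eth0 line of `bpsh 4 netstat -i`.
# Assumption: the fused columns split as Met=0, RX-OK=23182578986.
rx_ok = 23182578986
rx_err = 637615860


def rx_error_ratio(rx_ok, rx_err):
    """Fraction of received frames counted as errors.

    Assumes RX-OK excludes errored frames, so the total seen by the
    interface is RX-OK + RX-ERR. Some drivers may count differently.
    """
    total = rx_ok + rx_err
    return rx_err / total if total else 0.0


ratio = rx_error_ratio(rx_ok, rx_err)
print(f"eth0 RX error ratio: {ratio:.2%}")
```

On the idle node the same calculation gives zero, since RX-ERR is 0 there.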
Any ideas?
Thank you 
Rene