Re: CHARM/NAMD network problems on amd64 Clustermatic

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Wed May 11 2005 - 22:21:36 CDT

The lines in the printout are not very clear; how you read them depends
on where you break the records in each line. I added "^"s to mark the
breaks.

Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0^23182578986 637615860 0 0^23839814024 1 0 0 BMRU
eth1 1500 0 0 0 0 0 0 0 0 0 BMU
lo 16436 0^2271084541 0 0 0^2271084541 0 0 0 LRU
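
As a rough illustration of the record breaks marked above, here is a minimal Python sketch that re-splits such lines. It assumes that any all-digit token with a leading zero is a "0" that ran into the following counter, since counters are not normally printed with leading zeros. That rule is an assumption about how the columns ran together, not documented netstat behavior, and the names COLUMNS, split_fused and parse_row are made up for this example.

```python
# Column names as they appear in the netstat -i header quoted in this thread.
COLUMNS = ["Iface", "MTU", "Met", "RX-OK", "RX-ERR", "RX-DRP", "RX-OVR",
           "TX-OK", "TX-ERR", "TX-DRP", "TX-OVR", "Flg"]


def split_fused(token):
    """Split a token such as "023182578986" into ["0", "23182578986"].

    Assumption: a multi-digit number starting with "0" is really a zero
    field followed, without a separating space, by the next counter.
    """
    if len(token) > 1 and token.isdigit() and token[0] == "0":
        return ["0", token[1:]]
    return [token]


def parse_row(line):
    """Parse one interface row into a dict keyed by column name."""
    tokens = []
    # Treat the hand-added "^" markers as ordinary separators.
    for tok in line.replace("^", " ").split():
        tokens.extend(split_fused(tok))
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(tokens)}")
    row = dict(zip(COLUMNS, tokens))
    # Everything between the interface name and the flags is numeric.
    for name in COLUMNS[1:-1]:
        row[name] = int(row[name])
    return row


if __name__ == "__main__":
    raw = "eth0 1500 023182578986 637615860 0 023839814024 1 0 0 BMRU"
    row = parse_row(raw)
    print(row["RX-OK"], row["RX-ERR"], row["TX-OK"], row["TX-ERR"])
```

Rows that are already cleanly separated, such as the eth1 line, pass through unchanged because none of their tokens has a leading zero followed by more digits.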

So it looks like there are a lot of receive errors (RX-ERR) but no
dropped packets, and almost no transmit errors. If these are parity
errors at receive time, it could mean a bad Ethernet card or something
similar. Does it also happen with other applications?
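
To put a number on the receive errors, a small Python sketch like the following could compare RX-ERR against RX-OK, either for the totals or between two snapshots taken before and after running some other application. The counter values are copied from the eth0 rows in this thread. The function name rx_error_ratio and the choice of "errors per good packet" as the metric are assumptions for illustration, and counter wraparound is not handled.

```python
def rx_error_ratio(after, before=None):
    """Receive errors per successfully received packet over an interval.

    `after` and `before` are dicts with "RX-OK" and "RX-ERR" counters.
    With no `before` snapshot, the ratio covers the counters' whole lifetime.
    """
    if before is None:
        before = {"RX-OK": 0, "RX-ERR": 0}
    ok = after["RX-OK"] - before["RX-OK"]
    err = after["RX-ERR"] - before["RX-ERR"]
    if ok <= 0:
        # No good packets in the interval: avoid dividing by zero.
        return 0.0 if err <= 0 else float("inf")
    return err / ok


# eth0 counters quoted in this thread
# (node 4 is running NAMD, node 10 is not).
node4 = {"RX-OK": 23182578986, "RX-ERR": 637615860}
node10 = {"RX-OK": 221380022, "RX-ERR": 0}

if __name__ == "__main__":
    print(f"node 4:  {rx_error_ratio(node4):.2%}")
    print(f"node 10: {rx_error_ratio(node10):.2%}")
```

On these numbers the ratio comes to roughly 2.75% for node 4 and 0% for node 10. Passing an earlier snapshot as `before` limits the ratio to the interval in which a different application ran, which is one way to check whether the errors are specific to NAMD.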

Gengbin

Rene Salmon wrote:

>Hi List,
>
>We are having some strange network problems with CHARM/NAMD on an AMD64
>clustermatic 5 cluster.
>
>On nodes that are not running NAMD jobs we see this for the network stats:
>
># bpsh 10 netstat -i
>Kernel Interface table
>Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
>eth0 1500 0221380022 0 0 0238341193 0 0 0 BMRU
>eth1 1500 0 0 0 0 0 0 0 0 0 BMU
>lo 16436 0 18 0 0 0 18 0 0 0 LRU
>
>As you can see, there are no errors or dropped packets. But on nodes that
>are running NAMD jobs we get this:
>
># bpsh 4 netstat -i
>Kernel Interface table
>Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
>eth0 1500 023182578986 637615860 0 023839814024 1 0 0 BMRU
>eth1 1500 0 0 0 0 0 0 0 0 0 BMU
>lo 16436 02271084541 0 0 02271084541 0 0 0 LRU
>
>This shows lots of errors and dropped packets. Is this normal? Somehow
>this is slowing the network down and causing NFS problems.
>
>Any ideas?
>
>Thank you
>Rene
>
>
