Re: CHARM/NAMD network problems on amd64 Clustermatic

From: Rene Salmon (rsalmon_at_tulane.edu)
Date: Wed May 25 2005 - 12:12:05 CDT

Hello,

Thank you for the reply. This was indeed a Linux kernel network driver issue
and not an issue with NAMD. After updating the kernel network drivers,
everything seems to be working fine.

Thank you again for the help.

Rene

On Wed, 11 May 2005, Gengbin Zheng wrote:

>
> The line in the printout is not very clear; it really depends on how you
> break the records in the line. I added "^"s to break the records.
>
> Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
> eth0 1500 0^23182578986 637615860 0 0^23839814024 1 0 0 BMRU
> eth1 1500 0 0 0 0 0 0 0 0 0 BMU
> lo 16436 0^2271084541 0 0 0^2271084541 0 0 0 LRU
>
> So it looks like there are a lot of receive packet errors but no dropped
> packets, and almost no send packet errors. If these are parity errors at
> receive time, it could mean a bad Ethernet card or something similar.
> Does it also happen with other applications?
>
> Gengbin
>
> Rene Salmon wrote:
>
> > Hi List,
> >
> > We are having some strange network problems with CHARM/NAMD on an AMD64
> > Clustermatic 5 cluster. On nodes that are not running NAMD jobs we see
> > this for the network stats:
> >
> > # bpsh 10 netstat -i
> > Kernel Interface table
> > Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
> > eth0 1500 0221380022 0 0 0238341193 0 0 0 BMRU
> > eth1 1500 0 0 0 0 0 0 0 0 0 BMU
> > lo 16436 0 18 0 0 0 18 0 0 0 LRU
> >
> > As you can see, there are no errors or dropped packets. But on nodes that
> > are running NAMD jobs we get this:
> >
> > # bpsh 4 netstat -i
> > Kernel Interface table
> > Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
> > eth0 1500 023182578986 637615860 0 023839814024 1 0 0 BMRU
> > eth1 1500 0 0 0 0 0 0 0 0 0 BMU
> > lo 16436 02271084541 0 0 02271084541 0 0 0 LRU
> >
> > This shows lots of errors and dropped packets. Is this normal? Somehow
> > this is slowing the network down and causing NFS problems.
> >
> > Any ideas?
> >
> > Thank you,
> > Rene
> >
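
The column split that Gengbin marks with "^" above can be sketched in a few
lines of Python. This is only an illustration, not part of netstat or NAMD:
the helper names split_fused_row and rx_error_ratio are made up here, and the
heuristic assumes that the column which ran into its neighbour is always a
bare "0" (true for Met and RX-OVR in the rows quoted in this thread, but not
guaranteed in general).

```python
import re

# Column order as printed in the netstat -i header quoted in the thread.
COLUMNS = ["Iface", "MTU", "Met", "RX-OK", "RX-ERR", "RX-DRP", "RX-OVR",
           "TX-OK", "TX-ERR", "TX-DRP", "TX-OVR", "Flg"]


def split_fused_row(row):
    """Split one data row, separating a leading "0" that ran into the next counter.

    Assumption: the fused-in column is always exactly "0".
    """
    tokens = []
    for tok in row.split():
        # "023182578986" -> "0", "23182578986"; a plain "0" or "1500" is kept.
        if re.fullmatch(r"0\d+", tok):
            tokens.extend(["0", tok[1:]])
        else:
            tokens.append(tok)
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(tokens)}: {row!r}")
    return dict(zip(COLUMNS, tokens))


def rx_error_ratio(fields):
    """Receive errors per successfully received packet (0.0 if nothing received)."""
    rx_ok = int(fields["RX-OK"])
    return int(fields["RX-ERR"]) / rx_ok if rx_ok else 0.0


# The eth0 rows from the node running NAMD (busy) and the idle node.
busy = split_fused_row("eth0 1500 023182578986 637615860 0 023839814024 1 0 0 BMRU")
idle = split_fused_row("eth0 1500 0221380022 0 0 0238341193 0 0 0 BMRU")

for name, fields in (("busy", busy), ("idle", idle)):
    print(name, fields["RX-ERR"], fields["RX-DRP"], fields["TX-ERR"],
          f"{rx_error_ratio(fields):.2%}")
```

Read this way, the busy node has a large RX-ERR count with RX-DRP at 0 and
TX-ERR at 1, which matches Gengbin's reading of the output.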

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:40:47 CST