Re: Network slow-downs - someone follow this thread this time please!

cybertech@cybertech.org
Mon, 28 Jun 1999 23:05:32 GMT


I'm reposting this information -- I posted it about a month back, but no
one seemed to notice. In the meantime I've worked around the problem the
same way that svendsen@pvv.org has -- by cycling the interfaces. However,
it has to be done every 10 minutes, with a 5-second delay after downing
the interface -- HIGHLY annoying, even when done from cron.
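
For anyone wanting to try the same workaround, here is a minimal sketch of the down / wait 5 seconds / up cycle described above. The post does not include the actual script, so the function names and the use of plain `ifconfig <iface> down` and `ifconfig <iface> up` are assumptions; on some setups routes have to be re-added after the interface comes back up.

```python
import subprocess
import time


def cycle_commands(iface):
    """Return the two ifconfig command lines (down, then up) for one interface."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def cycle_interface(iface, delay=5.0, run=subprocess.check_call, sleep=time.sleep):
    """Take the interface down, wait `delay` seconds, then bring it back up.

    `run` and `sleep` are injectable so the sequence can be checked
    without touching a real interface. Running it for real needs root.
    """
    down, up = cycle_commands(iface)
    run(down)
    sleep(delay)
    run(up)
```

A root cron job that calls `cycle_interface("eth0")` every 10 minutes would approximate the schedule described above; `eth0` is only a placeholder interface name.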

I've got 3 machines running 2.2.10. Two of the machines are PPro 200s, one
is a PII 350. All are running 3Com 3C905/905B PCI controllers -- 5 of them
in total, 2 running at 10 Mbit/s, 3 at 100 Mbit/s.

After approximately 2 days of uptime, I start to see ping times on the
local LAN jump to 7-20 seconds. As others have noticed, there is no packet
loss -- just some damn high latency. This occurs whether I'm pinging
another Linux box, a Windows box, or a Cisco.
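
To put numbers on the latency creep, something like the following could be run over captured ping output. This is only a sketch: it assumes the common Linux ping line format containing `time=<value> ms`, and the function names are made up for illustration.

```python
import re

# Matches the per-reply round-trip time in lines such as
# "64 bytes from 10.0.0.1: icmp_seq=0 ttl=255 time=0.4 ms" (assumed format).
RTT_RE = re.compile(r"time[=<]([0-9]+(?:\.[0-9]+)?)\s*ms")


def rtts_ms(ping_output):
    """Extract every round-trip time (in milliseconds) from ping output text."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def summarize(samples):
    """Return (min, average, max) of the samples, or None if there are none."""
    if not samples:
        return None
    return min(samples), sum(samples) / len(samples), max(samples)
```

Logging the summary every few minutes would show whether the average climbs gradually, as described here, while the reply count stays complete.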

It seems to be dependent on the network load -- lighter loads lead to
longer periods between problems. The problem is ALSO gradual -- it'll
start at 4-second pings, then 7-second pings about 20 minutes later, then
30 minutes after that it's up to 12-20 seconds. At this point the machine
is almost unusable remotely. The problem doesn't seem to be triggered by
data being forwarded -- I can route as much as I want through the firewall
(Linux) without affecting it, but if the same machine is used for server
activities it does begin to display the problem.

My kernel is compiled with gcc version egcs-2.91.66 19990314 (egcs-1.1.2
release).

I normally compile the kernel with -mpentiumpro, but have gone back to
-mpentium for the time being -- that did not change the problem at all.

glibc is version 2.1 and 2.1.1, same symptoms under both.

The network consists of a handful of Windows boxes (98, NT4, Windows
2000), a Cisco 2524, and the Linux boxes. Nothing is DHCP enabled, ALL
NICs in all machines are 3Com based, and I'm all out of relevant
information. :)

David S. Stahl
President, CTCS/Dalsom Inc.
cybertech@cybertech.org

svendsen@pvv.org (Øystein Svendsen)
Sent by: owner-linux-kernel@vger.rutgers.edu
06/28/1999 11:56 AM

To: linux-kernel@vger.rutgers.edu
cc: (bcc: CyberTech/CTCS)
Subject: Network slow-downs

Hello,

I am experiencing quite drastic slow-downs in network
performance on a dual P-II system running kernel 2.2.10.

The machine currently has two network cards, one ISA NE2000 clone and
one Intel Ether Express 100 card, which is the card I have trouble
with. The machine works mainly as an NFS server.

After upgrading to the 2.2 series we have from time to time
experienced severe slow-downs in TCP performance, down to 1-10% of
the normal performance, as measured by ttcp. According to /proc/net/dev,
we have no packet loss when the network slows down.
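
The "no packet loss" check can be made repeatable by pulling the error and drop counters out of /proc/net/dev. The sketch below assumes the 16-column layout used by 2.2 kernels (8 receive fields followed by 8 transmit fields per interface); the function name is hypothetical.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev text into {iface: {rx_errs, rx_drop, tx_errs, tx_drop}}.

    Assumes two header lines followed by one line per interface with
    8 receive counters and then 8 transmit counters.
    """
    stats = {}
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        # The name may be fused to the first counter ("eth0:12345"),
        # so split on the colon instead of on whitespace.
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats
```

Taking one snapshot before and one during a slow-down and comparing the counters per interface would confirm whether they really stay flat while latency rises.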

When using ping from an Ultra 2 running Solaris 2.6, I record an
average ping response ranging from 100 to 4000 milliseconds, but no
loss; the ping response is just slow.

The performance goes back to normal when I take down the interface and
reinsert the eepro100 module into the kernel. After I've done that,
the performance is fine for a couple of days or maybe weeks.

I have tried to just take the interface down with ifconfig without
removing the module, and this seems to fix the problem, but only until
a large file is read through NFS. (I'm using the user level nfsd, but
I have had this problem using knfsd too.)

When I measure the ping response while dd-ing a 60 MB file to
/dev/null through NFS on another machine, the response seems to be
fine until the whole file has been read, then the ping response grows
rapidly to an average of about 100-200 ms, and after an hour or so
we end up with an average response of 3000-4000 ms.

The traffic that goes through the NE2000 card seems to be unaffected
by what's happening on the eepro100 interface; its ping response is
stable, well below 5 ms all the time. I used to have an Accton 21140-based
card instead of the eepro100 card, and I experienced the same problems
using that card too.

I have seen this problem in 2.2.6, 2.2.7, 2.2.9 and 2.2.10.

We have another box, used as a workstation, which is not SMP and does
not run an SMP-enabled kernel, and I have on one occasion seen this
problem there too.

I have not yet found a way to reproduce the problem, but I have
noticed that it somehow seems to be triggered by messing with the NFS
daemon (killing and restarting it, that is).

I hope this is a problem that could be fixed, and I'll be happy to
provide more details, and otherwise do whatever is needed to find out
what's happening.

--
    Øystein Svendsen

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

