Re: BUG: Slowdown on 3000 socket-machines tracked down

From: Ben Greear
Date: Tue Mar 08 2005 - 01:34:53 EST


Nick Piggin wrote:
> Ben Greear wrote:
>
>> Nick Piggin wrote:
>>
>>> Ben Greear wrote:
>>>
>>> In that case, writing the network only test would help to confirm the
>>> problem is not a networking one - so not useless by any means.
>>
>> It's not trivial to write something like this :)
>>
>> I'll be using something I already have. If I can't reproduce the problem,
>> then perhaps it is due to sendfile and someone can write a customized
>> test. The main reason I offered is because people are ignoring the
>> bug report for the most part and asking for a test case. I may be able
>> to offer an independent verification of the problem which might convince
>> someone to write up a dedicated test case...
>
> OK, no that sounds good, please do make the test case.
>
> I have actually been following up with Christian regarding
> the disk IO / memory management side of things but the thread
> has gone offline for some reason :\

Initial test setup: two machines, running connections between them.
Mostly asymmetric (about 50Mbps in one direction,
GigE in the other). Each connection is trying some random rate between 128kbps
and 3Mbps in one direction, and 1kbps in the other direction.
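
For a rough idea of the load this setup offers, here is a minimal sketch.
It assumes the per-connection rate is drawn uniformly from the stated range
(the text only says "some random rate") and that kbps/Mbps are decimal units.
The function names are illustrative and not part of the traffic generator.

```python
# Illustrative only: the uniform distribution and decimal units are assumptions.
LOW_BPS = 128_000        # 128kbps lower bound per connection
HIGH_BPS = 3_000_000     # 3Mbps upper bound per connection
REVERSE_BPS = 1_000      # 1kbps in the opposite direction


def expected_offered_load_bps(connections, low=LOW_BPS, high=HIGH_BPS):
    """Expected aggregate rate if each connection picks a uniform rate in [low, high]."""
    return connections * (low + high) / 2


def reverse_load_bps(connections, rate=REVERSE_BPS):
    """Aggregate payload rate in the low-rate direction."""
    return connections * rate


for n in (2000, 3000, 4000):
    fwd = expected_offered_load_bps(n)
    rev = reverse_load_bps(n)
    print(f"{n} connections: ~{fwd / 1e6:.0f} Mbps forward, ~{rev / 1e6:.0f} Mbps reverse payload")
```

Under that assumption the requested load is well above what a GigE link can
carry at every connection count tried, so the link itself would be the limit.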

Sending machine is dual 3.0GHz Xeons, 1MB cache, HT, and EM64T (running 32-bit
kernel & user space though). 1GB of RAM.

Receiving machine is dual 2.8GHz Xeons, 512KB cache, HT, 32-bit. 2GB of RAM
(but only 850MB of low memory of course...saw the thing OOM kill me with 1GB of
free high memory :( )


Zero latency:

2000 TCP connections: When I first start, I see errors indicating I'm out of low
memory, but it quickly recovers. Probably because my program takes a small
bit of time before it starts reading the sockets.
986Mbps of ethernet traffic (counting all ethernet headers)

3000 TCP connections: Same memory issue
986Mbps of ethernet traffic, about 82kpps

4000 TCP connections: Had to drop max_backlog to 5000 from 10000 to keep
the machine from going OOM and killing my traffic generator (on
the receiving side).
986Mbps of ethernet traffic
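
As a sanity check on the 3000-connection figures, dividing the bit rate by the
packet rate gives the average frame size. This is a minimal sketch: the
function name is illustrative, and it assumes the 986Mbps and 82kpps numbers
cover the same interval and that Mbps means 10^6 bits per second.

```python
def average_frame_bytes(bits_per_second, packets_per_second):
    """Average bytes per frame implied by a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second


# Figures reported for the 3000-connection run (82kpps is approximate).
avg = average_frame_bytes(986e6, 82e3)
print(f"average frame size: about {avg:.0f} bytes")
```

That comes out at roughly 1500 bytes per frame, which suggests nearly all of
the traffic is going out in full-size frames for a standard 1500-byte MTU.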

I will work on some numbers with latency tomorrow (had to stop and
re-write some of my code to better handle managing the 8000 endpoints
that 4000 connections requires!)

I think we can assume that the problem is either related to latency,
or sendfile, since 4000 connections with no latency rocks along just
fine...

Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
