networking / web perf probs

Larry McVoy (lm@who.net)
Sat, 13 Dec 1997 22:38:01 +0800


I was at a web conference last week and there was a paper presented that
attempted to claim that there is no performance problem with web servers.
Silly, I know. Fortunately, Jeff Mogul had a look at it and gave a
10-minute rebuttal showing what the problem was, why it would appear
that the servers were not loaded, and what the fix was. It's a BSD
problem so maybe it doesn't exist in Linux - I just want to make sure.

The problem is that when a web server has more than a certain number
of packets in the input queue, new packets just get dropped. Losing
a data packet isn't so bad, but losing the connection setup SYN is
horrible: the SYN retransmit timer is an exponential backoff, starting
at 5 seconds. This is why lots of people hit the "STOP" button on
their browser and reload, and that works better - the reload sends a
fresh SYN right away instead of waiting out the backoff.
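
To make the timing concrete, here's a toy sketch of that backoff (the
5 second start is the figure above; the exact schedule and retry cap
vary by stack, so treat the numbers as illustrative only):

    #include <stdio.h>

    /* Illustration only: exponential SYN-retransmit backoff starting
     * at 5 seconds.  Real stacks cap the timeout and give up after a
     * fixed number of tries; this just shows the shape of the wait.
     */
    int main(void)
    {
        int timeout = 5, waited = 0;
        int attempt;

        for (attempt = 1; attempt <= 5; attempt++) {
            waited += timeout;
            printf("retry %d fires at t=%ds\n", attempt, waited);
            timeout *= 2;       /* back off exponentially */
        }
        return 0;
    }

A user staring at a stalled page for 5, then 10, then 20 seconds will
reach for "STOP" long before the stack gives up.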

A server in this situation is not necessarily out of CPU; in fact it
is quite likely that the server is mostly idle. The resource being
exhausted is the input packet queue, not CPU cycles. Typical BSD based
systems suffering from this problem are 90+% idle.

The simple fix is to crank up the input queue length. SGI cranked theirs
to 512 packets per queue (and there is a queue per CPU). DEC cranked
theirs as well (anyone have OSF/1 header files out there to figure out
how high it is?).
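
For reference, the enqueue path I have in mind looks roughly like the
classic 4.4BSD one below (names from memory of 4.4BSD; your tree may
differ, and this is an excerpt, not compilable on its own):

    /* net/if.h: the default depth is a mere 50 packets */
    #define IFQ_MAXLEN  50

    #define IF_QFULL(ifq)  ((ifq)->ifq_len >= (ifq)->ifq_maxlen)
    #define IF_DROP(ifq)   ((ifq)->ifq_drops++)

    /* driver interrupt path: once ipintrq fills up, everything
     * else - SYNs included - is silently thrown away */
    if (IF_QFULL(&ipintrq)) {
        IF_DROP(&ipintrq);
        m_freem(m);
    } else {
        IF_ENQUEUE(&ipintrq, m);
        schednetisr(NETISR_IP);
    }

Cranking IFQ_MAXLEN (or whatever the equivalent tunable is) is the
whole fix here.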

Another part of the fix is to have

listen(sock, 0)

work like normal, but

listen(sock, >0)

should be changed (in the kernel) to be something like

listen(sock, input queue length)

There are a lot of leftover programs that think a backlog of 5 is
reasonable. Those programs are naive.
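
A minimal sketch of the policy I mean (the depth constant and the
rounding rule here are mine, purely to illustrate):

    /* Hypothetical kernel-side clamp: honor an explicit backlog of 0,
     * but round any positive backlog up to the input queue depth so
     * that leftover listen(sock, 5) callers get a sane value.
     */
    #define INPUT_QUEUE_DEPTH 512   /* e.g. SGI's per-queue limit */

    static int
    effective_backlog(int requested)
    {
        if (requested <= 0)
            return 0;                   /* listen(sock, 0) unchanged */
        if (requested < INPUT_QUEUE_DEPTH)
            return INPUT_QUEUE_DEPTH;   /* round small backlogs up */
        return requested;
    }

That way nobody has to recompile old binaries to get a decent backlog.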

Anyway, I'm guilty of not diving into the code to see if this is fixed.
If no one says "yes it is" or "no it isn't", I'll try to figure it out,
but I'll bet somebody out there already knows.

--lm