On Mon, Mar 07, 2005 at 04:14:37PM +1100, Nick Piggin wrote:
> I think you would have better luck in reproducing this problem if you
> did the full sendfile thing.
> I think it is becoming disk bound due to page reclaim problems, which
> is causing the slowdown.
> In that case, writing the network only test would help to confirm the
> problem is not a networking one - so not useless by any means.

Not necessarily, Nick. I have written an HTTP testing tool which matches
the description of Ben's: non-blocking, single-threaded, no disk I/O,
etc. It works flawlessly under 2.4 but gives me erratic numbers under 2.6.
Especially if I start some CPU activity on the system, I can get pauses
of up to 13 seconds during which the tool does nothing at all. At first I
believed it was because of the scheduler, but it might also be related
to what is described here, since I had roughly the same setup (gigE, 1500,
thousands of sockets). I never had enough time to investigate further, so I
went back to 2.4.
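
For anyone who wants to look for this kind of pause themselves, here is a
minimal sketch of a stall detector. It is not the tool described above: the
name measure_stalls, its parameters, and the local socket pair standing in
for real network connections are all assumptions made for illustration. It
only shows the general idea of a single-threaded, non-blocking loop that
timestamps every wakeup and reports gaps longer than a threshold.

```python
import select
import socket
import time


def measure_stalls(duration=1.0, tick=0.01, threshold=0.1):
    """Run a single-threaded, non-blocking loop and record wakeup gaps.

    Returns (longest_gap, stalls), where stalls lists every gap between two
    consecutive loop iterations that exceeded `threshold` seconds.
    All names and defaults here are hypothetical.
    """
    # A local socket pair stands in for network connections; no disk I/O.
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)

    stalls = []
    longest = 0.0
    start = last = time.monotonic()
    try:
        while last - start < duration:
            # Keep data in flight so the loop always has work to do.
            try:
                a.send(b"x")
            except BlockingIOError:
                pass  # send buffer full; the read below will drain it
            readable, _, _ = select.select([b], [], [], tick)
            if readable:
                b.recv(4096)
            # The time since the previous iteration is the observed gap.
            now = time.monotonic()
            gap = now - last
            longest = max(longest, gap)
            if gap > threshold:
                stalls.append(gap)
            last = now
    finally:
        a.close()
        b.close()
    return longest, stalls


if __name__ == "__main__":
    longest, stalls = measure_stalls(duration=5.0)
    print("longest gap: %.3f s, stalls over threshold: %d"
          % (longest, len(stalls)))
```

Running it once on an idle box and once with a CPU hog in the background
should show whether the longest gap grows under load. Since it touches
neither the disk nor a real NIC, a large gap here would point away from
both page reclaim and the network path, though it cannot tell the
remaining causes apart.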