BUG: Slowdown on 3000 socket-machines tracked down

From: Christian Schmid
Date: Sat Mar 05 2005 - 12:27:14 EST


Hello.

After weeks of work, I can now give a detailed report about the bug and when it appears:

Attached is another traffic-image. This one is with 2.6.10 and a 3/1 split, preemtive kernel, so all defaults.

The first part is where I throttled the whole thing to 100 MBit in order to build up a traffic-jam ;)

When I released it, it jumped up immediately but suddenly it goes down (each pixel is one second) Playing around with min_free_kbytes didnt help. Where it goes up again I set lower_zone_protection to 1024000 and where it goes down I set it to 0 again and where it goes up the last time... guess..

This test was with 3500 sockets.

Today I tested with 5000 sockets. The problem is the same like above but the more sockets there come, it just doesnt claim more bandwidth as it SHOULD of course do. It seems it doesn't slow down but it just doesnt scale anymore. The badwidth doesnt go over 80 MB/Sec, no matter what I do. Then I did the following: I raised lower_zone_protection to 1024 (above I did 1024000 which is bullshit but it doesnt matter as it seems to just protect the whole low-mem which is what I want) and it was at 80 MB. then I lowered to 0 again and suddenly it peaked up to full bandwidth (100 MB) for about 5 seconds until the whole protected area was in use. Then it slowed down drastically again.

My theory:

I suppose when the blocks come in fast enough and the load is high enough, the kernel cant free the required low memory as fast as this would be required in order to NOT slow-down everything. So the vm is basically busy freeing low-memory. What do you think? The interesting part is that it slows down painfully with lower_zone_protection set to 0, it peaks at 80 MB/Sec with lower_zone_protection set to max (1024, whole low-mem) and there is much cpu-ressources free..... When set to 0 it speeds up without limit AS LONG AS there is memory left. When its consumed, it slows down painfully again because its set to 0 of course.

Chris

PNG image