Re: BUG: Slowdown on 3000 socket-machines tracked down

From: Ben Greear
Date: Wed Mar 09 2005 - 23:04:42 EST

Christian Schmid wrote:
Yes, 2.6.11. I have tuned max_backlog and some other TCP and networking
related settings to give more buffers etc to networking tasks. I have not
tried any significant disk-IO while doing these tests.

I finally got my systems set up so I can run my WAN emulator at full 1Gbps:

I am getting right at 986Mbps throughput with 30ms round-trip latency
(15ms in both directions).

So, latency does not seem to be the problem either.

I think the problem can be narrowed down to:

1) Non-optimal kernel network tunings on your server.

I used all the default-settings on 2.6.11

Here are my settings. Hopefully it will be clear what I'm
talking about..yell if you need details. Please note that I explicitly
set the send buffers to 128k and the rcv to 16k in my test so the min and max
socket queue lengths do not matter here.

my $dflt_tx_queue_len = 2000; # Ethernet driver transmit-queue length. Might be worth making
# it bigger for GigE nics.

my $netdev_max_backlog = 5000; # Maximum number of packets, queued on the INPUT side, when
# the interface receives pkts faster than it can process them.

my $wmem_max = 4096000; # Write memory buffer. This is probably fine for any setup,
# and could be smaller (256000) for < 5Mbps connections.

my $wmem_default = 128000; # Write memory buffer. This is probably fine for any setup,
# and could be smaller (256000) for < 5Mbps connections.

my $rmem_max = 8096000; # Receive memory (packet) buffer. If you are running
# lots of very fast traffic,
# you may want to make this larger if you are running over
# fast, high-latency networks.
# For < 5Mbps of traffic, 512000 should be fine.

my $rmem_default = 128000; # Receive memory (packet) buffer.

# If this is not 1, then the tcp_* settings below will not be applied.
my $modify_tcp_settings = 1;

# See the kernel documentation: Documentation/networking/ip-sysctl.txt
my $tcp_rmem_min = 4096;
my $tcp_rmem_default = 256000; # TCP specific receive memory pool size.
my $tcp_rmem_max = 30000000; # TCP specific receive memory pool size.

my $tcp_wmem_min = 4096;
my $tcp_wmem_default = 256000; # TCP specific write memory pool size.
my $tcp_wmem_max = 30000000; # TCP specific write memory pool size.

my $tcp_mem_lo = 20000000; # Below here there is no memory pressure.
my $tcp_mem_pressure = 30000000; # Can use up to 30MB for TCP buffers.
my $tcp_mem_high = 60000000; # Can use up to 60MB for TCP buffers.

2) Disk-IO (my disk is small and slow compared to a 'real' server, not sure I can
really test this side of things, and I have not tried as of yet.)

This doesnt explain the speed-up when I change lower_zone_protection from 0 to 1024. It also doesnt explain the slowdown on 2.6.11 compared to 2.6.10

Disk-IO uses buffers, so a change here could easily starve the rest
of your system. I'm just saying I can't reliably test this. To be honest,
my machines are already throwing allocation failures in the ethernet drivers
and I've had the OOM killer kill my main process several times. So, my machines
are running right at their memory limit, even w/out any disk IO.

3) Your clients have much more latency and/or don't have enough bandwidth
to fully load your server. Since you didn't answer before: I assume you
do not have a reliable test bed and are just hoping that enough clients connect
to do your benchmarking.

Yes I just wait until they connect. On the graph it only takes about 2 minutes until 3000 sockets are created again.

But, you could get unlucky and have 3000 people on a shitty dialup
connection connect to you. That does not make it easy to reliably
test the system.

4) There is something strange with sendfile and/or your application's coding.

I am not doing more than calling sendfile. There is nothing one can do wrong.

My suggestion would be to eliminate these variables by coming up with a repeatable
test bed, alternative traffic generators, WAN/Network emulators for latency, etc.

The problem still is that 1) it speeds up immediately when lower_zone_protection is raised to 1024. This proves it is NOT a disk-bottleneck. And second: it got much worse with 2.6.11 and lower_zone_protection disappeared on 2.6.11

So, maybe a VM problem? That would be a good place to focus since
I think we can be fairly certain it isn't a problem in just the
networking code. Otherwise, my tests would show lower bandwidth.


Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at