Re: Bad apache perfomance wtih linux SMP

Juergen Schmidt (ju@ct.heise.de)
Tue, 18 May 1999 19:44:25 +0200


Andi Kleen wrote:
> One culprit is most likely that the data copy for TCP sending runs completely
> serialized. This can be fixed by doing replacing the
>
> skb->csum = csum_and_copy_from_user(from,
> skb_put(skb, copy), copy, 0, &err);
>
> in tcp.c:tcp_do_sendmsg with
>
> unlock_kernel();
> skb->csum = csum_and_copy_from_user(from,
> skb_put(skb, copy), copy, 0, &err);
> lock_kernel();

Bingo !!!

This raised performance from 270 rps to 802 rps when 64 clients were
pulling a 4k HTML-page. Single CPU perfomance lies by 890 rps -- but the
new numbers are just from a very short run. (BTW: NT/IIS on 4 CPUs
deliver 840 rps :-)

Or with apaches ab:

ab -c 8 -t 120 127.0.0.1/4k.html

produces:

2.2.8 4 CPUs: 350.95
2.2.8 4 CPUs with patch: 1334.19
2.2.8 no SMP: 1540.22

BTW: I'm going to release my test program under GPL after I cleaned it
up a little. ab is not working for me, because it dies with "broken
pipe" when I try it over the network -- perhaps because they are doing
non-blocking I/O...

> The patch does not violate any locking requirements in the kernel, because
> the kerne lock could have been dropped at any time anyways when the copy_from_user
> slept to swap a page in.

I'd like to here some comments from other people on this. Is this a
proper patch or is it dangerous in any way ?

Linus, Alan, would you recommend to run a machine with that patch ?

> (I'm not sure if running a published benchmark with such a patch is fair though.

My intention is not to publish benchmark results and let others explain
them. I want to understand, what I'm measuring and why. For this, your
suggestion is excellent.

If I can even present a patch, that might help people, it's a lot better
than shouting out the latest "records".

> Another problem is that Linux 2.2 per default uses only 1GB of memory. This can

I've already patched that :-)

> Probably increasing the global file table size.
>
> Try:
>
> echo 32768 > /proc/sys/fs/file-max
> echo 65536 > /proc/sys/fs/inode-max

will do, asap

> Overall it should be clear that the current Linux kernel doesn't scale
> to 4CPUs for system load (user load is fine). I blame the Linux vendors
> for advertising it, although it is not true.

Thanks for the open statement.

> If you're interested I can send you a profiling patch that shows how much
> of the system CPU time is spent in locks. Another easy way is to boot
> with profile=2 and to run /usr/sbin/readprofile to see where the time is spent.

Yes, please send me patch. I'll try the other way, too.

> Work to fix all these problems is underway.

Will it come into 2.2 or 2.3 only ?

Thanks for your help, juergen

-- 
Juergen Schmidt   Redakteur/editor  c't magazin      PGP-Key available
Verlag Heinz Heise GmbH & Co KG, Helstorferstr. 7, D-30625 Hannover
EMail: ju@ct.heise.de - Tel.: +49 511 5352 300 - FAX: +49 511 5352 417

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/