New Performance (+Memory saver) patch

Mark Hemment (markhe@nextd.demon.co.uk)
Thu, 10 Apr 1997 18:31:35 +0100 (BST)


Hi David/All,

Anyway, I've found out why lmbench wasn't giving good figures for my previous
performance patch: I wasn't colouring the slabs (c_align was _always_ 0, so
there were too many cache-line collisions), and the hardware alignment of
the general caches wasn't great.
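Here is a minimal sketch of what slab colouring does. It assumes a simplified cache descriptor, and struct cache_sketch, cache_init() and next_colour_offset() are hypothetical names, not the patch's code. The spare bytes left at the end of each slab are used to shift the first object by a multiple of the cache-line size, so objects at the same index in different slabs land on different cache lines. With an alignment of 0, which is what happened while c_align was always 0, every slab starts at offset 0 and the collisions remain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache descriptor (not the patch's own
 * structure). */
struct cache_sketch {
    size_t slab_size;   /* bytes in one slab, e.g. a 4096-byte page */
    size_t obj_size;    /* bytes per object */
    size_t align;       /* colour step, e.g. the L1 cache-line size;
                           0 means no colouring, as when c_align was 0 */
    size_t colour_max;  /* largest colour index that still fits */
    size_t colour_next; /* colour to hand to the next new slab */
};

static void cache_init(struct cache_sketch *c, size_t slab_size,
                       size_t obj_size, size_t align)
{
    size_t nobjs, left_over;

    assert(obj_size > 0 && obj_size <= slab_size);
    nobjs = slab_size / obj_size;
    /* Bytes wasted at the end of every slab; colouring spends them on
     * shifting the first object instead. */
    left_over = slab_size - nobjs * obj_size;

    c->slab_size = slab_size;
    c->obj_size = obj_size;
    c->align = align;
    c->colour_max = align ? left_over / align : 0;
    c->colour_next = 0;
}

/* Offset of the first object in the next slab created for this cache.
 * Successive slabs cycle through 0, align, 2*align, ... colour_max*align,
 * so equivalent objects in different slabs fall on different cache lines. */
static size_t next_colour_offset(struct cache_sketch *c)
{
    size_t offset = c->colour_next * c->align;

    if (c->colour_next < c->colour_max)
        c->colour_next++;
    else
        c->colour_next = 0;
    return offset;
}
```

For example, with a 4096-byte slab, 96-byte objects and 32-byte cache lines, 42 objects fit with 64 bytes to spare. Successive slabs therefore start at offsets 0, 32, 64, 0, and so on. With align set to 0, every slab starts at offset 0.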

The new patch on my home page (http://www.nextd.demon.co.uk) fixes the
hardware cache problems and an NFS compile error.

It also includes some late-night munching: the per-task file descriptors
are now allocated in blocks of 32. This code still needs a few performance
improvements (such as making the fd-block size equal to __NFDBITS to help
speed up copy_files()), but it seems to work well. I haven't done much
testing with it, though, so be warned!
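The block scheme can be sketched roughly like this. The post does not describe the patch's actual data structure, so the sketch assumes a simple two-level table, and struct fd_table_sketch, FD_MAX_BLOCKS and the fd_*_sketch() helpers are hypothetical. Descriptor slots are grouped into blocks of 32, and a block is allocated only when a descriptor in its range is first installed. A task with few open files therefore pays for one block rather than a full-size array. If the block size equals __NFDBITS (32 on the i386 boxes used here), each block lines up with one word of an fd_set bitmap. That is presumably what would let copy_files() work one block and one bitmap word at a time.

```c
#include <assert.h>
#include <stdlib.h>

#define FD_BLOCK_SIZE 32  /* block size from the post; assumed == __NFDBITS */
#define FD_MAX_BLOCKS 8   /* hypothetical cap: 8 * 32 = 256 descriptors */

/* Hypothetical per-task descriptor table. Slots are grouped into blocks
 * of FD_BLOCK_SIZE pointers that are allocated on first use. Entries are
 * void * here, standing in for struct file *. */
struct fd_table_sketch {
    void **blocks[FD_MAX_BLOCKS];
};

/* Return the slot for fd, or NULL if fd is out of range, or its block is
 * absent and alloc is 0, or the block cannot be allocated. */
static void **fd_slot(struct fd_table_sketch *t, int fd, int alloc)
{
    int b;

    assert(t != NULL);
    if (fd < 0 || fd >= FD_BLOCK_SIZE * FD_MAX_BLOCKS)
        return NULL;
    b = fd / FD_BLOCK_SIZE;
    if (t->blocks[b] == NULL) {
        if (!alloc)
            return NULL;
        /* calloc zeroes the block, so every new slot starts empty. */
        t->blocks[b] = calloc(FD_BLOCK_SIZE, sizeof(void *));
        if (t->blocks[b] == NULL)
            return NULL;
    }
    return &t->blocks[b][fd % FD_BLOCK_SIZE];
}

/* Store file in slot fd, allocating its block if needed; -1 on failure. */
static int fd_install_sketch(struct fd_table_sketch *t, int fd, void *file)
{
    void **slot = fd_slot(t, fd, 1);

    if (slot == NULL)
        return -1;
    *slot = file;
    return 0;
}

/* Look up fd without allocating anything; NULL if it is not in use. */
static void *fd_get_sketch(struct fd_table_sketch *t, int fd)
{
    void **slot = fd_slot(t, fd, 0);

    return slot ? *slot : NULL;
}

static int fd_blocks_in_use(const struct fd_table_sketch *t)
{
    int b, n = 0;

    for (b = 0; b < FD_MAX_BLOCKS; b++)
        if (t->blocks[b] != NULL)
            n++;
    return n;
}

static void fd_table_free(struct fd_table_sketch *t)
{
    int b;

    for (b = 0; b < FD_MAX_BLOCKS; b++) {
        free(t->blocks[b]);
        t->blocks[b] = NULL;
    }
}
```

With this layout, installing descriptor 3 allocates only the first block. Descriptor 40 triggers a second block, and plain lookups never allocate.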

Kernel compile times on a 486/8MB test target show a consistent
improvement.

The lmbench figures below ("nextd" is an unpatched kernel, "slab" is a
patched kernel, both run on the same 133MHz Pentium box) are interesting,
but I believe we can do better for networking...

The process overhead, and particularly the context-switching overhead,
has dropped nicely. The pipe bandwidth is up because of the faster
switching. The Bcopy bandwidth improvement (slab.20) is strange; I only
ever see 46 on a patched kernel. The improvement in mmap reread is nice.

Note the second and third "File reread" figures for the patched kernel
(40 instead of 44). Something strange must have happened (the fourth run
was back to normal, but I didn't want to massage the figures).

--------------------------------------------------------------------------------
L M B E N C H 1 . 0 S U M M A R Y
------------------------------------

Processor, Processes - times in microseconds
--------------------------------------------
Host      OS            MHz  Null    Null    Simple  /bin/sh Mmap 2-proc 8-proc
                             Syscall Process Process Process lat  ctxsw  ctxsw
--------- ------------- ---- ------- ------- ------- ------- ---- ------ ------
nextd.17  Linux 2.1.32   133       1      1K      6K     24K   50      6     10
nextd.18  Linux 2.1.32   133       1      1K      6K     22K   50      6     10
nextd.19  Linux 2.1.32   133       1      1K      6K     21K   49      6     10

slab.20   Linux 2.1.32   133       1      1K      6K     21K   49      6      9
slab.21   Linux 2.1.32   133       1      1K      6K     20K   50      6      8
slab.22   Linux 2.1.32   133       1      1K      6K     20K   49      5      9

*Local* Communication latencies in microseconds
-----------------------------------------------
Host      OS            Pipe    UDP     RPC/    TCP     RPC/
                                        UDP             TCP
--------- ------------- ------- ------- ------- ------- -------
nextd.17  Linux 2.1.32       22     150     300     223     498
nextd.18  Linux 2.1.32       23     147     304     237     443
nextd.19  Linux 2.1.32       23     145     305     219     441

slab.20   Linux 2.1.32       24     135     295     215     441
slab.21   Linux 2.1.32       24     135     306     243     447
slab.22   Linux 2.1.32       24     137     294     211     445

*Local* Communication bandwidths in megabytes/second
----------------------------------------------------
Host      OS            Pipe TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                                  reread reread (libc) (hand) read write
--------- ------------- ---- ---- ------ ------ ------ ------ ---- -----
nextd.17  Linux 2.1.32    41   19     44     76     45     43   86    84
nextd.18  Linux 2.1.32    40   18     44     76     45     43   86    84
nextd.19  Linux 2.1.32    40   18     44     76     45     43   86    84

slab.20   Linux 2.1.32    41   19     44     76     46     43   86    84
slab.21   Linux 2.1.32    41   19     40     77     45     43   86    84
slab.22   Linux 2.1.32    41   20     40     77     45     43   86    84
----------------------------------------------------------------------------
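The phrase "dropped nicely" can be given a rough number by comparing the mean of the three runs for each kernel. The helpers below are a hypothetical sketch, and the arrays are simply copied from the summary tables. Three runs per kernel are far too few for the result to be more than indicative.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n values (n must be non-zero). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage change from before to after; negative means it went down. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return 100.0 * (after - before) / before;
}

/* Figures copied from the lmbench summary (three runs per kernel). */
static const double ctxsw8_nextd[] = { 10, 10, 10 };  /* 8-proc ctxsw, us */
static const double ctxsw8_slab[]  = {  9,  8,  9 };
static const double binsh_nextd[]  = { 24, 22, 21 };  /* /bin/sh, K us */
static const double binsh_slab[]   = { 21, 20, 20 };
```

On these figures, the 8-process context switch goes from 10 to about 8.7 microseconds, roughly 13% lower. /bin/sh process creation goes from about 22.3K to 20.3K microseconds, roughly 9% lower.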

Regards,

markhe

------------------------------------------------------------------
Mark Hemment, Unix/C Software Engineer (Contractor)
markhe@nextd.demon.co.uk http://www.nextd.demon.co.uk/
"Success has many fathers, failure is a B**TARD!" - anon
------------------------------------------------------------------