lmbench on Linux/alpha

Linus Torvalds (Linus.Torvalds@cs.helsinki.fi)
Sun, 29 Oct 1995 21:27:33 +0200


You may or may not be aware of the fact that Larry McVoy (at SGI) has
written a benchmark suite, lmbench, that measures various hardware
characteristics and operating system efficiency. I've used it before
(back when I optimized the linux context switching in the early 1.3.x
days), and now that Larry has announced a new version I also ran it on
my Alpha.

The benchmark showed a few anomalies, which were due to the current
linux alpha source tree not using the optimized memcpy(), even though it
had actually been written. Instead, the kernel did all memcpy's with
the braindead portable version in lib/string.c (doing copies a byte at a
time, ugh). I've fixed that now, so you can look forward to a better
1.3.38.
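
To make the difference concrete, here is a rough sketch of the two copy
strategies. It is NOT the code from lib/string.c or from the alpha tree:
the names copy_bytewise() and copy_wordwise() are made up for
illustration, and the real optimized version also has to deal with
unaligned and mutually misaligned buffers, which this sketch simply
punts on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback style: one load and one store per byte copied. */
void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * Hypothetical word-at-a-time variant: moves 8 bytes per iteration when
 * both pointers are 8-byte aligned, and falls back to bytes otherwise.
 */
void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 7) == 0) {
        for (; n >= 8; n -= 8, d += 8, s += 8) {
            uint64_t w;

            /* Fixed-size memcpy is the portable way to spell a
               64-bit load and store without aliasing problems. */
            memcpy(&w, s, sizeof(w));
            memcpy(d, &w, sizeof(w));
        }
    }

    /* Tail bytes, or the whole buffer if it was not aligned. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

On a 64-bit machine like the alpha the second loop does roughly an
eighth of the loads and stores for large aligned buffers, which is the
kind of gap that shows up in the bcopy and pipe numbers.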

For your reading pleasure, here are the lmbench results, along with some
numbers from other alphas running OSF/1. The OSF/1 numbers are the ones
that were included with the benchmark, and I can't really say what kinds
of machines they are. The linux machine (pc64) is a 32MB Cabriolet
running at 275MHz. It shows up twice, as pc64 and pc64.1: they are the
same machine and the same benchmarks, and the differences are just due
to timing variation between the runs.

Generally, linux wins on process handling and on pipe latency and
throughput (even when compared to the faster 300MHz 8400 machine).
Linux also does well on UDP, being slightly better at both throughput
and latency.

Linux loses on TCP throughput and latency, though: this is not an
alpha-specific thing, but shows up on Linux/x86 too (in fact, it shows
up very clearly on Linux/x86, even more so than on an alpha).

Linux also falls behind on file re-read and mmap throughput. Also, the
8400 and the "nobozo" machine (whatever that is: it doesn't have a very
fast processor, but it has the best memory subsystem of the whole lot)
have a better memory subsystem, so they get better numbers for Mem
read/write etc. That's not an OS issue, though.

The "linux.cs" numbers are there for you to compare against my 100MHz
Pentium at the university: it isn't best at any of these benchmarks, but
you can use it to get a feel for the numbers (again, it shows up twice,
because I ran the timings twice).

DISCLAIMER: I've used different kernel versions for this even on the
linux machines (1.3.37 on the Pentium/100 and a pre-1.3.38 with the
memcpy stuff fixed on the Alpha). And I have no idea how reliable the
OSF/1 numbers are: I don't know where Larry got them from (they don't
have the bcopy numbers for the 8400 at all, for example). But it does
look like linux on the alpha at least doesn't need to be ashamed of
itself.

Linus

----------

Guide to reading the lmbench numbers:
 - this is the "best of breed" summary, which always shows the best
   numbers starred. Starred numbers are absolute values: MB/s (for
   throughput), microseconds (for OS latencies) or nanoseconds (for
   memory and TLB latencies).
 - non-starred numbers are relative to the starred number in the same
   column: either "factor worse" (i.e. 4.6 means something was 4.6
   times slower than the fastest) or "percent of fastest" (i.e. 38%
   means that this was 38% of the throughput of the best system in
   the comparison).
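
If you want absolute numbers back out of the relative ones, the
arithmetic is just a multiplication against the starred entry in the
same column. A small sketch (the function names are made up here, they
are not part of lmbench):

```c
/* "Factor worse" entry -> absolute latency, in the same unit as the
   starred best value (microseconds or nanoseconds). */
double latency_from_factor(double best, double factor)
{
    return best * factor;
}

/* "Percent of fastest" entry -> absolute throughput, in the same unit
   as the starred best value (MB/s). */
double throughput_from_percent(double best, double percent)
{
    return best * percent / 100.0;
}
```

For example, a pipe latency entry of 2.1 against a starred 34 works out
to about 71 microseconds, and a pipe bandwidth entry of 63% against a
starred 73 to about 46 MB/s. The summary only prints rounded factors
and percentages, so treat the results as approximate.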

----------
L M B E N C H  1 . 0   S U M M A R Y
------------------------------------

Comparison to best of the breed
-------------------------------

(Best numbers are starred, i.e., *123)

Processor, Processes - factor slower than the best
--------------------------------------------------
Host      OS             Mhz    Null    Null  Simple /bin/sh Mmap 2-proc 8-proc
                             Syscall Process Process Process  lat  ctxsw  ctxsw
--------- ------------- ---- ------- ------- ------- ------- ---- ------ ------
8400-32.p OSF1 V3.2      303     4.5     2.7     2.4     1.3   13    1.4    1.7
alpha     OSF1 V2.1      182     6.5     6.5     6.6     3.5   11    2.5    3.2
nobozo    OSF1 V3.2      196     4.5     4.2     4.2     2.2   11    4.3    3.5
pc64      Linux 1.3.38   275      *2   *0.7K     1.2     1.0  *15    1.1    *13
pc64.1    Linux 1.3.38   275     1.5     1.4   *2.4K  *12.1K  *15    *10    *13
linux.cs. Linux 1.3.37   100     1.5     2.5     5.3     3.7  6.3    2.3    2.3
linux.cs. Linux 1.3.37   100     1.5     2.2     5.0     3.6  5.5    1.6    1.7

*Local* Communication latencies - factor slower than the best
-------------------------------------------------------------
Host      OS               Pipe     UDP    RPC/     TCP    RPC/
                                            UDP             TCP
--------- ------------- ------- ------- ------- ------- -------
8400-32.p OSF1 V3.2         2.1     1.4     1.1    *267    *371
alpha     OSF1 V2.1         5.4     2.2     2.3     1.6     2.3
nobozo    OSF1 V3.2         4.1     2.1     2.3     1.5     1.8
pc64      Linux 1.3.38      *34    *180    *317     1.6     1.6
pc64.1    Linux 1.3.38      *34     1.1     1.0     1.6     1.7
linux.cs. Linux 1.3.37      1.7     1.7     1.9     4.5     4.3
linux.cs. Linux 1.3.37      1.2     1.6     1.7     4.3     3.9

*Local* Communication bandwidths - percentage of the best
---------------------------------------------------------
Host      OS            Pipe  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                                  reread reread (libc) (hand) read write
--------- ------------- ---- ---- ------ ------ ------ ------ ---- -----
8400-32.p OSF1 V3.2      63%  92%    *67    *78     0%     0% *120  *123
alpha     OSF1 V2.1      43%  *12    58%    29%    84%    89%  62%   63%
nobozo    OSF1 V3.2      73%  86%    72%    32%    *45    *45  73%   73%
pc64      Linux 1.3.38   99%  73%    36%    22%    85%    85%  60%   57%
pc64.1    Linux 1.3.38   *73  73%    37%    28%    86%    86%  61%   58%
linux.cs. Linux 1.3.37   29%  25%    25%    17%    68%    65%  50%   40%
linux.cs. Linux 1.3.37   32%  26%    25%    17%    67%    64%  50%   40%

Memory latencies in nanoseconds - factor slower than the best
(WARNING - may not be correct, check graphs)
-------------------------------------------------------------
Host      OS            Mhz L1 $ L2 $ Main mem  TLB Guesses
--------- ------------- --- ---- ---- -------- ---- -------
8400-32.p OSF1 V3.2     302   *3  *42      1.4  1.0
alpha     OSF1 V2.1     182  3.3  1.3      1.1  1.2
nobozo    OSF1 V3.2     196  3.0  1.2     *288  1.1
pc64      Linux 1.3.38  275   *3  2.0      1.2 *393
pc64.1    Linux 1.3.38  275   *3  1.0      1.3  1.0
linux.cs. Linux 1.3.37   99  3.3  3.8      1.1  1.9
linux.cs. Linux 1.3.37   99  3.3  4.7      1.1  1.9