Re: How should I take the "Intel inside" out ?

Chris (cdb@europa.dircon.co.uk)
Tue, 22 Jul 1997 11:26:48 +0100 (BST)


On Tue, 22 Jul 1997, Meino Christian Cramer wrote:

> My kernel relevant (or isn't it?) question is:
> UNIX and therefore LINUX are multitasking
> operating systems. For me it means:
> From the point of view of the processor, the
> code to be executed is switched many
> times, so that the contents of the
> L1/L2 caches and processor pipes become
> invalid very often, and therefore
> the caches are """needless""".
> ^^^ ^^^---<<<watch this!

This turns out not to be the case. (polite way of saying 'bullshit')

Linux might perform a context switch every 10ms. In 10ms a 512k L2
cache can be filled and read many times. And if one process is using
the majority of the CPU time, any argument that caches are needless
becomes weaker still.
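
A rough back-of-envelope check of that, as a sketch only. The 10ms and
512k figures are the ones above; 835 MByte/s is the cache throughput
quoted in the question, and 83 MByte/s is the lmbench "Mem read" figure
from my own machine further down, so the two rates come from different
hardware. I am also assuming 1 MByte = 10**6 bytes.

```python
# Back-of-envelope: how many cache-sized transfers fit into one timeslice?
# Assumptions (not measurements): 1 MByte = 10**6 bytes, 512k = 512 * 1024 bytes.

TIMESLICE_S = 0.010          # the 10ms context-switch interval
CACHE_BYTES = 512 * 1024     # 512k L2 cache


def sweeps_per_timeslice(mbytes_per_s, cache_bytes=CACHE_BYTES,
                         timeslice_s=TIMESLICE_S):
    """Number of cache-sized transfers that fit into one timeslice."""
    bytes_moved = mbytes_per_s * 1e6 * timeslice_s
    return bytes_moved / cache_bytes


cache_sweeps = sweeps_per_timeslice(835)  # cache throughput from the question
memory_fills = sweeps_per_timeslice(83)   # lmbench "Mem read" on my machine

print(f"whole-cache reads per timeslice:        {cache_sweeps:.1f}")
print(f"refills from main memory per timeslice: {memory_fills:.1f}")
```

On these rough numbers a process can read the whole cache about sixteen
times in one timeslice, and even main memory can refill it more than
once, so the cache is warm for most of the slice.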

> What is worth more:
> To have a board, which has 512KByte
> 2nd-level cache but "only" 835MByte/s
> throughput to the L1-cache _OR_ using
> a board, which has 876MByte/s throughput
> (L1-Cache) but only 256KByte L2-cache.

This is an interesting question. The answer is almost certainly "it
depends on what you are doing."

If a particular loop happens to be too big to run from 256k cache
but would have run from 512k cache, the 256k solution is going to
be horrible. On the other hand, with the slower 512k board you always
lose a bit: 835 against 876 MByte/s is a little under 5%.
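
To illustrate the first case, here is a toy model, not a description
of any real board. It assumes a direct-mapped cache with 32-byte lines
and a hypothetical loop that walks 384k of memory over and over; the
function name and all the sizes are made up for the example.

```python
# Toy direct-mapped cache: hit rate of a loop that repeatedly walks
# loop_bytes of memory. Sizes and the 32-byte line are assumptions.

def hit_rate(cache_bytes, loop_bytes, passes=4, line_bytes=32):
    """Fraction of accesses that hit, ignoring the cold first pass."""
    n_slots = cache_bytes // line_bytes
    tags = [None] * n_slots              # which memory line each slot holds
    hits = accesses = 0
    for p in range(passes):
        for addr in range(0, loop_bytes, line_bytes):
            line = addr // line_bytes
            slot = line % n_slots        # direct-mapped: one possible slot
            if p > 0:                    # only count the warm passes
                accesses += 1
                hits += tags[slot] == line
            tags[slot] = line            # load the line (evicts the old one)
    return hits / accesses


LOOP = 384 * 1024                        # hypothetical 384k working set
small = hit_rate(256 * 1024, LOOP)
large = hit_rate(512 * 1024, LOOP)
print(f"256k cache: {small:.0%} hits")
print(f"512k cache: {large:.0%} hits")
```

In this model the 512k cache serves every access once it is warm, while
the 256k cache misses on two accesses out of three, and each miss costs
a trip to main memory. A real cache and a real loop will behave less
neatly, but the cliff is the point.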

My instinct would be to use the 512k cache.

> (The tests and benchmarks I know are based on
> tests with that nasty Windoze-stuff, so
> only the physical measurable values
> as those above are relevant to linux,
> I think...)
>
> I have no idea, what the relation is
> between the timings of the context switches
> and the scheduling cycles and the times
> in which L1/L2-caches are used.
>
> Has someone any experiences with boards
> for the AMD K6???

For what it's worth, one of my machines is running a K6 at 200MHz.
The motherboard is a Soyo, based on the HX chipset, with 512k cache
and 128M EDO RAM. It reports 398.13 BogoMIPS.

lmbench results:

L M B E N C H 1 . 0 S U M M A R Y
------------------------------------

Processor, Processes - times in microseconds
--------------------------------------------
Host                 OS  Mhz    Null    Null  Simple /bin/sh Mmap 2-proc 8-proc
                             Syscall Process Process Process  lat  ctxsw  ctxsw
--------- ------------- ---- ------- ------- ------- ------- ---- ------ ------
daylight   Linux 2.0.29  200       2      1K      6K     17K   37      6      9

*Local* Communication latencies in microseconds
-----------------------------------------------
Host                 OS    Pipe     UDP    RPC/     TCP    RPC/
                                            UDP             TCP
--------- ------------- ------- ------- ------- ------- -------
daylight   Linux 2.0.29      27      82     199     114     232

*Local* Communication bandwidths in megabytes/second
----------------------------------------------------
Host                 OS Pipe  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                                  reread reread (libc) (hand) read write
--------- ------------- ---- ---- ------ ------ ------ ------ ---- -----
daylight   Linux 2.0.29   33   16     30     73     30     30   83    51

Memory latencies in nanoseconds
(WARNING - may not be correct, check graphs)
--------------------------------------------
Host                 OS Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
daylight   Linux 2.0.29 200    5  201      348

-- 
Chris Butterworth