Re: SMP Theory (was: Re: Interesting analysis of linux kernel threading by IBM)

From: Davide Libenzi (davidel@maticad.it)
Date: Tue Jan 25 2000 - 03:43:17 EST


----- Original Message -----
From: <dg50@daimlerchrysler.com>
To: <linux-kernel@vger.rutgers.edu>
Sent: Monday, January 24, 2000 11:46 PM
Subject: SMP Theory (was: Re: Interesting analysis of linux kernel threading
by IBM)

> I've been reading the SMP thread and this is a truly educational and
> fascinating discussion. How SMP works, and how much of a benefit it
> provides has always been a bit of a mystery to me - and I think the light
> is slowly coming on.
>
> But I have a couple of (perhaps dumb) questions.
>
> OK, if you have an n-way SMP box, then you have n processors with n
(local)
> caches sharing a single block of main system memory. If you then run a
> threaded program (like a renderer) with a thread per processor, you wind
up
> with n threads all looking at a single block of shared memory - right?
>
> OK, if a thread accesses (I assume writes, reading isn't destructive, is
> it?) a memory location that another processor is "interested" in, then
> you've invalidated that processor's local cache - so it has to be flushed
> and refreshed. Have enough cross-talk between threads, and you can achieve
> the worst-case scenario where every memory access flushes the cache of
> every processor, totally defeating the purpose of the cache, and perhaps
> even adding nontrivial cache-flushing overhead.
>
> If this is indeed the case (please correct any misconceptions I have) then
> it strikes me that perhaps the hardware design of SMP is broken. That
> instead of sharing main memory, each processor should have it's own main
> memory. You connect the various main memory chunks to the "primary" CPU
via
> some sort of very wide, very fast memory bus, and then when you spawn a
> thread, you instead do something more like a fork - copy the relevent
> process and data to the child cpu's private main memory (perhaps via some
> sort of blitter) over this bus, and then let that CPU go play in its own
> sandbox for a while.
>
> Which really is more like the "array of uni-processor boxen joined by a
> network" model than it is current SMP - just with a REALLY fast&wide
> network pipe that just happens to be in the same physical box.

My personal point of view is that the winner technology will be the one that
exposes __very__ __raw__ instruction sets that can be programmed in a
software
way ( the way partially chosen by the crusoe of Transmeta, IA64 of Intel and
mostly
the raw computing technology ).
Into the __very__ __raw__ instruction sets I'd like to be included also the
ones that control
cache invalidations and prefetching methods.
Higher one can think the level of intelligence stored into the CPU for
choosing these strategies,
only the application writer ( or OS writer ) can give a fine tuning of its.

Davide.

--
All this stuff is IMVHO

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:14 EST