Re: Interesting analysis of linux kernel threading by IBM

From: Peter Rival (frival@zk3.dec.com)
Date: Fri Jan 21 2000 - 15:04:40 EST


Larry McVoy wrote:

> : First of all, you're right.
>
> Ahhh, I love it when people suck up to me :-)
> Actually, smart move, it set my mind to actually reading the rest of your
> message. Everybody? Please start out all mail to me with "You're right".
>

Heh. And people think that I'm stupid/wimpy when I do things like that. While I do
agree with what you said, I knew that I would have to say so at the beginning without
any possible contradictions to get attention. Glad you have the patience to deal with
a youngin' like me (only 24 :).

>
> What? No? You don't think so? Ahh, well, I can hope :-)
>
> : Larry, you mentioned that you had thoughts on what Linux has to do to work on big
> : servers. I'm imagining, given your background and the content of this email, that
> : they are also ideas that won't harm the performance on small systems (e.g.
> : desktops). I must have missed them somehow - could you recap?
>
> OK, but they are pretty radical. Linus and I have talked them over and he
> has always been of the opinion "sounds good, sounds like it might be right,
> where's the code?". And I'm side tracked onto BitKeeper.
>

Sounds familiar. They keep wanting me to work on Tru64...<smirk>

>
> Whatever, can the people who are really interested in high performance
> take a look at
>
> http://www.bitmover.com/llnl/smp.{ps,pdf}
> and then
> http://www.bitmover.com/llnl/labs.{ps,pdf}
>

Interesting papers. A former IBMer coworker suggested something like that just the
other day from his mainframing days.

>
>
> That's kinda cute but not very useful because what you really want is to
> be able to have all processors working together on the same data with only
> one copy of the data. In other words, I don't care if I have one OS or
> 1000, I want all processors to be able to mmap /space/damn_big_file and
> poke at it. And I don't want any stinkin' DSM - I want real, hardware
> based coherency. Well, bucky, I'm here to tell ya, praise the lord,
> you can have it :-) You need to make an SMPFS which lets other OS's
> put reference counts on your inodes. The operation is extremely
> similar to what you have to do when you invalidate a page - you shoot
> down the other processor's TLB entries. So we need something like
> that in the reverse.
>

That shouldn't be too biggie of a deal. The question I'd think becomes one of
cross-CPU consistency traffic as with locks. We still write data which someone else
may or may not have mapped in their cache. Conceptually, tho', seeing as we would all
map the same memory (probably the same way) at some sort of offset (our per-N-cpu
kernel base), we should be able to touch memory that we share without extra cost.

How should we handle the entrance into the kernel from user space? I don't want each
user space process over on CPUs 28-32 having to bounce their request through the base
kernel located down at CPU 0. A simple loader trick, perhaps (kinda like we adjust our
shared library locations)? I'm somewhat talking out of my arse as I haven't thought
all of this completely through yet. (Then again, I'm accused of doing that anyway,
so... ;)

>
> If you think I'm waving my hands wildly, I am. But this is definitely
> doable, and as hard as it seems, it is easily an order of magnitude easier
> than threading the kernel to get to even 32 processors. I've lived
> through that twice, it's about a 7 year process (in hind sight; before
> hand, everyone said it would be maybe 18 months).
>
> Read the papers. Think. Think again. Let's talk. I can set up a
> perf@bitmover aliase if this becomes too off topic.
>

It sounds interesting. It will also be very interesting to work with this type of
setup on a NUMA machine where I believe this type of setup is pretty much requisite to
achieve any real performance (especially if the page replication/migration engines
aren't in the hardware). If more than just the two of us are interested, I'd recommend
setting up the list. I'd say a very cool project overall, if nothing else.

 - Pet

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:25 EST