Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

From: Linus Torvalds
Date: Fri Feb 02 2007 - 10:57:51 EST

Next message: Cedric Le Goater: "Re: 2.6.20-rc6-mm3"
Previous message: Paul E. McKenney: "Re: [RFC PATCH -rt 1/2] RCU priority boosting that survives vicious testing"
In reply to: Ingo Molnar: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Next in thread: Alan: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 2 Feb 2007, Ingo Molnar wrote:
>
> My only suggestion was to have a couple of transparent kernel threads
> (not fibrils) attached to a user context that does asynchronous
> syscalls! Those kernel threads would be 'switched in' if the current
> user-space thread blocks - so instead of having to 'create' any of them
> - the fast path would be to /switch/ them to under the current
> user-space, so that user-space processing can continue under that other
> thread!

But in that case, you really do end up with "fibrils" anyway.

Because those fibrils are what would be the state for the blocked system
calls when they aren't scheduled.

We may have a few hundred thousand system calls a second (maybe that's not
actually reasonable, but it should be what we *aim* for), and 99% of them
will hopefully hit the cache and never need any separate IO, but even if
it's just 1%, we're talking about thousands of threads.

I do _not_ think that it's reasonable to have thousands of threads state
around just "in case". Especially if all those threadlets are then
involved in signals etc - something that they are totally uninterested in.

I think it's a lot more reasonable to have just the kernel stack page for
"this was where I was when I blocked". IOW, a fibril-like thing. You need
some data structure to set up the state *before* you start doing any
threads at all, because hopefully the operation will be totally
synchronous, and no secondary thread is ever really needed!

What I like about fibrils is that they should be able to handle the cached
case well: the case where no "real" scheduling (just the fibril stack
switches) takes place.

Now, most traditional DB loads would tend to use AIO only when they "know"
that real IO will take place (the AIO call itself will probably be
O_DIRECT most of the time). So I suspect that a lot of those users will
never really have the cached case, but one of my hopes is to be able to do
exactly the things that we have *not* done well: asynchronous file opens
and pathname lookups, which is very common in a file server.

If done *really* right, a perfectly normal app could do things like
asynchronous stat() calls to fill in the readdir results. In other words,
what *I* would like to see is the ability to have something *really*
simple like "ls" use this, without it actually being a performance hit
for the common case where everythign is cached.

Have you done "ls -l" on a big uncached directory where the inodes
are all over the disk lately? You can hear the disk whirr. THAT is the
kind of "normal user" thing I'd like to be able to fix, and the db case is
actually secondary. The DB case is much much more limited (ok, so somebody
pointed out that they want slightly more than just read/write, but
still.. We're talking "special code".)

> [ finally, i think you totally ignored my main argument, state machines.

I ignored your argument, because it's not really relevant. The fact that
networking (and TCP in particular) has state machines is because it is a
packetized environment. Nothing else is. Think pathname lookup etc. They
are all *fundamentally* environments with a call stack.

So the state machine argument is totally bogus - it results in a
programming model that simply doesn't match the *normal* setup. You want
the kernel programming model to appear "linear" even when it isn't,
because it's too damn hard to think nonlinearly.

Yes, we could do pathname lookup with that kind of insane setup too. But
it would be HORRID!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Cedric Le Goater: "Re: 2.6.20-rc6-mm3"
Previous message: Paul E. McKenney: "Re: [RFC PATCH -rt 1/2] RCU priority boosting that survives vicious testing"
In reply to: Ingo Molnar: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Next in thread: Alan: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]