Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
From: Ingo Molnar
Date: Fri Feb 02 2007 - 19:03:56 EST
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> THAT'S THE POINT. That's what makes fibrils cooperative. The "only if
> you block" is really what makes a fibril be something else than a
> regular thread.
Well, in my picture, 'only if you block' is a pure thread utilization
decision: bounce a piece of work to another thread if this thread cannot
complete it. (if the kernel is lucky enough that the user context told
it "it's fine to do that".)
it is 'incidental parallelism' instead of 'intentional parallelism', but
the random and unpredictable nature of it doesnt change anything about
the fundamental fact: we start a new thread of execution in essence.
Typically it will be rare in a workload as it will be driven by
cachemisses, but for example in DB workloads the 'cachemiss' will be the
/common case/ - because the DB manages the cache itself.
And how to run a thread of execution is a fundamental /scheduling/
decision: it is the acceptance of and the adoption to the cost of work
migration - if no forced wait happens then often it's cheaper to execute
all work locally and serially.
[ in fact, such a mechanism doesnt even always have to be driven from
the scheduler itself: such a 'bounce current work to another thread'
event could occur when we detect that a pagecache page is missing and
that we have to do a ->readpage, etc. Tux does that since 1999: the
cutoff for 'bounce work' was when a soft cache (the pagecache or the
dentry cache) was missed - not when we went into the IO path. This has
the advantage that the Tux cachemiss threads could do /all/ the IO
preparation and IO completion on the same CPU and in one go - while
the user context was able to continue executing. ]
But this is also a function of hardware: for example on a Transputer i'd
bounce off all such work immediately (even if it's a sys_time()
syscall), all the time, even if fully cached, no exceptions, because the
hardware is such that another CPU can pick it up in the next cycle.
while we definitely dont want to bounce short-lived cached syscalls to
another thread, for longer ones or ones which we /expect/ to block we
might want to do it like that straight away. [Especially on a multi-core
CPU that has a shared L2 cache (and doubly so on a HT/SMT CPU that has a
shared L1 cache).]
i dont see anything here that mandates (or even strongly supports) the
notion of cooperative scheduling. The moment a context sees a 'cache
miss', it is totally fair to potentially distribute it to other CPUs. It
wont run for a long time and it will be totally cache-cold when the 'IO
done' event occurs - hence we should schedule it where the IO event
occured. Which might easily be the same CPU where the user context is
running right now (we prefer last-run CPUs on wakeups), but not
necessarily - it's a general scheduling decision.
> > Note: such a 'flip' would only occur when the original context
> > blocks, /not/ on every async syscall.
>
> Right.
>
> So can you take a look at Zach's fibril idea again? Because that's
> exactly what it does. It basically sets a flag, saying "flip to this
> when you block or yield". Of course, it's a bit bigger than just a
> flag, since it needs to describe what to flip to, but that's the basic
> idea.
i know Zach's code ... i really do. Even if i didnt look at the code
(which i did), Jonathon Corbet did a very nice writeup about fibrils on
LWN.net two days ago, which i've read as well:
http://lwn.net/Articles/219954/
So there's no misunderstanding on my side i think.
> Now, if you want to make fibrils *also* then actually use a separate
> thread, that's an extension.
oh please, Linus. I /did/ suggest this as an extension to Zach's idea!
Look at the Subject line - i'm reacting to the specific fibril code of
Zach. I wrote this:
| as per my other email, i dont really like this concept. This is the
| killer:
|
| > [...] There can be multiple of them in the process of executing for
| > a given task_struct, but only one can every be actively running at a
| > time. [...]
|
| there's almost no scheduling cost from being able to arbitrarily
| schedule a kernel thread - but there are /huge/ benefits in it.
|
| would it be hard to redo your AIO patches based on a pool of plain
| simple kernel threads?
see http://lkml.org/lkml/2007/2/1/40.
Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/