Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
From: Alan
Date: Fri Feb 02 2007 - 14:48:33 EST
This one got shelved while I sorted other things out as it warranted a
longer look. Some comments follow, but firstly can we please bury this
"fibril" name. The constructs Zach is using appear to be identical to
co-routines, and they've been called that in computer science literature
for fifty years. They are one of the great and somehow forgotten ideas.
(I admit I've used them extensively in past projects, where they are
wonderful for multi-player gaming, so I'm a convert already.)
This stuff, however, isn't as free as you make out. Current kernel logic
knows about various things being "safe" but with fibrils you have to
address additional questions such as "What happens if I issue an I/O and
change priority?" You also have an 800lb gorilla hiding behind a tree
waiting for you in privilege and permission checking.
Right now current->*u/gid is safe across a syscall start to end, with an
asynchronous setuid all hell breaks loose. I'm not saying we shouldn't do
this; in fact we'd be able to do some of the utterly moronic POSIX thread
uid handling in kernel space if we did, just that it isn't free. We have
locking rules defined by the magic serializing construct called
"the syscall" and you break those.
I'd expect the odd other gorilla waiting to mug you as well and the ones
nobody has thought of will be the worst 8)
The number of co-routines and stacks can be dealt with in two ways: you use
small stacks allocated when you create a fibril, or you grab a page, use
separate IRQ stacks, and then either fail creation with -ENOBUFS etc. (which
drops the work on user space) or block (in which cases??), which also means
an overhead on co-routine exits. That can be made tunable, and for embedded
easily tuned right down.
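A minimal userspace sketch of the "fail creation with -ENOBUFS" policy, with the cap on live stacks as the tunable knob. The names (struct stack_pool, stack_alloc(), stack_free()) are hypothetical and nothing here comes from Zach's patch; malloc() merely stands in for grabbing a page.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-system pool of co-routine stacks. */
struct stack_pool {
    size_t stack_size;    /* bytes per co-routine stack */
    unsigned int limit;   /* tunable cap; set low on embedded */
    unsigned int in_use;  /* stacks currently handed out */
};

/* Allocate a stack for a new co-routine. At the cap we fail with
 * -ENOBUFS instead of blocking, which pushes the retry to user space. */
static int stack_alloc(struct stack_pool *pool, void **stack)
{
    if (pool->in_use >= pool->limit)
        return -ENOBUFS;
    *stack = malloc(pool->stack_size);
    if (!*stack)
        return -ENOMEM;
    pool->in_use++;
    return 0;
}

/* Release the stack when its co-routine exits. */
static void stack_free(struct stack_pool *pool, void *stack)
{
    assert(pool->in_use > 0);
    free(stack);
    pool->in_use--;
}
```

Blocking instead of failing would replace the -ENOBUFS return with a wait, and stack_free() would then also have to wake a waiter; that is the co-routine exit overhead mentioned above.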
Traditional co-routines have clear notions of being able to create a
co-routine, stack them and fire up specific ones. In part this is done
because many things expressed in this way know what to fire up next. It's
also a very clean way to express driver problems with a lot of state,
essentially because a co-routine simply makes "%esp" roughly the same as
the C++ world's "this".
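To make the "%esp as self" point concrete, here is a userspace sketch using the POSIX ucontext calls (getcontext/makecontext/swapcontext, obsolescent in POSIX.1-2008 but still provided by glibc). It is not how the patch switches stacks, and coro_start(), coro_resume() and coro_yield() are hypothetical names. The loop counter is an ordinary local that survives every switch because it lives on the co-routine's own stack.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t caller_ctx;       /* whoever resumed the co-routine */
static ucontext_t coro_ctx;         /* the co-routine's saved registers */
static char coro_stack[64 * 1024];  /* its private stack */
static int coro_value;              /* value passed out on each yield */
static int coro_done;               /* set once the body returns */

/* Save our registers (including the stack pointer) and switch back. */
static void coro_yield(int v)
{
    coro_value = v;
    swapcontext(&coro_ctx, &caller_ctx);
}

/* Co-routine body: n is an ordinary local kept on coro_stack. */
static void counter_body(void)
{
    for (int n = 1; n <= 3; n++)
        coro_yield(n * 10);
    coro_done = 1;
    /* returning resumes uc_link, i.e. caller_ctx */
}

static void coro_start(void)
{
    int rc = getcontext(&coro_ctx);

    assert(rc == 0);
    (void)rc;
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof(coro_stack);
    coro_ctx.uc_link = &caller_ctx;
    coro_done = 0;
    makecontext(&coro_ctx, counter_body, 0);
}

/* Run the co-routine to its next yield. Returns 1 and stores the
 * yielded value in *v, or 0 once the body has finished. */
static int coro_resume(int *v)
{
    if (coro_done)
        return 0;
    swapcontext(&caller_ctx, &coro_ctx);
    if (coro_done)
        return 0;
    *v = coro_value;
    return 1;
}
```

Calling coro_resume() in a loop hands back 10, 20 and 30, then returns 0. The only per-co-routine "object" is the saved context plus its stack.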
You get some other funny things from co-routines which are very powerful,
very dangerous, or plain insane depending upon your view of life. One big
one is the ability for real men (and women) to do stuff like this,
because you don't need to keep the context attached to the same task.
    send_reset_command(dev);
    wait_for_irq_event(dev->irq);
    /* co-routine continues in IRQ context here */
    clean_up_reset_command(dev);
    exit_irq_event();
    /* co-routine continues out of IRQ context here */
    send_identify_command(dev);
Notice that we just dealt with all the IRQ stack problems: the moment an IRQ
is a co-routine transfer, they go away 8)
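A userspace simulation of the reset sequence, again built on POSIX ucontext. Nothing here is a real interrupt: dispatch_irq() merely plays the role of IRQ entry, and delivering a matching IRQ is a switch into the waiting co-routine. wait_for_irq_event() and exit_irq_event() reuse the names from the example but are hypothetical stand-ins; reset_sequence(), start_driver(), dispatch_irq() and resume_driver() are invented for the sketch.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t sched_ctx;        /* the "kernel" side: dispatcher */
static ucontext_t drv_ctx;          /* the driver co-routine */
static char drv_stack[64 * 1024];
static int waiting_irq = -1;        /* IRQ line the co-routine waits on */
static char trace[128];             /* order in which steps ran */

static void step(const char *what)
{
    strcat(trace, what);
    strcat(trace, ";");
}

/* Park the co-routine on an IRQ line and switch away. */
static void wait_for_irq_event(int irq)
{
    waiting_irq = irq;
    swapcontext(&drv_ctx, &sched_ctx);
}

/* IRQ-side work done: hand control back to the dispatcher. */
static void exit_irq_event(void)
{
    waiting_irq = -1;
    swapcontext(&drv_ctx, &sched_ctx);
}

static void reset_sequence(void)
{
    step("send_reset");
    wait_for_irq_event(5);
    step("clean_up_reset");   /* runs from dispatch_irq(5) */
    exit_irq_event();
    step("send_identify");    /* runs from resume_driver() */
}

/* Simulated IRQ entry: a matching IRQ is a switch into the waiter. */
static void dispatch_irq(int irq)
{
    if (irq != waiting_irq) {
        step("spurious");
        return;
    }
    step("irq");
    swapcontext(&sched_ctx, &drv_ctx);
}

static void start_driver(void)
{
    int rc = getcontext(&drv_ctx);

    assert(rc == 0);
    (void)rc;
    drv_ctx.uc_stack.ss_sp = drv_stack;
    drv_ctx.uc_stack.ss_size = sizeof(drv_stack);
    drv_ctx.uc_link = &sched_ctx;
    makecontext(&drv_ctx, reset_sequence, 0);
    swapcontext(&sched_ctx, &drv_ctx);   /* runs up to the IRQ wait */
}

/* Later, in ordinary process context, let the co-routine finish. */
static void resume_driver(void)
{
    swapcontext(&sched_ctx, &drv_ctx);
}
```

The clean-up step executes on the co-routine's own stack while the dispatcher is "in IRQ", so the IRQ path never needs a stack of its own.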
Ditto with timers, although for the kernel that might not be smart as we
have a lot of timers.
Less insanely, you can create a context, start doing stuff in it and then
pass it to someone else: local variables, state and all. This one is
actually rather useful for avoiding a lot of the 'D' state crap in the
kernel.
For example, we have driver code that sleeps uninterruptibly because it's
too hard to undo the mess and get out of the current state if it is
interrupted. In the world of sending other people co-routines you just do
this:
    coroutine_set(MUST_COMPLETE);
and in exit:
    foreach(coroutine)
        if (coroutine->flags & MUST_COMPLETE)
            inherit_coroutine(init, coroutine);
And obviously you don't hand over any that would then do the wrong thing
before accessing user space (well, unless you are implementing
'read_for_someone_else()' or other strange syscalls, like ptrace...)
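A plain-C model of the exit-time hand-off, assuming each task keeps a singly linked list of its co-routines. MUST_COMPLETE and inherit_coroutine() follow the names in the pseudocode; the struct layouts and exit_coroutines() are hypothetical, and the sketch ignores the user-space-access caveat entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MUST_COMPLETE 0x1    /* hypothetical flag value */

struct task;

struct coroutine {
    unsigned int flags;
    struct task *owner;
    struct coroutine *next;  /* link in owner's list */
};

struct task {
    const char *name;
    struct coroutine *coroutines;
};

/* Move a co-routine, locals and all, onto another task's list. */
static void inherit_coroutine(struct task *new_owner, struct coroutine *co)
{
    assert(new_owner != NULL);
    co->owner = new_owner;
    co->next = new_owner->coroutines;
    new_owner->coroutines = co;
}

/* On exit: MUST_COMPLETE co-routines go to init, the rest are
 * abandoned (a real version would cancel and free them).
 * Returns the number dropped. */
static int exit_coroutines(struct task *tsk, struct task *init)
{
    struct coroutine *co = tsk->coroutines, *next;
    int dropped = 0;

    tsk->coroutines = NULL;
    for (; co; co = next) {
        next = co->next;   /* save: inherit_coroutine() relinks co */
        if (co->flags & MUST_COMPLETE) {
            inherit_coroutine(init, co);
        } else {
            co->owner = NULL;
            dropped++;
        }
    }
    return dropped;
}
```

The driver keeps its half-finished state on the inherited co-routine, so it never has to sleep in 'D' state just to avoid unwinding.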
Other questions really relate to the scheduling. Zach, do you intend
schedule_fibrils() to be a call that code would make, or only made from schedule()?
Linus will now tell me I'm out of my tree...
Alan (who used to use Co-routines in real languages on 36bit
computers with 9bit bytes before learning C)