Re: sched_yield proposals/rationale

From: William Lee Irwin III
Date: Thu Apr 12 2007 - 18:00:24 EST


On Thu, Apr 12, 2007 at 11:27:22PM +1000, Nick Piggin wrote:
> This one should be pretty rare (actually I think it is dead code in
> practice, due to the way the page allocator works).
> Avoiding sched_yield is a really good idea outside realtime scheduling.
> Since we have gone this far with the current semantics, I think it
> would be sad to back down now.
> It would be nice if you could pressure those other components to adapt :)

Outside of realtime scheduling there appear to be two desired behaviors:
(1) busywait: demote as aggressively as possible
(2) CPU burn: give other apps a chance to run but demote lightly at most

The scheduler has no way to tell which of the two behaviors a caller
wants. My recommended solution is a new system call taking an argument
that says which behavior is desired. Most unaware apps could then be
handled via LD_PRELOAD.
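
A minimal sketch of what such an LD_PRELOAD shim might look like follows.
Everything named here is an assumption for illustration: yield_hint(),
the YIELD_* constants and the YIELD_MODE environment variable do not
exist anywhere. Since the proposed system call does not exist either,
the stand-in ignores its argument and issues the stock sched_yield
syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical argument values for the proposed system call. */
enum yield_mode {
	YIELD_BUSYWAIT = 0,	/* (1): demote as aggressively as possible */
	YIELD_CPU_BURN = 1,	/* (2): let others run, demote lightly at most */
};

/*
 * Hypothetical stand-in for the proposed system call. No such call
 * exists, so the mode is only validated and then dropped, and the
 * ordinary sched_yield syscall is issued instead.
 */
static int yield_hint(enum yield_mode mode)
{
	assert(mode == YIELD_BUSYWAIT || mode == YIELD_CPU_BURN);
	return (int)syscall(SYS_sched_yield);
}

/* Map a policy string to a mode; unknown or missing means busywait. */
static enum yield_mode parse_mode(const char *s)
{
	if (s && strcmp(s, "cpuburn") == 0)
		return YIELD_CPU_BURN;
	return YIELD_BUSYWAIT;
}

/*
 * Interposed definition. When preloaded, the dynamic linker resolves
 * sched_yield() to this function ahead of libc's, so an unmodified
 * app picks up a per-process policy from the (hypothetical)
 * YIELD_MODE environment variable.
 */
int sched_yield(void)
{
	return yield_hint(parse_mode(getenv("YIELD_MODE")));
}
```

Built as a shared object (for example with gcc -shared -fPIC) and named
in LD_PRELOAD, this would catch dynamically linked callers only;
statically linked apps and apps issuing the syscall directly would not
be affected.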

Busywaiters that can be modified could be given more specific scheduling
primitives, in particular "directed yields," which donate timeslice and
possibly dynamic priority to their targets. They would look something
like:
	int yield_to(pid_t);
	int yield_to_futex(int *);
	int yield_to_sem(int);
	/* etc. */
as userspace library functions, where yielding to a resource is intended
to donate timeslice to its owner or one of its owners, with the owner(s)
determined by the kernel. Directed yields are a more direct attack on
the priority inversion one most desperately wants to avoid with
sched_yield()-based busywaiters on a resource, namely the resource owner
falling behind the busywaiters in priority or running out of timeslice.
They furthermore reduce the competition for CPU between resource owners
and busywaiters on that resource.
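
As a rough sketch of how a busywaiter might use a directed yield,
consider a spinning lock that records its holder so waiters know whom
to yield to. This is an illustration under stated assumptions, not an
implementation of the proposal: yield_to() is not a real library call,
so it is stubbed with a plain sched_yield syscall that ignores the
target, and struct owned_lock and its helpers are invented for the
example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stub for the proposed yield_to(). A real version would
 * donate timeslice to the target; this one ignores it and just yields.
 */
static int yield_to(pid_t target)
{
	(void)target;
	return (int)syscall(SYS_sched_yield);
}

/* Illustrative lock that publishes its holder's thread id. */
struct owned_lock {
	atomic_int owner;	/* tid of the holder, 0 when free */
};

static pid_t self_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Returns nonzero if the lock was free and is now held by the caller. */
static int owned_trylock(struct owned_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->owner, &expected,
					      (int)self_tid());
}

/* Busywait, but direct each yield at whoever currently holds the lock. */
static void owned_acquire(struct owned_lock *l)
{
	while (!owned_trylock(l)) {
		int holder = atomic_load(&l->owner);

		if (holder != 0)
			yield_to((pid_t)holder);
	}
}

static void owned_unlock(struct owned_lock *l)
{
	assert(atomic_load(&l->owner) == (int)self_tid());
	atomic_store(&l->owner, 0);
}
```

The point of recording the owner is that the waiter hands the kernel a
target instead of yielding blindly, so the holder is the one that gets
to run and release the resource.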

A less direct alternative suggested by Andi Kleen is to have coprocess
groups, plus an alternative to sched_yield() that directs yielding
toward a member of the yielder's own coprocess group. The standard
system call could possibly be reused by making that the default behavior
whenever a process belongs to such a group.


-- wli