Re: RFC for a new Scheduling policy/class in the Linux-kernel

From: Ted Baker
Date: Thu Jul 16 2009 - 19:54:39 EST


On Thu, Jul 16, 2009 at 04:08:47PM -0600, Chris Friesen wrote:

> > However, there is still a difference in context-switching
> > overhead. Worst-case, you have twice as many context switches
> > per critical section with PIP as with PP.
>
> On the other hand, with PI the uncontended case can be implemented as
> atomic operations in userspace. With PP we need to issue at least two
> syscalls per lock/unlock cycle even in the uncontended case (to handle
> the priority manipulations).

Needing syscalls to change the priority of a thread may be an
artifact of system design that might be correctable.

Suppose you put the effective priority of each thread in a
per-thread page that is mapped at a fixed location in the
thread's address space (and at a different location, per thread,
in kernel memory). It is nice to have such a page for each thread
in any case. I don't recall whether Linux already does this,
but it is a well-proven technique.

Taking a PP lock then involves:

1) push old priority on the thread's stack
2) overwrite thread's priority with max of the lock priority
and the thread priority
3) try to grab the lock (test-and-set, etc.)
... conditionally queue, etc.
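
As a rough sketch of the acquire steps above, in C11 atomics: the
names (struct thread_page, struct pp_lock, pp_trylock) are made up
for illustration, a larger number is assumed to mean a higher
priority, and the contended path is only stubbed out by undoing
the raise and reporting failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread page, assumed to be readable by the scheduler. */
struct thread_page {
    _Atomic int effective_prio;
};

/* Hypothetical PP lock: a priority ceiling plus a test-and-set flag. */
struct pp_lock {
    int ceiling;
    atomic_flag held;
};

/*
 * Fast path only. Returns true if the lock was taken without contention.
 * *saved receives the old priority (the "push" in step 1; here the caller
 * keeps it in a local on its own stack).
 */
static bool pp_trylock(struct thread_page *tp, struct pp_lock *lk, int *saved)
{
    int old = atomic_load_explicit(&tp->effective_prio, memory_order_relaxed);
    int raised = old > lk->ceiling ? old : lk->ceiling;

    *saved = old;                                            /* step 1 */
    atomic_store_explicit(&tp->effective_prio, raised,
                          memory_order_relaxed);             /* step 2 */

    if (!atomic_flag_test_and_set_explicit(&lk->held,
                                           memory_order_acquire))
        return true;                                         /* step 3 */

    /* Contended: a real version would queue and trap into the kernel.
     * This sketch just restores the priority and reports failure. */
    atomic_store_explicit(&tp->effective_prio, old, memory_order_relaxed);
    return false;
}
```

No syscall appears on the uncontended path; the priority raise is
an ordinary store to the shared page.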

Releasing the PP lock then involves:

1) conditionally find another thread to grant the lock to,
call scheduler, etc., otherwise
2) give up the actual lock (set bit, etc.)
3) pop the old priority from the stack, and
write it back into the per-thread location
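
The release steps above might look roughly like this. Again the
names are hypothetical, and the has_waiters field is an assumption
about how contention would be recorded. A real implementation
would want the waiter check and the lock release to be one atomic
operation on a single word; they are separate here only to keep
the steps visible.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread page, assumed to be readable by the scheduler. */
struct thread_page {
    _Atomic int effective_prio;
};

/* Hypothetical PP lock with a flag that contending threads would set. */
struct pp_lock {
    atomic_flag held;
    atomic_bool has_waiters;
};

/*
 * Returns true if the fast path was enough. Returns false if the caller
 * must take the slow path (trap into the kernel to hand the lock over);
 * in that case nothing has been changed.
 */
static bool pp_unlock(struct thread_page *tp, struct pp_lock *lk, int saved)
{
    /* Step 1: someone is waiting, so the kernel has to get involved. */
    if (atomic_load_explicit(&lk->has_waiters, memory_order_acquire))
        return false;

    /* Step 2: give up the actual lock. */
    atomic_flag_clear_explicit(&lk->held, memory_order_release);

    /* Step 3: "pop" the saved priority back into the per-thread page. */
    atomic_store_explicit(&tp->effective_prio, saved, memory_order_relaxed);
    return true;
}
```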

Of course you also have explicit priority changes. The way we
handled those was to defer the effect until the lock release
point. This means keeping two priority values (the nominal one,
and the effective one). Just as you need conditional
code to do the ugly stuff that requires a kernel trap
in the case that the lock release requires unblocking
a task, you need conditional code to copy the
new nominal priority to the effective priority, if that
is called for. We were able to combine these two conditions
into a single bit test, which then branched out to handle
each of the cases, if necessary.
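
To illustrate the deferred priority change, here is a
single-threaded sketch. It is one guess at the shape of the
combined test, not the code we actually used: the two conditions
live in one flag word, and the common case is a single check of
that word against zero. All names are invented, and a real version
would need atomic access to the shared fields.

```c
#include <stdbool.h>

/* Hypothetical reasons for leaving the fast path at lock release. */
enum {
    NEED_WAKE    = 1 << 0,  /* a blocked task must be unblocked */
    PRIO_PENDING = 1 << 1   /* a deferred priority change is waiting */
};

struct thread_state {
    int nominal_prio;    /* what the application last asked for */
    int effective_prio;  /* what the scheduler actually uses */
    unsigned pending;    /* OR of the flags above */
};

/* Explicit priority change: takes effect now, or at the next release. */
static void set_priority(struct thread_state *t, int prio, bool holds_lock)
{
    t->nominal_prio = prio;
    if (holds_lock)
        t->pending |= PRIO_PENDING;   /* defer until the lock is released */
    else
        t->effective_prio = prio;
}

/*
 * Tail end of a lock release. Returns true if the caller still has to
 * trap into the kernel to unblock a waiter.
 */
static bool release_epilogue(struct thread_state *t, int saved_prio)
{
    bool wake;

    t->effective_prio = saved_prio;   /* the normal "pop" */
    if (t->pending == 0)              /* the one test on the fast path */
        return false;

    /* Slow path: branch out to whichever cases apply. */
    wake = (t->pending & NEED_WAKE) != 0;
    if (t->pending & PRIO_PENDING)
        t->effective_prio = t->nominal_prio;
    t->pending = 0;
    return wake;
}
```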

I can't swear there are no complexities in Linux that might break
this scheme, since we were not trying to support all the
functionality now in Linux.

Ted

