Re: RFC for a new Scheduling policy/class in the Linux-kernel

From: Chris Friesen
Date: Thu Jul 16 2009 - 18:09:39 EST


Ted Baker wrote:
> On Thu, Jul 16, 2009 at 09:17:32AM -0600, Chris Friesen wrote:
>
>> If a high-priority task A makes a syscall that requires a lock currently
>> held by a sleeping low-priority task C, and there is a medium priority B
>> task that wants to run, the classic scenario for priority inversion has
>> been achieved.
>
> I think you don't really mean "sleeping" low-priority task C,
> since then the priority inheritance would do no good. I guess you
> mean that C has been/is preempted by B (and for global SMP, there
> is some other medium priority task B' that is eligible to run on
> A's processor). That could be a priority inversion scenario.

My terminology is getting sloppy. Yes, I meant preempted.

>> I suspect there are other similar cases where deadlock is the real
>> issue, and hard realtime isn't a concern (but low latency may be
>> desirable). PI is simple to enable and doesn't require any thought on
>> the part of the app writer.
>
> I'm confused by your reference to deadlock. Priority inheritance
> does not prevent deadlock, even on a single processor.

Sloppy terminology again: I meant priority inversion, not deadlock. If
all the apps are soft-realtime and B is a pure CPU hog (which can
effectively happen on heavily loaded server systems), then without PI,
A will never get to run.
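A toy single-CPU model of the A/B/C scenario, assuming strict fixed priorities and that the runnable task with the highest effective priority runs on every tick. The function first_tick_a_runs() and its parameters are hypothetical; the model only shows that, with B as a pure CPU hog, A stays blocked unless C inherits A's priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Fixed base priorities: higher number = higher priority. */
enum { PRIO_C = 1, PRIO_B = 2, PRIO_A = 3 };

/*
 * C holds the lock A wants and needs cs_ticks more ticks of CPU time to
 * leave its critical section. B is always runnable (a pure CPU hog).
 * Returns the tick at which A first runs, or -1 if it never runs
 * within max_ticks.
 */
static int first_tick_a_runs(bool use_pi, int cs_ticks, int max_ticks)
{
    assert(cs_ticks > 0 && max_ticks > 0);
    bool lock_held_by_c = true;

    for (int t = 0; t < max_ticks; t++) {
        if (!lock_held_by_c)
            return t; /* A is unblocked and has the top priority */

        /* A is blocked on the lock; with PI, C inherits A's priority. */
        int prio_c = use_pi ? PRIO_A : PRIO_C;

        if (prio_c > PRIO_B) {
            /* C runs and makes progress through its critical section. */
            if (--cs_ticks == 0)
                lock_held_by_c = false;
        }
        /* Otherwise B runs this tick and C makes no progress. */
    }
    return -1;
}
```

With PI, first_tick_a_runs(true, 2, 100) has C run at A's priority for its two remaining ticks, release the lock, and let A run on tick 2; without PI, first_tick_a_runs(false, 2, 100) returns -1 because B always outranks C.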

>> At least for POSIX, both PI and PP mutexes can suspend while the lock is
>> held. From the user's point of view, the only difference between the
>> two is that PP bumps the lock holder's priority always, while PI bumps
>> the priority only if/when necessary.
>
> You are right that POSIX missed the point of priority ceilings,
> by allowing suspension.

The vast majority of apps written for Linux are POSIX apps, so for this
discussion we need to bear that behaviour in mind.
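For reference, a minimal sketch of how a POSIX app picks between the two protocols, via pthread_mutexattr_setprotocol() with PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT. The helper names init_pi_mutex() and init_pp_mutex() are hypothetical, and the caller-supplied ceiling is assumed to lie within the SCHED_FIFO priority range.

```c
#define _GNU_SOURCE /* exposes the UNIX98 mutex protocol calls in glibc */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Initialise *m with the priority-inheritance (PI) protocol. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Initialise *m with the priority-protect (PP, ceiling) protocol. */
static int init_pp_mutex(pthread_mutex_t *m, int ceiling)
{
    assert(ceiling >= sched_get_priority_min(SCHED_FIFO) &&
           ceiling <= sched_get_priority_max(SCHED_FIFO));
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
    if (rc == 0)
        rc = pthread_mutexattr_setprioceiling(&attr, ceiling);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Either protocol permits the holder to block while holding the lock. The PP variant additionally raises the holder to the ceiling on every lock, which is where the extra priority changes come from.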

> However, there is still a difference in context-switching
> overhead. Worst-case, you have twice as many context switches
> per critical section with PIP as with PP.

On the other hand, with PI the uncontended case can be implemented as
atomic operations in userspace. With PP we need to issue at least two
syscalls per lock/unlock cycle even in the uncontended case (to raise
the holder's priority to the ceiling on lock and restore it on unlock).
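A rough sketch of why the uncontended PI case needs no syscall, modelled on the Linux PI-futex protocol (futex(2) with FUTEX_LOCK_PI / FUTEX_UNLOCK_PI). The futex word holds 0 when free or the owner's TID when held, so lock and unlock are each one compare-and-swap unless there is contention. The pi_lock_t type, its helpers and the kernel_entries counter are hypothetical illustrations, not glibc's implementation, and error handling (EINTR, robust lists) is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
    _Atomic uint32_t word; /* 0 = free, else owner TID plus flag bits */
} pi_lock_t;

static long kernel_entries; /* slow-path futex syscalls issued */

static uint32_t current_tid(void)
{
    /* Cache the TID per thread so only the first call costs a syscall. */
    static _Thread_local uint32_t tid;
    if (tid == 0)
        tid = (uint32_t)syscall(SYS_gettid);
    return tid;
}

static void pi_lock(pi_lock_t *l)
{
    uint32_t expected = 0;
    /* Uncontended fast path: a single CAS of 0 -> our TID. */
    if (atomic_compare_exchange_strong(&l->word, &expected, current_tid()))
        return;
    /* Contended: the kernel queues us and boosts the owner (PI). */
    kernel_entries++;
    syscall(SYS_futex, &l->word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

static void pi_unlock(pi_lock_t *l)
{
    /* Only the owner may unlock. */
    assert((atomic_load(&l->word) & FUTEX_TID_MASK) == current_tid());
    uint32_t expected = current_tid();
    /* Fast path: FUTEX_WAITERS is clear, so the CAS back to 0 succeeds. */
    if (atomic_compare_exchange_strong(&l->word, &expected, 0))
        return;
    /* Waiters exist: the kernel hands the lock to the top waiter. */
    kernel_entries++;
    syscall(SYS_futex, &l->word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

A PP lock has no equivalent fast path: raising the holder to the ceiling and later restoring its priority are scheduler changes that only the kernel can make, so they cost syscalls even when nobody else wants the lock.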

Chris
