Re: RFC for a new Scheduling policy/class in the Linux-kernel

From: Ted Baker
Date: Thu Jul 16 2009 - 19:38:47 EST


On Thu, Jul 16, 2009 at 05:34:25PM -0500, Karthik Singaram Lakshmanan wrote:

> Just chiming in that from an implementation perspective, we could use
> a priority bitmap of active tasks contending for the lock. An
> implementation similar to the one used by the O(1) scheduler can be of
> great use here. Hardware support like "find_first_bit" can drastically
> reduce the time taken to search for the highest-priority task pending
> on the lock. Given realistic values for the number of distinct
> priority values required by most practical systems, such an
> implementation could prove effective.

This does not solve the problem of avoiding queue reordering in
response to dynamic priority changes, since where you insert the
task in the queue (including the bit setting for the priority and
the linking/unlinking) depends on the curent priority.

This queue reordering is a huge pain, since you need to do it not
only whenever a priority is explicitly changed by the user; you
need to do it whenever a task gains or loses priority through
inheritance. The latter can happen asynchronously, in response to
a time-out or signal, for example.

By the way, the use of bit-maps is very appealing for the O(1)
operations, including bit-scan, especially if your machine has
atomic set/test bit instructions. We used this structure for some
single-processor RT kernels in the 1980's. The task scheduling
and sleep/wakeup operations were just a few instructions. However
the use of bit-maps forced us to set a fixed limit on the number
of tasks in the system. We also could not change priorities
without doing an ugly (non-atomic) re-shuffling of the structures.

You are proposing one bit per priority level, of course, rather
than one bit per task. This allows you to use linked lists within
priorities, but you lose the one-instruction suspend/unsuspend.

It is not immediately obvious how to extend this technique to
deadline-based scheduling.

There can only be a finite number of distinct deadlines in a
system at any one time. So at any point in time a finite number
of bits is sufficient. The problem is that the deadlines are
advancing, so a fixed mapping of bit positions to deadlines does
not work.

There is one way this can be used with EDF, at least for a single
processor, which is related to the way I came up with the SRP. If
you keep a stack of preempting tasks (earliest deadline on top),
the positions in the stack do map nicely to a bit vector.

You then need some other structure for the tasks with deadlines
later than the bottom of your stack.

I don't see how this generalizes to SMP.

Ted



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/