Re: [PATCH] O12.2int for interactivity

From: Timothy Miller
Date: Thu Aug 14 2003 - 14:47:04 EST


Thanks again for your explanations. I hope my questions don't eventually get on your nerves. :)

William Lee Irwin III wrote:

The "incremental" business meant that tasks have their timeslices
for the next epoch as they are moved onto the expired queue. In the
2.4.x scheduler this was not incremental, so when the epoch expired,
every task on the system had its timeslice recomputed in one large
(and time-consuming) loop. The 2.4.x case was also moderately wasteful
as it recomputed priorities for tasks regardless of whether they were
runnable, which IMHO is a flaw for which corrections are 2.4.x-mergeable.

Ok, so aside from the waste, the difference is that 2.4 and 2.6 do similar amounts of work to compute timeslices, but 2.4 did it all at once, while 2.6 does it one task at a time?
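
(A sketch of how I picture the 2.6 side, with the next slice computed
right at the moment the task expires rather than in one big loop;
details from my reading of sched.c, so take with salt:)

	if (!--p->time_slice) {
		p->time_slice = task_timeslice(p);  /* incremental recompute */
		dequeue_task(p, rq->active);
		enqueue_task(p, rq->expired);
	}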



William Lee Irwin III wrote:


Blocked tasks are taken off the runqueues entirely, and are found by
device drivers etc. by putting them on a waitqueue so whenever whatever
they're blocked on happens, they can be found and put on the runqueue.
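
(So, something like the classic waitqueue pattern, if I have it right;
data_ready below is just a stand-in for whatever condition the task is
blocked on:)

	static DECLARE_WAIT_QUEUE_HEAD(wq);

	/* the blocking side: leave the runqueue until the condition holds */
	wait_event(wq, data_ready);

	/* the driver's completion path later finds the sleeper and wakes it */
	wake_up(&wq);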

Ah... so, say a driver has a bug where it forgets to wake up a task. Is this what people are talking about when they complain about tasks being stuck in the "D" state? What does the "D" stand for?



William Lee Irwin III wrote:

When the epoch expires/active queue
empties, the two queues exchange places. The net effect is that the
duelling queues end up representing a circular list maintained in some
special order that can have insertions done into the middle of it for
tasks designated as "interactive", and that priority preemption's
effectiveness is unclear.


On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:

And by "middle", do you mean at the end of the active queue which is the "middle" if you think of the expired queue as being a continuation of the active queue?


The "middle" is some arbitrary position in this list that makes it
neither the last nor the first of the currently runnable tasks to be
run after the currently running task. The above analogy removes the
distinctions between active and expired queues, so it doesn't make
sense to ask about the boundary between the two.


Are you saying that the boundary CAN be conceptually removed, or are you saying that by attempting to do that, I am demonstrating a misunderstanding of the point in having the two separate queues?

Ok, so beyond the prioritization based on dynamic priority, it seems that you have even finer-grained prioritization which affects where, within the queue for a given priority, a task is placed once it expires.


You talk about priorities being recomputed when a task expires. Does vanilla do that just on timeslice expiration, while intsched does it also when a task blocks on I/O?


How many dynamic priority levels are there?



To rehash, consider changing the representation of the scheduler's
internals from two arrays to a single doubly-linked list (obviously
it's not useful to implement this, but it is conceptually useful).
Movement from the active queue to the expired queue would then be
modeled by simply list_move_tail(&task->run_list, &run_list).
Recycling interactive tasks to the active list or inserting newly
woken tasks at particular priorities would involve hunting for a
special position in the list and doing list_add() to there. Timeslices
are entirely orthogonal; they control when list movement is done, but
not how it's done.
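
(To check that I follow, a sketch of that single-list model in kernel
list terms; run_list below stands in for the conceptual global list,
not any real scheduler structure:)

	struct task_struct *p;

	/* epoch expiry: move to the back of the conceptual circle */
	list_move_tail(&task->run_list, &run_list);

	/* "interactive" recycle / wakeup: splice into the middle instead */
	list_for_each_entry(p, &run_list, run_list)
		if (p->prio > task->prio)   /* found a less-urgent task */
			break;
	list_add_tail(&task->run_list, &p->run_list); /* insert before it;
			degenerates to tail insertion if none is found */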

Ok, so timeslice length for a task is yet another level of prioritization then? It seems we have a list of different things, related to "priority" in a multidimensional way, that affect when and how tasks are run.

For active tasks:

- timeslice length (how long it's run)
- dynamic priority (when it's run relative to other priorities)
- position in dynamic priority list (when it's run relative to others at same priority)

For expired tasks:

- being on the expired queue
- position in the expired queue
- something else I don't know about


This hasn't been entirely clear to me, but is it correct that intsched affects more of these things than vanilla does (i.e. maybe vanilla doesn't put active tasks into the middle of a queue?), and that it does so by paying attention to more of what the task does (i.e. vanilla only recomputes dynamic priority on expiration and doesn't put expired tasks back into the active array?)?



In this way, we can see that the priorities merely control how tasks
are jumbled together into the list because the above alternative
representation faithfully models the current arrangement. We're
essentially cycling through this circular list and have to struggle
to recover preferences for interactive tasks by fiddling with
timeslices and the like, as otherwise such a circular queue would be
completely fair, to the detriment of interactive tasks.

Hmmm... I'm confused. I was under the impression that the interactive scheduler affected both timeslice length AND ordering. It looks like you're saying O(1)int doesn't affect ordering.

More common arrangements avoid this "tying of the knot" with the queue
swapping, and because of this, are capable of expressing long-term
preferences directly in terms of priorities, as opposed to having to
use timeslices to do so.

What do you mean by "tying the knot"?

"More common" as in, things other than the way Linux does it?

Are you saying that, as a result of the architecture of the O(1) scheduler, O(1)int has to fiddle with different (perhaps less optimal?) things than other architectures do, perhaps because they are not O(1)?

My understanding is that O(1)int adjusts dynamic priority based on task behavior. It looks like you're saying that it doesn't.

Instead, timeslices are used to express the
"scale" on which scheduling events should happen, and as tasks become
more cpu-bound, they have longer timeslices, so that two cpu-bound
tasks of identical priority will RR very slowly and have reduced
context switch overhead, but are near infinitely preemptible by more
interactive or short-running tasks.

It makes sense to me to affect both ordering and timeslice length. What heuristics affect each of those parameters?

Can you preempt a task in the middle of its timeslice, or do you have to wait until the timeslice has expired?


William Lee Irwin III wrote:

Obviously this threatens to degenerate to something RR-like that fails
to give the desired bonuses to short-running tasks. The way priorities
are enforced is dynamic timeslice assignment, where longer tasks
receive shorter timeslices and shorter tasks receive longer timeslices,


On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:

How do you vary timeslice? Can you tell the interrupt hardware to "interrupt me after X clock ticks"? Or do you service the interrupt and just do very little in the ISR if you're not going to switch contexts? Isn't that a bit of overhead?


task_timeslice() does a computation based on the dynamic priority to
assign timeslices. This computation is effectively linear.
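
(So presumably something of this shape, where MIN_TIMESLICE and
MAX_TIMESLICE stand in for whatever the real constants and rounding
are:)

	static unsigned int task_timeslice(struct task_struct *p)
	{
		/* numerically lower prio = higher priority = longer slice,
		   interpolated linearly between the two bounds */
		return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) *
			(MAX_PRIO - 1 - p->prio) / (MAX_PRIO - 1);
	}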

I was asking more of a hardware question. If your timeslice granularity is 10ms (a 100Hz tick), does that mean you get an interrupt from a timer at 100Hz? What about tasks which have their timeslice adjusted to be 30ms instead of the minimum 10ms? One of these must be the case:

1) The interrupt occurs at 100Hz, but for two out of every three ticks (in the 30ms case), the ISR immediately returns to the task. ISR runtime is kept minimal to reduce the overhead of the ignored interrupts.

2) The hardware can be told to interrupt only after N timer ticks.
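
(My money would be on (1). A sketch of what I imagine the tick handler
looks like, with the details invented:)

	/* fixed-rate timer tick; most ticks just decrement a counter */
	void scheduler_tick(void)
	{
		struct task_struct *p = current;

		if (!--p->time_slice)
			set_tsk_need_resched(p); /* switch at next safe point */
		/* otherwise fall straight back into the task */
	}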



William Lee Irwin III wrote:

The active queue is scanned, but the queue data structure is organized
so that it's composed of a separate list for each priority, and a
bitmap is maintained alongside the array of lists for quicker searches
for the highest nonempty priority (numerically low), and so scanning is
an O(1) algorithm and touches very few cachelines (if more than one).


On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:

Cool. So, it's not so much a "queue" as it is sort of an array of lists, eh?


The data structure itself is a priority queue; the implementation (or
representation) of it used in the scheduler is an array of lists with
a bitmap on the side. Other representations such as relaxed heaps or
complete binary tree (CBT) heaps are possible, though probably not
particularly useful for such small priority spaces, which are faster
to search exhaustively with ffz() and the like than via such
structures.
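
(So the representation is roughly the following; fields approximate,
the real thing being prio_array in kernel/sched.c:)

	struct prio_array {
		unsigned long    bitmap[BITS_TO_LONGS(MAX_PRIO)];
		struct list_head queue[MAX_PRIO];
	};

	/* O(1) pick: highest-priority (numerically lowest) nonempty list */
	int idx = sched_find_first_bit(array->bitmap);
	struct task_struct *next =
		list_entry(array->queue[idx].next, struct task_struct, run_list);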

Well, as I interpret what I read, it's more of a priority queue of priority queues; that is, if you are able to affect the ordering _within_ a given priority level.



The 2.4.x scheduler had static priorities (nice numbers) and
recomputed what amounted to dynamic priorities (the "goodness" rating)
on the fly each time a task was examined. 2.4.x also recomputed
priorities via the infamous for_each_task() loop
for_each_task(p)
	p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);


On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:

What does this do, exactly?


It loops through every task on the system assigning timeslices to every
task according to its nice number.

So, a priority is measured in ticks, and a task has an accumulation of unused ticks, and that's divided by two and added to the nice boost, and that's how many ticks it gets for the next epoch?

I'm just guessing from that code. :)
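
(Checking the guess with numbers: if I remember 2.4's sched.h right,
NICE_TO_TICKS(0) comes to about 6 ticks at HZ=100, so a nice-0 task
that blocked with 4 ticks left over would get (4 >> 1) + 6 = 8 ticks
for the next epoch. In other words, sleepers carry forward half of
their unused slice as a bonus.)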



William Lee Irwin III wrote:

in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
merely examines the priority when the task exhausts its timeslice to
recompute it (and so the expired queue is a useful part of the
mechanics of the algorithm: it's a placeholder for tasks whose
timeslices have been recomputed but don't actually deserve to be run).


On Wed, Aug 13, 2003 at 11:46:41AM -0400, Timothy Miller wrote:

Is this recomputation of timeslice based on the heuristics in the interactive scheduler, or does vanilla also do something like this? What does it do differently?


The timeslices in 2.6.x are controlled indirectly by the dynamic
priority. But the dynamic priorities in 2.6.x are assigned according to
interactivity heuristics, in both vanilla and Kolivas' patches.

Indirectly... but what controls them directly?
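
(My tentative picture of the chain, sketch-level, with CURRENT_BONUS
and friends from my possibly stale reading of kernel/sched.c:)

	/* time spent sleeping earns the task an interactivity bonus... */
	bonus   = CURRENT_BONUS(p) - MAX_BONUS / 2;
	p->prio = p->static_prio - bonus;       /* dynamic priority */

	/* ...and the priority in turn selects the timeslice */
	p->time_slice = task_timeslice(p);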



Thanks.
