Re: [PATCH] O12.2int for interactivity

From: Timothy Miller
Date: Wed Aug 13 2003 - 10:38:02 EST


Thank you for answering. Your explanations are very enlightening. You should be quoted on KernelTrap.org. :)

I have more questions....


William Lee Irwin III wrote:
On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:

Ok... this reminds me that there is an aspect of all of this that I don't understand. Please pardon my ignorance. And furthermore, if there is some document which answers all of my questions, please direct me to it so I don't waste your time.


No trouble at all.


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:

I understand that the O(1) scheduler uses two queues. One is the active queue, and the other is the expired queue. When a process has exhausted its timeslice, it gets put into the expired queue (at the end, I presume). If not, it gets put into the active queue.
Is this the vanilla scheduler?


The equivalent for the 2.4.x scheduler is the "epoch"; essentially what
2.6.x has implemented is incremental timeslice assignment for epoch
expiry.

What do you mean by "incremental"? And by 2.6.x, are you talking about vanilla, or interactive?

The way that goes is that when a timeslice runs out, it's
recomputed, and the expired queue is used as a placeholder for the
information about what to run and in what order after the epoch expires
(i.e. the active queue empties).

Where are blocked tasks placed? Are they put somewhere else and re-added to the active queue when their I/O (or whatever) completes? How is their ordering in the queue determined? That is, does a task which wakes up preempt the running task? Does it get run "next"?

In Operating Systems design class in college, we were told that when an I/O interrupt arrives, the currently active task isn't immediately returned to: the overhead of the context switch, and the fact that we don't know how near the task is to the end of its timeslice, make it not worth it, so we just move on to the next task.

When the epoch expires/active queue
empties, the two queues exchange places. The net effect is that the
duelling queues end up representing a circular list maintained in some
special order that can have insertions done into the middle of it for
tasks designated as "interactive", and that priority preemption's
effectiveness is unclear.
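
A simplified sketch of how that exchange looks in 2.6's schedule() (names follow kernel/sched.c; details elided): the "swap" is just two pointer assignments, so it's O(1) regardless of how many tasks have expired.

	/*
	 * Simplified from 2.6's schedule(): when the active array has
	 * drained, the active and expired arrays trade places by
	 * swapping pointers; nothing is copied or rescanned.
	 */
	struct prio_array *array = rq->active;
	if (unlikely(!array->nr_active)) {
		rq->active = rq->expired;
		rq->expired = array;
		array = rq->active;
	}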

And by "middle", do you mean at the end of the active queue which is the "middle" if you think of the expired queue as being a continuation of the active queue?

Obviously this threatens to degenerate to something RR-like that fails
to give the desired bonuses to short-running tasks. The way priorities
are enforced is dynamic timeslice assignment, where longer tasks
receive shorter timeslices and shorter tasks receive longer timeslices,

How do you vary timeslice? Can you tell the interrupt hardware to "interrupt me after X clock ticks"? Or do you service the interrupt and just do very little in the ISR if you're not going to switch contexts? Isn't that a bit of overhead?

and they're readjusted at various times, which is a somewhat unusual
design decision (as is the epoch bit). It also deviates from RR where
interactive tasks can re-enter the active queue at timeslice expiry.
So in this way, the favoring of short-running tasks is recovered from
the otherwise atypical design, as the interactive tasks will often be
re-inserted near the head of the queue as their priorities are high
and their timeslices are retained while sleeping.
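
A simplified sketch of that dynamic timeslice assignment: in 2.6 the static priority (100 = nice -20 through 139 = nice +19) is mapped linearly onto a [MIN_TIMESLICE, MAX_TIMESLICE] range, so higher-priority tasks get the longer slices. The constants below are illustrative; the kernel's versions are scaled by HZ.

	#define MIN_TIMESLICE	10	/* ticks; roughly 10ms at HZ=1000 */
	#define MAX_TIMESLICE	200	/* ticks */

	/* static_prio is 100 (nice -20) .. 139 (nice +19) */
	static unsigned int task_timeslice(int static_prio)
	{
		return MIN_TIMESLICE + (MAX_TIMESLICE - MIN_TIMESLICE) *
			(139 - static_prio) / 39;
	}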


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:

One thing I don't understand is, for a given queue, how do priorities affect running of processes? Two possibilities come to mind:
1) All pri 10 processes will be run before any pri 11 processes.
2) A pri 10 process will be run SLIGHTLY more often than a pri 11 process.
For the former, is the active queue scanned for runnable processes of the highest priority? If that's the case, why not have one queue for each priority level? Wouldn't that reduce the amount of scanning through the queue?


(1) is the case. Things differ a bit from what you might read about
because of the epoch business.

The active queue is scanned, but the queue data structure is organized
so that it's composed of a separate list for each priority, and a
bitmap is maintained alongside the array of lists for quicker searches
for the highest nonempty priority (numerically low), and so scanning is
an O(1) algorithm and touches very few cachelines (if more than one).

Cool. So, it's not so much a "queue" as a sort of array of lists, eh?
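
A simplified sketch of that structure and of the O(1) lookup (names follow 2.6's kernel/sched.c):

	/*
	 * The "queue" is really one list per priority level plus a
	 * bitmap recording which lists are nonempty.
	 */
	struct prio_array {
		int nr_active;				/* total runnable tasks */
		unsigned long bitmap[BITMAP_SIZE];	/* bit set <=> list nonempty */
		struct list_head queue[MAX_PRIO];	/* one list per priority */
	};

	/* Next task: first set bit = highest (numerically lowest)
	 * priority; then take the head of that list. */
	idx = sched_find_first_bit(array->bitmap);
	queue = array->queue + idx;
	next = list_entry(queue->next, task_t, run_list);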


On Tue, Aug 12, 2003 at 11:04:44AM -0400, Timothy Miller wrote:

What it comes down to that I want to know is if priorities affect running of processes linearly or exponentially.
How do nice levels affect priorities? (Vanilla and interactive)
How do priorities affect processes in the expired queue?
In the vanilla scheduler, can a low enough nice value keep an expired process off the expired queue? How is that determined?
Does the vanilla scheduler have priorities? If so, how are they determined?


Neither linearly nor exponentially; nice levels assign static
priorities; dynamic priorities (the ones used for scheduling decisions)
are restricted to the range where |dynamic_prio - static_prio| <= 5.
Nice values do not keep tasks off the expired queue. Tasks on the
expired queue are ordered by priority and not examined until the epoch
expiry.
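
A simplified sketch of where that +/-5 window comes from, modeled on 2.6's effective_prio() (the constant names are the kernel's; details are elided):

	static int effective_prio(task_t *p)
	{
		/*
		 * MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 = 40 * 25 / 100
		 * = 10 levels of bonus, centered on zero, so the bonus
		 * ranges over -5..+5 as sleep_avg goes 0..MAX_SLEEP_AVG.
		 */
		int bonus = MAX_USER_PRIO * PRIO_BONUS_RATIO *
				p->sleep_avg / MAX_SLEEP_AVG / 100 -
			    MAX_USER_PRIO * PRIO_BONUS_RATIO / 100 / 2;
		int prio = p->static_prio - bonus;

		if (prio < MAX_RT_PRIO)
			prio = MAX_RT_PRIO;
		if (prio > MAX_PRIO - 1)
			prio = MAX_PRIO - 1;
		return prio;
	}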

Doesn't it cause starvation of expired tasks if interactive tasks keep hogging the active queue? Does it matter?

The 2.4.x scheduler had static priorities (nice numbers) and
recomputed what amounted to dynamic priorities (the "goodness" rating)
on the fly each time a task was examined. 2.4.x also recomputed
priorities via the infamous for_each_task() loop
	for_each_task(p)
		/* halve the ticks left over and add the nice-based refill */
		p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);

What does this do, exactly?

in the if (unlikely(!c)) case of the repeat_schedule: label where 2.6.x
merely examines the priority when the task exhausts its timeslice to
recompute it (and so the expired queue is a useful part of the
mechanics of the algorithm: it's a placeholder for tasks whose
timeslices have been recomputed but don't actually deserve to be run).
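
A simplified sketch of that per-task recomputation, modeled on 2.6's scheduler_tick(): nothing is recomputed globally; a task's priority and timeslice are redone only when its own slice runs out. The EXPIRED_STARVING() check here is also what bounds the starvation worried about above.

	if (!--p->time_slice) {
		dequeue_task(p, rq->active);
		set_tsk_need_resched(p);
		p->prio = effective_prio(p);		/* new dynamic priority */
		p->time_slice = task_timeslice(p);	/* refill the slice */

		/*
		 * Interactive tasks go back on the active queue, unless
		 * the expired queue has been waiting too long; that is
		 * what keeps expired tasks from starving indefinitely.
		 */
		if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq))
			enqueue_task(p, rq->expired);
		else
			enqueue_task(p, rq->active);
	}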

Is this recomputation of timeslice based on the heuristics in the interactive scheduler, or does vanilla also do something like this? What does it do differently?
