Re: power-efficient scheduling design

From: Arjan van de Ven
Date: Wed Jun 19 2013 - 13:08:45 EST

On 6/19/2013 10:00 AM, Morten Rasmussen wrote:
On Wed, Jun 19, 2013 at 04:39:39PM +0100, Arjan van de Ven wrote:
On 6/18/2013 10:47 AM, David Lang wrote:

It's bad enough trying to guess the needs of the processes, but if you also are reduced to guessing the capabilities of the cores, how can anything be made to work?

btw one way to look at this is to assume that (with some minimal hinting)
the CPU driver will do the right thing and get you just about the best performance you can get
(that is appropriate for the task at hand)...
... and don't do anything in the scheduler proactively.

If I understand correctly, you mean if your hardware/firmware is fully

hardware, firmware and the driver

in control of the p-state selection and changes it fast enough to match
the current load, the scheduler doesn't have to care? By fast enough I
mean, faster than the scheduler would notice if a cpu was temporarily
overloaded at a low p-state. In that case, you wouldn't need
cpufreq/p-state hints, and the scheduler would only move tasks between
cpus when cpus are fully loaded at their max p-state.

with the migration hint, I'm pretty sure we'll be there today typically.
we'll notice within 10 msec regardless, but the migration hint will take
the edge off those 10 msec normally.

I would argue that the "at their max p-state" in your sentence needs to go away,
since you don't know what p-state you are actually in except in hindsight.
And even then you don't know if you could have gone higher or not.

the hints I have in mind are not all that complex; we have the biggest issues today
around task migration (the task migrates to a cold cpu... so a simple notifier chain
on the new cpu as it is accepting a task and we can bump it up), real time tasks
(again, simple notifier chain to get you to a predictably high performance level)
and we're a long way better than we are today in terms of actual problems.
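To make the two hints concrete, here is a minimal userspace sketch of what such a notifier chain could look like. It is not kernel code and not an existing interface: the names (`perf_hint`, `hint_register`, `hint_notify`, `cpu_driver_hint`), the per-cpu `perf_level` array and the numeric levels are all hypothetical, chosen only to illustrate "bump the cpu as it accepts a task" and "go to a predictably high level for real time tasks".

```c
#include <stddef.h>

/* Hypothetical hint types; these names do not come from the kernel. */
enum perf_hint {
    HINT_TASK_MIGRATED_IN,   /* a task is arriving on a (possibly cold) cpu */
    HINT_RT_TASK_RUNNABLE    /* a real time task became runnable on the cpu */
};

/* One entry in a singly linked chain of callbacks. */
struct hint_block {
    void (*call)(int cpu, enum perf_hint hint);
    struct hint_block *next;
};

/* Toy model: one requested performance level per cpu (assumed scale 0..10). */
enum { NR_CPUS = 4, PERF_MIGRATE_BUMP = 7, PERF_MAX = 10 };
static int perf_level[NR_CPUS];

static struct hint_block *chain;

/* Add a callback at the head of the chain. */
static void hint_register(struct hint_block *nb)
{
    nb->next = chain;
    chain = nb;
}

/* Called by the (imagined) scheduler; walks every registered callback. */
static void hint_notify(int cpu, enum perf_hint hint)
{
    for (struct hint_block *nb = chain; nb != NULL; nb = nb->next)
        nb->call(cpu, hint);
}

/* What a cpu driver might do with the hints; the policy is an assumption. */
static void cpu_driver_hint(int cpu, enum perf_hint hint)
{
    switch (hint) {
    case HINT_TASK_MIGRATED_IN:
        /* Raise a cold cpu, but never lower one that is already higher. */
        if (perf_level[cpu] < PERF_MIGRATE_BUMP)
            perf_level[cpu] = PERF_MIGRATE_BUMP;
        break;
    case HINT_RT_TASK_RUNNABLE:
        /* Go straight to a predictably high level. */
        perf_level[cpu] = PERF_MAX;
        break;
    }
}

static struct hint_block cpu_driver_nb = { .call = cpu_driver_hint };
```

In this sketch the driver would call `hint_register(&cpu_driver_nb)` once, and the scheduler side would call `hint_notify(cpu, HINT_TASK_MIGRATED_IN)` on the destination cpu; everything else about p-state selection stays inside the driver.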

For all the talk of ondemand (as ARM still uses that today)... that guy puts you in
either the lowest or highest frequency over 95% of the time. Other non-cpufreq solutions
like on Intel are a bit more advanced (and will grow more so over time), but even there,
in the grand scheme of things, the scheduler shouldn't have to care anymore with those
two notifiers in place.

You would need more than a few hints to implement more advanced capacity
management like proposed for the power scheduler. I believe that Intel
would benefit as well from guiding the scheduler to idle the right cpu
to enable deeper idle states and/or enable turbo-boost for other cpus.

that's an interesting theory.
I've yet to see any way to actually have that do something useful.

yes there is some value in grouping a lot of very short tasks together.
not a lot of value, but at least some.

and there is some value in grouping within a package (to a degree).

(both are basically "statistically, sort left" as policy)

more fine-grained than that (esp. tied to P states)... not so much.
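One possible reading of "sort left", sketched below as an assumption rather than as any existing scheduler code: when placing a task, prefer the lowest-numbered cpu that still has room, so that load statistically collects on the low-numbered cpus and the rest stay idle. The function name `pick_cpu_sort_left` and the integer load/capacity units are hypothetical.

```c
#include <stddef.h>

/*
 * Return the lowest-numbered cpu whose current load plus the new task
 * still fits within capacity, or -1 if no cpu has room.
 * load[], capacity and task_load share one arbitrary unit (assumed).
 */
static int pick_cpu_sort_left(const int *load, size_t nr_cpus,
                              int capacity, int task_load)
{
    for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
        if (load[cpu] + task_load <= capacity)
            return (int)cpu;
    }
    return -1;
}
```

Note that nothing in this sketch looks at P states; it only biases placement toward the left, which is the coarse level of policy being argued for here.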
