Re: [RFC][PATCH 0/9] sched: Power scheduler design proposal

From: Arjan van de Ven
Date: Tue Jul 09 2013 - 12:59:00 EST


On 7/9/2013 8:55 AM, Morten Rasmussen wrote:
Hi,

This patch set is an initial prototype aiming at the overall power-aware
scheduler design proposal that I previously described
<http://permalink.gmane.org/gmane.linux.kernel/1508480>.

The patch set introduces a cpu capacity managing 'power scheduler' which lives
by the side of the existing (process) scheduler. Its role is to monitor the
system load and decide which cpus should be available to the process
scheduler. Long term the power scheduler is intended to replace the currently
distributed, uncoordinated power management policies and will interface with a
unified platform-specific power driver to obtain power topology information and
handle idle and P-states. The power driver interface should be made flexible
enough to support multiple platforms including Intel and ARM.

I quickly browsed through it but have a hard time seeing what the
real interface is between the scheduler and the hardware driver.
What information does the scheduler give the hardware driver exactly,
and what is that information supposed to mean?

If the interface is "go faster please" or "we need you to be at fastest now",
that doesn't sound too bad.
But if the interface is "you should be at THIS number" that is pretty bad and
not going to work for us.
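
To make the distinction concrete, here is a minimal sketch of what a
hint-style interface could look like. Every name in it (perf_hint,
hypo_power_driver, hypo_driver_apply_hint) is hypothetical and made up for
illustration; none of it comes from the patch set or from the kernel. The
point is only that the scheduler expresses a relative wish and the driver
maps it onto whatever the hardware can actually do.

```c
/* Hypothetical relative hints; these names do not exist in the kernel. */
enum perf_hint {
	PERF_HINT_NONE,		/* no opinion, the driver chooses freely */
	PERF_HINT_FASTER,	/* "go faster please" */
	PERF_HINT_MAX		/* "we need you to be at fastest now" */
};

/* Hypothetical driver-private state; the scheduler never looks inside. */
struct hypo_power_driver {
	int level;		/* driver's current internal performance level */
	int max_level;		/* highest level this hardware offers */
};

/*
 * The driver translates a hint into its own internal level and returns
 * the level it picked. The caller never passes an absolute target.
 */
static int hypo_driver_apply_hint(struct hypo_power_driver *drv,
				  enum perf_hint hint)
{
	switch (hint) {
	case PERF_HINT_FASTER:
		if (drv->level < drv->max_level)
			drv->level++;
		break;
	case PERF_HINT_MAX:
		drv->level = drv->max_level;
		break;
	case PERF_HINT_NONE:
	default:
		break;		/* leave the decision to the driver */
	}
	return drv->level;
}
```

An interface of the "you should be at THIS number" kind would instead take an
absolute frequency or P-state as its argument, which is the variant objected
to above.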

Also, it almost looks like there is a fundamental assumption in the code
that you can get the current effective P-state to make scheduler decisions on.
On Intel at least that is basically impossible, and it gets harder with every
generation (likewise for AMD afaics).

(you can get what you ran at on average over some time in the past, but not
what you're at now or going forward)
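
As an illustration of the "average over some time in the past" case: on x86
this kind of figure is commonly derived from the APERF/MPERF counter pair,
where the ratio of the two deltas over an interval scales the base frequency.
The sketch below assumes that approach and takes the counter samples as plain
inputs; the struct and function names are hypothetical, and reading the
counters themselves is left out.

```c
#include <stdint.h>

/* One snapshot of the two counters, taken at some point in the past. */
struct perf_sample {
	uint64_t aperf;		/* counts at the actual delivered frequency */
	uint64_t mperf;		/* counts at a fixed reference frequency */
};

/*
 * Average effective frequency between two snapshots, in kHz.
 * This describes the interval that already elapsed; it says nothing
 * about the frequency right now or in the future.
 * Note: base_khz * delta can overflow for very long intervals; a real
 * implementation would have to guard against that.
 */
static uint64_t avg_freq_khz(const struct perf_sample *prev,
			     const struct perf_sample *cur,
			     uint64_t base_khz)
{
	uint64_t d_aperf = cur->aperf - prev->aperf;
	uint64_t d_mperf = cur->mperf - prev->mperf;

	if (d_mperf == 0)
		return 0;	/* no reference cycles elapsed, nothing to report */

	return base_khz * d_aperf / d_mperf;
}
```

With a 2 GHz base frequency, an APERF delta of 3000 and an MPERF delta of 2000
yield an average of 3 GHz for the sampled interval, i.e. the core ran above
base on average during that window.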

I'm rather nervous about calculating how many cores you want active as a core
scheduler feature. I understand that for your big.LITTLE architecture you need
this due to the asymmetry, but as a general rule for more symmetric systems
it's known to be suboptimal by a significant percentage. For a normal Intel
single-CPU system it's about the worst thing you can do, in that it leads to
serializing tasks that could have run in parallel over multiple cores/threads.
So at minimum this kind of logic must be enabled/disabled based on
architecture decisions.
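
A back-of-the-envelope sketch of the serialization cost, under deliberately
simple assumptions: all tasks are independent, equally long, and run at the
same speed on every core (so frequency scaling, turbo and cache effects are
ignored). The function name is hypothetical.

```c
/*
 * Time until the last of n_tasks equal, independent tasks finishes
 * when they are spread over n_cores. Each task takes task_ms.
 */
static unsigned int makespan_ms(unsigned int n_tasks, unsigned int task_ms,
				unsigned int n_cores)
{
	unsigned int rounds;

	if (n_cores == 0)
		return 0;	/* nothing can run; treated as a degenerate case */

	/* Each core runs its share back to back: ceil(n_tasks / n_cores). */
	rounds = (n_tasks + n_cores - 1) / n_cores;
	return rounds * task_ms;
}
```

Under these assumptions, four 10 ms tasks finish after 10 ms on four cores but
need 40 ms when packed onto one core, so the package stays awake four times as
long before it can go idle.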




