On Mon, Nov 10, 2014 at 04:58:29PM +0100, Daniel Lezcano wrote:
I don't get it, I've clearly not stared at it long enough, but why is
that STATE_START crap needed anywhere?
Excellent question :)
On x86, the config option ARCH_HAS_CPU_RELAX is set (x86 is the only one).
That leads to the macro CPUIDLE_DRIVER_STATE_START equal 1.
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/include/linux/cpuidle.h#n221
Then the acpi cpuidle driver and the intel_driver begin to fill the idle
state at index == CPUIDLE_DRIVER_STATE_START, so leaving the 0th idle state
empty.
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/idle/intel_idle.c#n848
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/acpi/processor_idle.c#n953
Then when the driver is registered and if ARCH_HAS_CPU_RELAX is set, the
cpuidle framework insert the 0th with the poll state (reminder : only for
x86).
Appears to me part of the problem is right there, the intel_idle and
proessor_idle should register the poll state themselves. That
immediately makes this weirdness go away.
Registering states from two places is not something that's sane or
desired I think.
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
If you look at the ladder governor (which I believe nobody is still using
it), or at the menu governor, all the indexes begin at
CPUIDLE_DRIVER_STATE_START, so all the code is preventing to use the 0th
state ... :)
So why is needed the poll state ?
1. When the latency_req is 0 (it returns 0, so the poll state)
Right, that makes sense.
2. When the select's menu governor fails to find a state *and* if the next
timer is before 5us
That seems rather arbitrary.
Why would it fail to find a state?
https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/tree/drivers/cpuidle/driver.c#n195
And when we investigate the same code but on the other archs, the
CPUIDLE_DRIVER_STATE_START dance makes things slightly different.
So the conclusion is, we are inserting a state in the idle state array but
we do everything to prevent to use it :)
For me it appears logical to just kill this state from the x86 idle drivers
and move it in the idle_mainloop in case an idle state selection fails.
But why, ppc has a latency_req==0 state too, right?
I agree that we should shoot STATE_START in the head, but I feel we
should do it by fixing the state registration.
I really don't get why the governors should know about this though, its
just another state, they should iterate all states and pick the best,
given the power usage this state should really never be eligible unless
we're QoS forced or whatnot.