[PATCHv4 0/4] coupled cpuidle state support

From: Colin Cross
Date: Mon May 07 2012 - 20:58:15 EST


On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
cpus cannot be independently powered down, either due to
sequencing restrictions (on Tegra 2, cpu 0 must be the last to
power down), or due to HW bugs (on OMAP4460, a cpu powering up
will corrupt the GIC state unless the other cpu runs a
workaround). Each cpu has a power state that it can enter without
coordinating with the other cpu (usually Wait For Interrupt, or
WFI), and one or more "coupled" power states that affect blocks
shared between the cpus (L2 cache, interrupt controller, and
sometimes the whole SoC). Entering a coupled power state must
be tightly controlled on both cpus.

The simplest way to implement coupled cpu power states is to
hotplug all but one cpu whenever possible, usually using a
cpufreq governor that looks at cpu load to determine when to
enable the secondary cpus. This causes problems: hotplug is an
expensive operation, so the number of hotplug transitions must be
minimized, which leads to very slow responses to load changes,
often on the order of seconds.

This patch series implements an alternative solution, where each
cpu will wait in the WFI state until all cpus are ready to enter
a coupled state, at which point the coupled state function will
be called on all cpus at approximately the same time.

Once all cpus are ready to enter idle, they are woken by an SMP
cross call. At this point, there is a chance that one of the
cpus will find work to do and choose not to enter idle. A
final pass is needed to guarantee that all cpus will call the
power state enter function at the same time. During this pass,
each cpu will increment the ready counter, and continue once the
ready counter matches the number of online coupled cpus. If any
cpu exits idle, the other cpus will decrement the ready counter
and retry.
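To make the final pass concrete, here is a minimal userspace sketch that models each cpu as a pthread and keeps the waiting and ready counters in C11 atomics. It is not the kernel code: coupled_model, run_pass(), ready_pass() and the PASS_* results are hypothetical names, sched_yield() stands in for cpu_relax(), and a cpu that "finds work" is simply flagged up front. Once a cpu has incremented the ready counter it cannot back out on its own; it only returns to waiting when another cpu leaves idle and drops the waiting count.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the final "ready" pass; all names are hypothetical. */
enum pass_result { PASS_ENTER, PASS_RETRY, PASS_ABORT };

struct coupled_model {
	int n;              /* number of online coupled cpus */
	atomic_int waiting; /* cpus that are idle and waiting */
	atomic_int ready;   /* cpus committed to the coupled state */
};

struct cpu_arg {
	struct coupled_model *c;
	bool has_work;      /* this cpu found work after the cross call */
	enum pass_result result;
};

static void *ready_pass(void *p)
{
	struct cpu_arg *a = p;
	struct coupled_model *c = a->c;

	if (a->has_work) {
		/* Leave idle before committing: stop counting as waiting. */
		atomic_fetch_sub(&c->waiting, 1);
		a->result = PASS_ABORT;
		return NULL;
	}

	/* Commit; from here only another cpu leaving idle can stop us. */
	atomic_fetch_add(&c->ready, 1);
	for (;;) {
		if (atomic_load(&c->ready) == c->n) {
			a->result = PASS_ENTER; /* call the coupled enter function */
			return NULL;
		}
		if (atomic_load(&c->waiting) < c->n) {
			/* Someone exited idle: undo our vote and go back to waiting. */
			atomic_fetch_sub(&c->ready, 1);
			a->result = PASS_RETRY;
			return NULL;
		}
		sched_yield(); /* stands in for cpu_relax() */
	}
}

/* Run one pass with n simulated cpus; work_cpu < 0 means nobody has work. */
static void run_pass(int n, int work_cpu, enum pass_result out[])
{
	struct coupled_model c = { .n = n };
	struct cpu_arg args[8];
	pthread_t tid[8];

	assert(n > 0 && n <= 8);
	atomic_init(&c.waiting, n); /* every cpu is already idle */
	atomic_init(&c.ready, 0);
	for (int i = 0; i < n; i++) {
		args[i] = (struct cpu_arg){ .c = &c, .has_work = (i == work_cpu) };
		pthread_create(&tid[i], NULL, ready_pass, &args[i]);
	}
	for (int i = 0; i < n; i++) {
		pthread_join(tid[i], NULL);
		out[i] = args[i].result;
	}
}
```

With no cpu holding work, every thread ends in PASS_ENTER; if one cpu has work, it aborts and every other cpu ends in PASS_RETRY, regardless of thread interleaving, because the ready counter can then never reach n.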

To use coupled cpuidle states, a cpuidle driver must:

Set struct cpuidle_device.coupled_cpus to the mask of all
coupled cpus, usually the same as cpu_possible_mask if all cpus
are part of the same cluster. The coupled_cpus mask must be
set in the struct cpuidle_device for each cpu.

Set struct cpuidle_device.safe_state to a state that is not a
coupled state. This is usually WFI.

Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
state that affects multiple cpus.

Provide a struct cpuidle_state.enter function for each state
that affects multiple cpus. This function is guaranteed to be
called on all cpus at approximately the same time. The driver
should ensure that the cpus all abort together if any cpu tries
to abort once the function is called.
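The driver requirements above amount to a consistency check. The sketch below models them in userspace with hypothetical stand-in structs (model_device, model_state), a plain bitmask instead of struct cpumask, and a hypothetical MODEL_FLAG_COUPLED bit rather than the real CPUIDLE_FLAG_COUPLED value. It only validates a configuration; it does not register anything with cpuidle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 2
#define MODEL_FLAG_COUPLED (1u << 0) /* hypothetical stand-in flag */

struct model_state {
	const char *name;
	unsigned int flags;
	int (*enter)(int cpu, int index);
};

struct model_device {
	int cpu;
	uint32_t coupled_cpus; /* bitmask standing in for struct cpumask */
	int safe_state;        /* index of a non-coupled state, e.g. WFI */
};

static int enter_stub(int cpu, int index)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return index; /* a real driver would program the hardware here */
}

/* Two cpus in one cluster: WFI is per-cpu, "cluster off" is coupled. */
static const struct model_state example_states[] = {
	{ "WFI", 0, enter_stub },
	{ "CLUSTER_OFF", MODEL_FLAG_COUPLED, enter_stub },
};
static const struct model_device example_devs[] = {
	{ 0, 0x3, 0 },
	{ 1, 0x3, 0 },
};

/* Return true if the configuration meets the requirements listed above. */
static bool coupled_config_ok(const struct model_device *devs, int ndev,
			      const struct model_state *states, int nstates)
{
	for (int i = 0; i < ndev; i++) {
		const struct model_device *d = &devs[i];

		/* The mask must be set on every device and include its own cpu. */
		if (!(d->coupled_cpus & (1u << d->cpu)))
			return false;
		/* Every cpu named in the mask must carry the same mask. */
		for (int j = 0; j < ndev; j++)
			if ((d->coupled_cpus & (1u << devs[j].cpu)) &&
			    devs[j].coupled_cpus != d->coupled_cpus)
				return false;
		/* The safe state must exist and must not be coupled. */
		if (d->safe_state < 0 || d->safe_state >= nstates ||
		    (states[d->safe_state].flags & MODEL_FLAG_COUPLED))
			return false;
	}
	/* Each coupled state needs an enter function. */
	for (int s = 0; s < nstates; s++)
		if ((states[s].flags & MODEL_FLAG_COUPLED) && states[s].enter == NULL)
			return false;
	return true;
}
```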

This series has been tested by implementing a test cpuidle state
that uses the parallel barrier helper function to verify that
all cpus call the function at the same time.
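A barrier of the kind that test relies on can be built from a single atomic counter. This is a userspace sketch with pthreads and C11 atomics, assuming a two-phase scheme: count arrivals up to n, then departures up to 2n, and let the last cpu out reset the counter. parallel_barrier() and the demo harness are hypothetical names, sched_yield() stands in for cpu_relax(), and this is not necessarily how the helper in this series is written.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Two-phase barrier for n cpus sharing one atomic counter (sketch).
 * Phase 1 counts arrivals up to n, phase 2 counts departures up to 2n,
 * and the last cpu out resets the counter so the barrier can be reused.
 */
static void parallel_barrier(atomic_int *a, int n)
{
	assert(n > 0);
	atomic_fetch_add(a, 1);
	while (atomic_load(a) < n)
		sched_yield(); /* wait for every cpu to arrive */

	if (atomic_fetch_add(a, 1) + 1 == 2 * n) {
		atomic_store(a, 0); /* last one out resets the counter */
		return;
	}
	while (atomic_load(a) > n)
		sched_yield(); /* wait for the reset before leaving */
}

#define DEMO_CPUS 4
#define DEMO_ROUNDS 50

static atomic_int demo_barrier;
static atomic_int demo_arrived[DEMO_ROUNDS];
static int demo_ok[DEMO_CPUS];

static void *demo_cpu(void *p)
{
	int id = *(int *)p;

	demo_ok[id] = 1;
	for (int r = 0; r < DEMO_ROUNDS; r++) {
		atomic_fetch_add(&demo_arrived[r], 1);
		parallel_barrier(&demo_barrier, DEMO_CPUS);
		/* Past the barrier, every cpu must have arrived for round r. */
		if (atomic_load(&demo_arrived[r]) != DEMO_CPUS)
			demo_ok[id] = 0;
	}
	return NULL;
}

/* Returns 1 if no simulated cpu ever passed the barrier early. */
static int run_barrier_demo(void)
{
	pthread_t tid[DEMO_CPUS];
	int ids[DEMO_CPUS];
	int ok = 1;

	for (int i = 0; i < DEMO_CPUS; i++) {
		ids[i] = i;
		pthread_create(&tid[i], NULL, demo_cpu, &ids[i]);
	}
	for (int i = 0; i < DEMO_CPUS; i++) {
		pthread_join(tid[i], NULL);
		ok &= demo_ok[i];
	}
	return ok;
}
```

The second phase is what makes the barrier reusable: a fast cpu cannot start the next round and be miscounted until every cpu has left the current one.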

This patch set has a few disadvantages over the hotplug governor,
but I think they are all fairly minor:
* Worst-case interrupt latency can be increased. If one cpu
receives an interrupt while the other is spinning in the
ready_count loop, the second cpu will be stuck with
interrupts off until the first cpu finishes processing
its interrupt and exits idle. This will increase the worst-case
interrupt latency by the worst-case interrupt processing
time, but should be very rare.
* Interrupts are processed while still inside pm_idle.
Normally, interrupts are only processed at the very end of
pm_idle, just before it returns to the idle loop. Coupled
states require processing interrupts inside
cpuidle_enter_state_coupled in order to distinguish between
the smp_cross_call from another cpu that is now idle and an
interrupt that should cause idle to exit.
I don't see a way to fix this without either being able to
read the next pending irq from the interrupt chip, or
querying the irq core for which interrupts were processed.
* Since interrupts are processed inside cpuidle, the next
timer event could change. The new timer event will be
handled correctly, but the idle state decision made by
the governor will be out of date, and will not be revisited.
The governor select function could be called again every time,
but this could lead to a lot of work being done by an idle
cpu if the other cpu was mostly busy.
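One way to picture the distinction in the second point is a per-cluster "poked" bitmask: the cpu sending the cross call sets the target's bit first, and the idle cpu, after its interrupts have been processed, clears the bit and checks whether any real work was queued. The sketch below is a userspace model under those assumptions; poked_mask, poke_cpu(), clear_poke(), classify_wakeup() and the WAKE_* values are hypothetical, and work_pending stands in for need_resched().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mask: bit n set means cpu n has an unhandled poke. */
static atomic_uint poked_mask;

enum wake_reason { WAKE_SPURIOUS, WAKE_POKE, WAKE_WORK };

/* Sender side: mark the target before sending the cross call. */
static void poke_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	atomic_fetch_or(&poked_mask, 1u << cpu);
	/* ...the SMP cross call to 'cpu' would be sent here... */
}

/* Receiver side: clear this cpu's poke bit and report whether it was set. */
static bool clear_poke(int cpu)
{
	unsigned int old = atomic_fetch_and(&poked_mask, ~(1u << cpu));

	return (old & (1u << cpu)) != 0;
}

/*
 * Classify a wakeup after pending interrupts have been handled.
 * work_pending stands in for need_resched(): if a handler queued work,
 * idle must exit whether or not this cpu was also poked.
 */
static enum wake_reason classify_wakeup(int cpu, bool work_pending)
{
	bool poked = clear_poke(cpu);

	if (work_pending)
		return WAKE_WORK;
	return poked ? WAKE_POKE : WAKE_SPURIOUS;
}
```

Note that the answer depends on interrupt handlers having already run, which is why, in this model too, interrupts must be processed before the decision is made.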

v2:
* removed the coupled lock, replacing it with atomic counters
* added a check for outstanding pokes before beginning the
final transition to avoid extra wakeups
* made the cpuidle_coupled struct completely private
* fixed kerneldoc comment formatting
* added a patch with a helper function for resynchronizing
cpus after aborting idle
* added a patch (not for merging) to add trace events for
verification and performance testing

v3:
* rebased on v3.4-rc4 by Santosh
* fixed decrement in cpuidle_coupled_cpu_set_alive
* updated tracing patch to remove unnecessary debugging so
it can be merged
* made tracing _rcuidle

v4:
* removed BUG_ONs
* converted ready and waiting counts to a single atomic (idea
from Rafael)
* prevent coupled idle during hotplug, simplifying alive_count
* dropped trace patch for now, will repost a new one later
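The single-atomic change in v4 can be illustrated by packing both counts into one int. The sketch assumes a 16-bit field for the waiting count with the ready count above it; the macro and helper names are hypothetical. The payoff is try_set_not_ready(): a cpu can withdraw its ready vote unless every cpu is already both waiting and ready, using one compare-and-swap over a consistent snapshot of both counts, which two separate atomics could not provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Assumed layout: the low WAITING_BITS hold the waiting count and the
 * bits above hold the ready count, so one load sees both consistently.
 */
#define WAITING_BITS 16
#define WAITING_ONE 1
#define READY_ONE (1 << WAITING_BITS)
#define WAITING_MASK (READY_ONE - 1)

static int waiting_count(int v) { return v & WAITING_MASK; }
static int ready_count(int v) { return v >> WAITING_BITS; }

static void set_waiting(atomic_int *c) { atomic_fetch_add(c, WAITING_ONE); }
static void set_not_waiting(atomic_int *c) { atomic_fetch_sub(c, WAITING_ONE); }
static void set_ready(atomic_int *c) { atomic_fetch_add(c, READY_ONE); }

static bool all_waiting(atomic_int *c, int n)
{
	return waiting_count(atomic_load(c)) == n;
}

static bool all_ready(atomic_int *c, int n)
{
	return ready_count(atomic_load(c)) == n;
}

/*
 * Withdraw this cpu's ready vote unless every cpu is already both waiting
 * and ready. Returns true if the vote was withdrawn. The check and the
 * update act on one snapshot of both counts via compare-and-swap.
 */
static bool try_set_not_ready(atomic_int *c, int n)
{
	int all = n | (n << WAITING_BITS);
	int v = atomic_load(c);

	assert(n > 0 && n < (1 << 14)); /* keeps both fields in a positive int */
	while (v != all)
		if (atomic_compare_exchange_weak(c, &v, v - READY_ONE))
			return true;
	return false;
}
```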

This series has been tested and reviewed by Santosh and Kevin
for OMAP4, which has a cpuidle series ready for 3.5, and Tegra
and Exynos5 patches are in progress. I think this is ready to
go in. Len, are you maintaining a cpuidle tree for linux-next?
If not, I can publish a tree for linux-next, or this could go in
through Arnd's tree.
