Re: [PATCHv2 0/5] coupled cpuidle state support
From: Colin Cross
Date: Thu Mar 15 2012 - 19:37:22 EST
On Wed, Mar 14, 2012 at 11:29 AM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
> On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
> cpus cannot be independently powered down, either due to
> sequencing restrictions (on Tegra 2, cpu 0 must be the last to
> power down), or due to HW bugs (on OMAP4460, a cpu powering up
> will corrupt the gic state unless the other cpu runs a work
> around). Each cpu has a power state that it can enter without
> coordinating with the other cpu (usually Wait For Interrupt, or
> WFI), and one or more "coupled" power states that affect blocks
> shared between the cpus (L2 cache, interrupt controller, and
> sometimes the whole SoC). Entering a coupled power state must
> be tightly controlled on both cpus.
>
> The easiest solution to implementing coupled cpu power states is
> to hotplug all but one cpu whenever possible, usually using a
> cpufreq governor that looks at cpu load to determine when to
> enable the secondary cpus. This causes problems, as hotplug is an
> expensive operation, so the number of hotplug transitions must be
> minimized, leading to very slow response to loads, often on the
> order of seconds.
>
> This patch series implements an alternative solution, where each
> cpu will wait in the WFI state until all cpus are ready to enter
> a coupled state, at which point the coupled state function will
> be called on all cpus at approximately the same time.
>
> Once all cpus are ready to enter idle, they are woken by an smp
> cross call. At this point, there is a chance that one of the
> cpus will find work to do, and choose not to enter suspend. A
> final pass is needed to guarantee that all cpus will call the
> power state enter function at the same time. During this pass,
> each cpu will increment the ready counter, and continue once the
> ready counter matches the number of online coupled cpus. If any
> cpu exits idle, the other cpus will decrement their counter and
> retry.
>
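
For readers who want to see the final pass in isolation, here is a rough
userspace sketch of the ready-counter step described above. It is not code
from the series: pthreads stand in for cpus, NR_COUPLED_CPUS is fixed at 2,
and fake_coupled_enter() and simulate_coupled_entry() are hypothetical names
made up for this illustration. The abort-and-retry path (decrementing the
counter when another cpu exits idle) is deliberately left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: two coupled cpus, modelled here as two threads. */
#define NR_COUPLED_CPUS 2

static atomic_int ready_count;   /* cpus that have committed to the coupled state */
static atomic_int entered_count; /* how many times the enter stand-in has run */

/* Hypothetical stand-in for the coupled state's enter function. */
static void fake_coupled_enter(void)
{
	atomic_fetch_add(&entered_count, 1);
}

static void *cpu_thread(void *arg)
{
	(void)arg;

	/* Final pass: announce readiness ... */
	atomic_fetch_add(&ready_count, 1);

	/* ... then spin until every coupled cpu has done the same. */
	while (atomic_load(&ready_count) != NR_COUPLED_CPUS)
		;

	/* Only now is it safe to call the enter function on this cpu. */
	fake_coupled_enter();
	return NULL;
}

/* Runs one simulated coupled entry; returns how many cpus entered. */
int simulate_coupled_entry(void)
{
	pthread_t threads[NR_COUPLED_CPUS];
	int i;

	atomic_store(&ready_count, 0);
	atomic_store(&entered_count, 0);

	for (i = 0; i < NR_COUPLED_CPUS; i++) {
		int err = pthread_create(&threads[i], NULL, cpu_thread, NULL);
		assert(err == 0);
	}
	for (i = 0; i < NR_COUPLED_CPUS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&entered_count);
}
```

No thread reaches fake_coupled_enter() before the counter equals the number
of coupled cpus, which is the property the final pass is meant to provide.
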
> To use coupled cpuidle states, a cpuidle driver must:
>
> Set struct cpuidle_device.coupled_cpus to the mask of all
> coupled cpus, usually the same as cpu_possible_mask if all cpus
> are part of the same cluster. The coupled_cpus mask must be
> set in the struct cpuidle_device for each cpu.
>
> Set struct cpuidle_device.safe_state to a state that is not a
> coupled state. This is usually WFI.
>
> Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
> state that affects multiple cpus.
>
> Provide a struct cpuidle_state.enter function for each state
> that affects multiple cpus. This function is guaranteed to be
> called on all cpus at approximately the same time. The driver
> should ensure that the cpus all abort together if any cpu tries
> to abort once the function is called.
>
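
The four driver requirements above might look roughly like the sketch below.
The structs are simplified userspace stand-ins, not the kernel definitions:
only the field and flag names (coupled_cpus, safe_state, flags, enter,
CPUIDLE_FLAG_COUPLED) come from the description above. Their types, the flag
value, the use of a state index for safe_state, and every example_* name are
assumptions made for illustration.

```c
#include <assert.h>

/* Placeholder value; the real flag value is defined by the kernel. */
#define CPUIDLE_FLAG_COUPLED 0x1u
#define EXAMPLE_NR_CPUS      2
#define EXAMPLE_NR_STATES    2

struct cpuidle_device;

/* Simplified stand-in for struct cpuidle_state. */
struct cpuidle_state {
	const char *name;
	unsigned int flags;
	int (*enter)(struct cpuidle_device *dev, int index);
};

/* Simplified stand-in for struct cpuidle_device. */
struct cpuidle_device {
	unsigned int cpu;
	unsigned long coupled_cpus; /* plain bitmask standing in for a cpumask */
	int safe_state;             /* assumption: index of a non-coupled state */
};

/* Hypothetical enter functions; a real driver would program the hardware. */
static int example_enter_wfi(struct cpuidle_device *dev, int index)
{
	(void)dev;
	return index;
}

static int example_enter_coupled(struct cpuidle_device *dev, int index)
{
	(void)dev;
	return index;
}

static struct cpuidle_state example_states[EXAMPLE_NR_STATES] = {
	/* State 0: per-cpu WFI, needs no coordination. */
	{ .name = "WFI", .flags = 0, .enter = example_enter_wfi },
	/* State 1: affects both cpus, so it is flagged as coupled. */
	{ .name = "coupled", .flags = CPUIDLE_FLAG_COUPLED,
	  .enter = example_enter_coupled },
};

/* Must be done for the cpuidle_device of every cpu. */
void example_setup_device(struct cpuidle_device *dev, unsigned int cpu)
{
	assert(cpu < EXAMPLE_NR_CPUS);
	dev->cpu = cpu;
	dev->coupled_cpus = (1UL << EXAMPLE_NR_CPUS) - 1; /* all cpus coupled */
	dev->safe_state = 0;                              /* WFI */
}

/* The safe state must not itself be a coupled state. */
int example_safe_state_ok(const struct cpuidle_device *dev)
{
	return !(example_states[dev->safe_state].flags & CPUIDLE_FLAG_COUPLED);
}
```

A driver would run the equivalent of example_setup_device() once per cpu
before registering the device.
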
> This series has been tested by implementing a test cpuidle state
> that uses the parallel barrier helper function to verify that
> all cpus call the function at the same time.
>
> This patch set has a few disadvantages compared to the hotplug governor,
> but I think they are all fairly minor:
> * Worst-case interrupt latency can be increased. If one cpu
> receives an interrupt while the other is spinning in the
> ready_count loop, the second cpu will be stuck with
> interrupts off until the first cpu finishes processing
> its interrupt and exits idle. This will increase the worst
> case interrupt latency by the worst-case interrupt processing
> time, but should be very rare.
> * Interrupts are processed while still inside pm_idle.
> Normally, interrupts are only processed at the very end of
> pm_idle, just before it returns to the idle loop. Coupled
> states require processing interrupts inside
> cpuidle_enter_state_coupled in order to distinguish between
> the smp_cross_call from another cpu that is now idle and an
> interrupt that should cause idle to exit.
> I don't see a way to fix this without either being able to
> read the next pending irq from the interrupt chip, or
> querying the irq core for which interrupts were processed.
> * Since interrupts are processed inside cpuidle, the next
> timer event could change. The new timer event will be
> handled correctly, but the idle state decision made by
> the governor will be out of date, and will not be revisited.
> The governor select function could be called again every time,
> but this could lead to a lot of work being done by an idle
> cpu if the other cpu was mostly busy.
>
> v2:
> * removed the coupled lock, replacing it with atomic counters
> * added a check for outstanding pokes before beginning the
> final transition to avoid extra wakeups
> * made the cpuidle_coupled struct completely private
> * fixed kerneldoc comment formatting
> * added a patch with a helper function for resynchronizing
> cpus after aborting idle
> * added a patch (not for merging) to add trace events for
> verification and performance testing
I forgot to mention: this patch series is based on v3.3-rc7, and will
conflict with the cpuidle timekeeping patches. If those go in first
(which is likely), I will rework this series on top of them. I left it
on v3.3-rc7 for now to make testing easier.