Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

From: Andy Lutomirski
Date: Thu Aug 14 2014 - 17:12:35 EST

On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
>> So seeing how you're from I'm assuming you're using x86 here.
>> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
>> just fine, which means we'll fall out of the cpuidle_enter(), which
>> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
>> It will indeed not leave the cpu_idle_loop() function and go right back
>> into cpuidle_idle_call(), but that will then call cpuidle_select() which
>> should pick a new C state.
>> So the interrupt _should_ work. If it doesn't you need to explain why.
> I think the issue is related to the poll_idle state, in
> drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> cpuidle table as the state 0 (POLL). There is no mwait for this state.
> It is a bit confusing because this state is not listed in the acpi /
> intel idle driver but inserted implicitly at the beginning of the idle
> table by the cpuidle framework when the driver is registered.
> static int poll_idle(struct cpuidle_device *dev,
>                 struct cpuidle_driver *drv, int index)
> {
>         local_irq_enable();
>         if (!current_set_polling_and_test()) {
>                 while (!need_resched())
>                         cpu_relax();
>         }
>         current_clr_polling();
>         return index;
> }

As the most recent person to have modified this function, and as an
avowed hater of pointless IPIs, let me ask a rather different question:
why are you sending IPIs at all? As of Linux 3.16, poll_idle actually
supports the polling idle interface :)

Can't you just do:

if (set_nr_if_polling(rq->idle)) {
        // the CPU is polling and will notice need_resched; no IPI needed
} else {
        raw_spin_lock_irqsave(&rq->lock, flags);
        if (rq->curr == rq->idle)
                smp_send_reschedule(cpu);
        // else the CPU wasn't idle; nothing to do
        raw_spin_unlock_irqrestore(&rq->lock, flags);
}

In the common case (wake from C0, i.e. polling idle), this will skip the
IPI entirely unless you race with idle entry/exit, saving a few more
precious electrons and all of the latency involved in poking the APIC.
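
For anyone unfamiliar with the polling idle interface, here is a rough
userspace model of the handshake, written with C11 atomics. It is only a
sketch: the flag values and the model_* names are made up for illustration
and are not the kernel's definitions, and the real helper works on the idle
task's thread_info flags rather than a bare integer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's thread-info flag bits. */
#define TIF_NEED_RESCHED   (1u << 0)
#define TIF_POLLING_NRFLAG (1u << 1)

/*
 * Waker side, modelled on set_nr_if_polling(): set NEED_RESCHED only if
 * the target is currently polling. Returns true when no IPI is needed.
 */
static bool model_set_nr_if_polling(atomic_uint *flags)
{
        unsigned int val = atomic_load(flags);

        for (;;) {
                if (!(val & TIF_POLLING_NRFLAG))
                        return false;   /* not polling: caller must send an IPI */
                if (val & TIF_NEED_RESCHED)
                        return true;    /* already flagged; the poller will see it */
                /* On failure, val is refreshed with the current flags value. */
                if (atomic_compare_exchange_weak(flags, &val,
                                                 val | TIF_NEED_RESCHED))
                        return true;
        }
}

/* Idle side: the exit condition of the poll loop, modelled on need_resched(). */
static bool model_need_resched(atomic_uint *flags)
{
        return atomic_load(flags) & TIF_NEED_RESCHED;
}
```

The point of the compare-and-swap loop is that the waker either observes the
polling bit and sets need_resched in one atomic step, or observes that the
target is not polling and falls back to the IPI path. A polling CPU that
spins on model_need_resched() then leaves its loop without ever taking an
interrupt.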


P.S. "30mV" in the patch description is presumably a typo.
