On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote:
On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
Hi Chuansheng,
On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@xxxxxxxxx> wrote:
We found that sometimes, even after we set PM_QOS back to DEFAULT,
the CPU stays stuck in C0 for 2-3s and does not select a new suitable
C-state immediately after receiving the IPI interrupt.
The code model is simply as below:
{
	pm_qos_update_request(&pm_qos, C1 - 1);
	/* <== all cores are kept at C0 here */
	...;
	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
	/* <== some cores still stay stuck at C0 for 2-3s here */
}
The reason is that when pm_qos goes back to DEFAULT, an IPI is sent to
wake up the core, but when the core is in the poll idle state, the IPI
cannot break the polling loop.
So seeing how you're from @intel.com I'm assuming you're using x86 here.
I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
just fine, which means we'll fall out of the cpuidle_enter(), which
means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
It will indeed not leave the cpu_idle_loop() function and go right back
into cpuidle_idle_call(), but that will then call cpuidle_select() which
should pick a new C state.
So the interrupt _should_ work. If it doesn't you need to explain why.
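To make the expected flow concrete, here is a minimal userspace sketch of
the selection step only. It is not the kernel's governor: the struct, the
table, the latency values and the name model_select() are all hypothetical,
and the model ignores predicted idle duration and target residency. It only
shows why a constraint of "C1 - 1" pins a CPU to state 0 and why lifting
the constraint should allow a deeper state on the next selection.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor; not the kernel's struct cpuidle_state. */
struct model_state {
	const char *name;
	int exit_latency_us;	/* illustrative values only */
};

/* Example table: state 0 is the polling state, deeper states exit more slowly. */
static const struct model_state model_states[] = {
	{ "POLL", 0 },
	{ "C1",   2 },
	{ "C3",   80 },
	{ "C6",   200 },
};

#define MODEL_NSTATES (sizeof(model_states) / sizeof(model_states[0]))

/*
 * Return the index of the deepest state whose exit latency does not
 * exceed the latency constraint. State 0 is always allowed.
 */
int model_select(int latency_req_us)
{
	int best = 0;
	size_t i;

	for (i = 1; i < MODEL_NSTATES; i++) {
		if (model_states[i].exit_latency_us <= latency_req_us)
			best = (int)i;
	}
	return best;
}
```

With this table, model_select(1) (the "C1 - 1" case) yields state 0, while
model_select(INT_MAX), used here as a stand-in for "no constraint", yields
the deepest state. The point is that the new choice only takes effect if
the CPU actually leaves its current state and runs the selection again.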
I think the issue is related to the poll_idle state, in
drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
cpuidle table as the state 0 (POLL). There is no mwait for this state. It is
a bit confusing because this state is not listed in the acpi / intel idle
driver but inserted implicitly at the beginning of the idle table by the
cpuidle framework when the driver is registered.
static int poll_idle(struct cpuidle_device *dev,
		struct cpuidle_driver *drv, int index)
{
	local_irq_enable();
	if (!current_set_polling_and_test()) {
		while (!need_resched())
			cpu_relax();
	}
	current_clr_polling();
	return index;
}
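The exit condition of that loop can be modelled in userspace as follows.
This is only a sketch under stated assumptions: the event enum and
model_poll_exit() are hypothetical names, and interrupts are reduced to
"handled without setting need_resched" versus "need_resched gets set". It
shows that an interrupt which does not set need_resched leaves the loop
spinning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds for the model; these are not kernel definitions. */
enum model_event {
	EV_NONE,	/* nothing happened during this iteration */
	EV_IRQ,		/* interrupt handled, need_resched left clear */
	EV_RESCHED,	/* need_resched gets set */
};

/*
 * Walk a list of events the way the polling loop would see them.
 * Return the index of the event that ends the loop, or -1 if the loop
 * is still spinning when the events run out.
 */
int model_poll_exit(const enum model_event *ev, size_t n)
{
	bool need_resched = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i] == EV_RESCHED)
			need_resched = true;
		/* Mirrors while (!need_resched()): an IRQ alone changes nothing. */
		if (need_resched)
			return (int)i;
	}
	return -1;
}
```

Feeding it only EV_IRQ events returns -1, which corresponds to the IPI
after the pm_qos update not getting the CPU out of the POLL state.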
Ah, well, in that case there's a ton more broken than just this.
kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty
much expects to be called after each interrupt.
Then again, not reflecting properly isn't really a problem, it's not like
not accounting interrupts is going to save power much.