Re: [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
From: Peter Zijlstra
Date: Mon Aug 20 2018 - 06:15:57 EST
On Sun, Aug 12, 2018 at 10:55:15PM +0800, leo.yan@xxxxxxxxxx wrote:
> The first issue is caused by timer cancellation. I wrote a test case in
> which CPU_0 starts a pinned hrtimer with a short expiry time; when
> CPU_0 goes to sleep, this short timer makes the idle governor select a
> shallow state. Meanwhile, CPU_1 tries to cancel the timer; my purpose
> is to trick CPU_0 so that I can see it staying in the shallow state for
> a long time. The cancel succeeds only a small percentage of the time,
> but I do occasionally see it succeed, and then CPU_0 stays in idle for
> a long time. (I cannot explain why the timer cannot be canceled
> successfully every time; this might be another issue?) This case is
> tricky, but it could happen in drivers that cancel timers.
So this is really difficult to make happen, I think; you first need the
CPU to go into deep idle, such that it disables the tick. Then you have to
start the hrtimer there (using an IPI, I suppose), which will then force
the governor to pick a shallow idle state, and then you have to cancel
the timer before it fires.
And then, if the CPU stays perfectly idle, it will be stuck in that
shallow state... forever more.
_However_ IIRC when we (remotely) cancel an hrtimer, we do not in fact
reprogram the timer hardware. So the timer _will_ trigger.
hrtimer_interrupt() will observe nothing to do and reprogram the
hardware for the next timer (if there is one).
This should be enough to cycle through the idle loop, re-select an
idle state, and avoid this whole problem.
If that is not happening, then something is busted and we need to figure
out what.