Re: [patch 2/2] sched/idle: Make default_idle_call() NOHZ aware
From: Qais Yousef
Date: Tue Mar 03 2026 - 22:37:02 EST
On 03/02/26 11:39, Christian Loehle wrote:
> On 3/2/26 11:11, Frederic Weisbecker wrote:
> > On Mon, Mar 02, 2026 at 11:03:00AM +0000, Christian Loehle wrote:
> >> On 3/2/26 10:43, Frederic Weisbecker wrote:
> >>> On Sun, Mar 01, 2026 at 08:30:51PM +0100, Thomas Gleixner wrote:
> >>>> Guests fall back to default_idle_call() as there is no cpuidle driver
> >>>> available to them by default. That causes a problem in fully loaded
> >>>> scenarios where CPUs go briefly idle for a couple of microseconds:
> >>>>
> >>>> tick_nohz_idle_stop_tick() is invoked unconditionally, which means that unless
> >>>> there is a timer pending in the next tick, the tick is stopped and then restarted
> >>>> a couple of microseconds later when the idle condition goes away. That
> >>>> requires programming the clockevent device twice, which implies a VM exit for
> >>>> each reprogramming.
> >>>>
> >>>> It was suggested to remove the tick_nohz_idle_stop_tick() invocation from
> >>>> the default idle code, but that would be counterproductive. It would not allow
> >>>> the host to go into deeper idle states when the guest CPU is fully idle, as
> >>>> the guest would have to maintain the periodic tick.
> >>>>
> >>>> Cure this by implementing a trivial moving average filter which keeps track
> >>>> of the recent idle residency time and only stops the tick when the average
> >>>> is larger than a tick.
> >>>>
> >>>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxx>
> >>>
> >>> Shouldn't there instead be a new dedicated cpuidle driver with proper governor support?
> >>
> >> I think a dummy cpuidle driver is an option, but calling into any governor
> >> seems overkill IMO; it presents an option to the user where there really is
> >> none (after all, the cpuidle governor would just make a boolean decision as
> >> there are no states).
> >
> > I must confess I don't fully understand the picture with the non-existent states,
> > but what Thomas is doing in his patch is basically an ad-hoc implementation of
> > the cpuidle governor's decision on whether or not to stop the tick.
> >
>
> Yup, and if we put that into the cpuidle governor, then we have to duplicate
> that logic for all governors, even though for <= 1 states they should
> hopefully all behave the same.
>
> A dummy driver would allow for this logic to live in drivers/cpuidle/ but
> I don't have a preference either way.
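
For readers unfamiliar with the filter described in the quoted patch, here is a minimal C sketch of one way such a tick-stop decision could look. It is not the code from Thomas's patch: it uses an exponentially weighted moving average (each new sample weighted 1/8) as the "trivial moving average filter", and the patch may well use a different filter. The 4 ms tick (HZ=250), `struct idle_avg`, `idle_avg_update()` and `idle_avg_should_stop_tick()` are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length: HZ=250, i.e. 4 ms per tick (hypothetical value). */
#define SKETCH_TICK_NSEC	4000000ULL

/* A new sample contributes 1/2^AVG_SHIFT of its value (hypothetical choice). */
#define AVG_SHIFT		3

/* Hypothetical per-CPU filter state. */
struct idle_avg {
	uint64_t avg_ns;	/* running average of recent idle residencies */
};

/*
 * Fold the residency of the idle period that just ended into the average:
 * avg = avg - avg/8 + sample/8, an exponentially weighted moving average
 * computed with shifts instead of divisions.
 */
static void idle_avg_update(struct idle_avg *ia, uint64_t residency_ns)
{
	ia->avg_ns = ia->avg_ns - (ia->avg_ns >> AVG_SHIFT) +
		     (residency_ns >> AVG_SHIFT);
}

/* Stop the tick only if recent idle periods averaged longer than one tick. */
static bool idle_avg_should_stop_tick(const struct idle_avg *ia)
{
	return ia->avg_ns > SKETCH_TICK_NSEC;
}
```

The shift-based average needs one word of per-CPU state and no division. The trade-off is hysteresis: after a run of microsecond-long idle periods the tick keeps running, and after a long idle period it takes a couple of dozen short ones before the average drops below a tick again.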

I am not sure about all the details, but a VM exit seems akin to a deep idle
state with a sizeable latency hit. I am not sure how the power impact can be
modeled, though. It seems purely associated with stopping the tick, so maybe
it can be treated the same as allowing the physical CPU to enter a deep idle
state, since not stopping the tick means the host CPU can't enter it either?
i.e., copy the min residency from the first deep idle state of the host.

I haven't thought this through, to be honest, but there seems to be room for
some sensible model. Whether it is worth it or not, I don't know either :)
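
The residency idea above could be sketched as follows. This assumes the guest somehow learns the host's idle-state table, which the thread does not address. `struct host_idle_state`, the `needs_tick_stopped` flag and both functions are hypothetical. Comparing a predicted idle time against a state's target residency mirrors how cpuidle governors use `target_residency`, but this is not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one host idle state as seen by the guest. */
struct host_idle_state {
	const char *name;
	uint64_t target_residency_ns;	/* min idle time for the state to pay off */
	bool needs_tick_stopped;	/* assumed: only reachable with the tick off */
};

/*
 * Pick the guest's tick-stop threshold: the target residency of the first
 * host state that requires the tick to be stopped. If the host has no such
 * state, fall back to fallback_ns (for example, one tick).
 */
static uint64_t guest_tick_stop_threshold_ns(const struct host_idle_state *states,
					     size_t nr_states, uint64_t fallback_ns)
{
	for (size_t i = 0; i < nr_states; i++) {
		if (states[i].needs_tick_stopped)
			return states[i].target_residency_ns;
	}
	return fallback_ns;
}

/* Stop the guest tick only when the expected idle time covers the threshold. */
static bool guest_should_stop_tick(uint64_t predicted_idle_ns, uint64_t threshold_ns)
{
	return predicted_idle_ns >= threshold_ns;
}
```

With a made-up host table of a shallow state (2 us residency) followed by a deep one (600 us), the guest would keep its tick for a predicted 2 us idle period and stop it for a predicted 1 ms one.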