Re: [patch 2/2] sched/idle: Make default_idle_call() NOHZ aware
From: Christian Loehle
Date: Mon Mar 02 2026 - 06:40:11 EST
On 3/2/26 11:11, Frederic Weisbecker wrote:
> On Mon, Mar 02, 2026 at 11:03:00AM +0000, Christian Loehle wrote:
>> On 3/2/26 10:43, Frederic Weisbecker wrote:
>>> On Sun, Mar 01, 2026 at 08:30:51PM +0100, Thomas Gleixner wrote:
>>>> Guests fall back to default_idle_call() as there is no cpuidle driver
>>>> available to them by default. That causes a problem in fully loaded
>>>> scenarios where CPUs go briefly idle for a couple of microseconds:
>>>>
>>>> tick_nohz_idle_stop_tick() is invoked unconditionally, which means that
>>>> unless there is a timer pending in the next tick, the tick is stopped and
>>>> restarted a couple of microseconds later when the idle condition goes
>>>> away. That requires programming the clockevent device twice, which implies
>>>> a VM exit for each reprogramming.
>>>>
>>>> It was suggested to remove the tick_nohz_idle_stop_tick() invocation from
>>>> the default idle code, but that would be counterproductive. It would not
>>>> allow the host to go into deeper idle states when the guest CPU is fully
>>>> idle as it has to maintain the periodic tick.
>>>>
>>>> Cure this by implementing a trivial moving average filter which keeps track
>>>> of the recent idle residency time and only stops the tick when the average
>>>> is larger than a tick.
>>>>
>>>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxx>
>>>
>>> Shouldn't there be instead a new dedicated cpuidle driver with proper governor support?
>>
>> I think a dummy cpuidle driver is an option, but calling into any governor
>> seems overkill IMO: it presents an option to the user where there really is
>> none (after all, the cpuidle governor would just make a boolean decision as
>> there are no states).
>
> I must confess I don't fully understand the picture with the non-existent states,
> but what Thomas is doing in his patch is basically an ad-hoc implementation of
> the cpuidle governor decision whether or not to stop the tick.
>
Yup, and if we put that into the cpuidle governors then we have to duplicate
that logic in every governor, even though for <= 1 states they should
hopefully all behave the same.
A dummy driver would allow this logic to live in drivers/cpuidle/, but
I don't have a preference either way.
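
For reference, the boolean decision being discussed (stop the tick only when
the average recent idle residency exceeds one tick) could look roughly like
the userspace sketch below. This is not the code from the patch: the struct,
the function names and the 1/8 weighting are all assumptions made for
illustration, and the residency and tick length are plain nanosecond values
supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the code from the patch. */
#define IDLE_AVG_SHIFT	3	/* assumed weight: each new sample counts 1/8 */

struct idle_filter {
	uint64_t avg_ns;	/* running average of recent idle residency */
};

/* Fold one observed idle residency into the running average. */
static void idle_filter_update(struct idle_filter *f, uint64_t residency_ns)
{
	/* avg += (sample - avg) / 8; signed so that sample < avg works */
	int64_t delta = (int64_t)residency_ns - (int64_t)f->avg_ns;

	f->avg_ns = (uint64_t)((int64_t)f->avg_ns + delta / (1 << IDLE_AVG_SHIFT));
}

/* Stop the tick only when the average residency is longer than one tick. */
static bool idle_filter_should_stop_tick(const struct idle_filter *f,
					 uint64_t tick_ns)
{
	assert(tick_ns > 0);
	return f->avg_ns > tick_ns;
}
```

With a 4ms tick, a run of ~5us idle periods keeps the average far below the
tick length, so the tick stays on and no clockevent reprogramming happens. A
few long idle periods pull the average above one tick and the tick gets
stopped again. Wherever this ends up living, it is the same yes/no decision
for the <= 1 states case.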