Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full

From: Paul E. McKenney
Date: Wed Mar 20 2024 - 18:55:26 EST


On Wed, Mar 20, 2024 at 05:15:48PM +0100, Frederic Weisbecker wrote:
> Le Wed, Mar 20, 2024 at 04:14:24AM -0700, Paul E. McKenney a écrit :
> > On Tue, Mar 19, 2024 at 02:18:00AM -0700, Paul E. McKenney wrote:
> > > On Tue, Mar 19, 2024 at 12:07:29AM +0100, Frederic Weisbecker wrote:
> > > > While running in nohz_full mode, a task may enqueue a timer while the
> > > > tick is stopped. However the only places where the timer wheel,
> > > > alongside the timer migration machinery's decision, may reprogram the
> > > > next event accordingly with that new timer's expiry are the idle loop or
> > > > any IRQ tail.
> > > >
> > > > However neither the idle task nor an interrupt may run on the CPU if it
> > > > resumes busy work in userspace for a long while in full dynticks mode.
> > > >
> > > > To solve this, the timer enqueue path raises a self-IPI that will
> > > > re-evaluate the timer wheel on its IRQ tail. This asynchronous solution
> > > > avoids potential locking inversion.
> > > >
> > > > This is supposed to happen both for local and global timers but commit:
> > > >
> > > > b2cf7507e186 ("timers: Always queue timers on the local CPU")
> > > >
> > > > broke the global timers case with removing the ->is_idle field handling
> > > > for the global base. As a result, global timers enqueue may go unnoticed
> > > > in nohz_full.
> > > >
> > > > Fix this with restoring the idle tracking of the global timer's base,
> > > > allowing self-IPIs again on enqueue time.
> > >
> > > Testing with the previous patch (1/2 in this series) reduced the number of
> > > problems by about an order of magnitude, down to two sched_tick_remote()
> > > instances and one enqueue_hrtimer() instance, very good!
> > >
> > > I have kicked off a test including this patch. Here is hoping! ;-)
> >
> > And 22*100 hours of TREE07 got me one run with a sched_tick_remote()

Sigh. s/22/12/

> > complaint and another run with a starved RCU grace-period kthread.
> > So this is definitely getting more reliable, but still a little ways
> > to go.

An additional eight hours got another sched_tick_remote().

> Right, there is clearly something else. Investigation continues...

Please let me know if there is a debug patch that I could apply.

Thanx, Paul