Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

From: Paul E. McKenney
Date: Fri Oct 04 2024 - 09:27:03 EST


On Thu, Oct 03, 2024 at 08:50:37PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 03, 2024 at 09:04:30AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 03, 2024 at 04:22:40PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 03, 2024 at 05:45:47AM -0700, Paul E. McKenney wrote:
> > >
> > > > I ran 100*TREE03 for 18 hours each, and got 23 instances of *something*
> > > > happening (and I need to suppress stalls on the repeat). One of the
> > > > earlier bugs happened early, but sadly not this one.
> > >
> > > Damn, I don't have the amount of CPU hours available you mention in your
> > > later email. I'll just go up the rounds to 20 minutes and see if
> > > something wants to go bang before I have to shut down the noise
> > > pollution for the day...
> >
> > Indeed, this was one reason I was soliciting debug patches. ;-)
>
> Sooo... I was contemplating if something like the below might perhaps
> help some. It's a bit of a mess (I'll try and clean up if/when it
> actually proves to work), but it compiles and survives a hand full of 1m
> runs.

And here is the ftrace dump from one of the failures in the past
18-hour run. Idiot here re-enabled RCU CPU stall warnings after doing
ftrace_dump(), forgetting the asynchronous nature of new-age printk(),
so I don't have the CPU number that the failure happened on.

Off to test your new patch...

Thanx, Paul