Re: [PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in nohz_full

From: Frederic Weisbecker
Date: Mon Apr 01 2024 - 17:56:45 EST


On Mon, Apr 01, 2024 at 02:26:25PM -0700, Paul E. McKenney wrote:
> > > _ The RCU CPU Stall report. I strongly suspect the cause is the hrtimer
> > > enqueue to an offline CPU. Let's solve that and we'll see if it still
> > > triggers.
> >
> > Sounds like a plan!
>
> Just checking in on this one. I did reproduce your RCU CPU stall report
> and also saw a TREE03 OOM that might (or might not) be related. Please
> let me know if hammering TREE03 harder or adding some debug would help.
> Otherwise, I will assume that you are getting sufficient bug reports
> from your own testing to be getting along with.

Hehe, there are a lot indeed :-)

So there has been some discussion on cpuset vs. hotplug, since a problem there
is likely the cause of the hrtimer warning you saw, which in turn might
be the cause of the RCU stalls.

Do you always see the hrtimer warning along with the RCU stalls? If so, this
might help:
https://lore.kernel.org/lkml/20240401145858.2656598-1-longman@xxxxxxxxxx/T/#m1bed4d298715d1a6b8289ed48e9353993c63c896

Thanks.

>
> Thanx, Paul