Re: [BUG] RCU CPU stall warning (bisected)

From: Paul E. McKenney
Date: Mon Dec 04 2023 - 12:41:07 EST


On Wed, Nov 22, 2023 at 10:43:52AM -0800, Paul E. McKenney wrote:
> Hello!
>
> Just FYI for the moment.
>
> I hit the following three times out of 15 ten-hour TREE03 rcutorture
> runs on next-20231121, which suggests an MTBF of about 50 hours. This is
> new over the past week or so.
>
> The symptom is that the RCU grace-period kthread is marked as running
> ("R"), but remains stuck in schedule() for the remainder of the run.
>
> My next steps will be to retry on today's -next, and if that reproduces
> the bug, attempt to bisect.
>
> But I figured that I should send this out on the off-chance that it is
> a known problem.

And the bisection fingered this commit, maybe even rightfully so:

5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")

Next step is to revert this on top of v6.7-rc3 and retest. I will let
you know what happened when the test completes. And the error rate did
drop off during the bisection, so it is possible that there are multiple
causes of this bug. :-/

In the meantime, is this a known problem? ("Did I just waste a week
bisecting something that has already been fixed?")

Thanx, Paul