Re: NOHZ tick-stop error: Non-RCU local softirq work is pending
From: Frederic Weisbecker
Date: Thu Dec 10 2020 - 19:17:43 EST
On Thu, Dec 10, 2020 at 01:17:56PM -0800, Paul E. McKenney wrote:
> And please see attached. Lots of output, in fact, enough that it
> was still dumping when the second instance happened.
Thanks!
So the issue is that ksoftirqd is parked on CPU down with vectors
still pending. Either:
1) ksoftirqd has returned from softirq processing early because it had
too many vectors to handle and exceeded the time limit, but then it
parks, leaving the rest unhandled.
2) Ksoftirqd has completed its work but something has raised a softirq
after it got parked.
Can you run the following (on top of the previous patch and boot options)
so that we can see whether it still triggers, and on what? If it does,
we should be in case 2).
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 09229ad82209..7d558cb7a037 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -650,7 +650,9 @@ static void run_ksoftirqd(unsigned int cpu)
 		 * We can safely run softirq on inline stack, as we are not deep
 		 * in the task stack here.
 		 */
-		__do_softirq();
+		do {
+			__do_softirq();
+		} while (kthread_should_park() && local_softirq_pending());
 		local_irq_enable();
 		cond_resched();
 		return;
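
To illustrate what the loop above is meant to rule out, here is a rough
userspace model, not kernel code. Every name in it (MODEL_BUDGET,
model_do_softirq, model_run_once, model_run_patched) is hypothetical, and
the fixed per-pass budget only stands in for the time limit from case 1).
Nothing re-raises a vector in this model, so it says nothing about case 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: max vectors handled per pass, standing in for the time limit. */
#define MODEL_BUDGET 2

/* Handle up to MODEL_BUDGET pending vectors, lowest bit first; return what is left. */
static unsigned int model_do_softirq(unsigned int pending)
{
	int handled = 0;

	for (unsigned int bit = 0; bit < 32 && handled < MODEL_BUDGET; bit++) {
		if (pending & (1u << bit)) {
			pending &= ~(1u << bit);
			handled++;
		}
	}
	return pending;
}

/* Unpatched shape: a single pass, which may leave vectors pending. */
static unsigned int model_run_once(unsigned int pending)
{
	return model_do_softirq(pending);
}

/* Patched shape: keep processing while a park is requested and work remains. */
static unsigned int model_run_patched(unsigned int pending, bool should_park)
{
	do {
		pending = model_do_softirq(pending);
	} while (should_park && pending);

	/* If a park was requested, nothing may be left behind. */
	assert(!should_park || pending == 0);
	return pending;
}
```

With five vectors pending (0x1f), the single pass leaves 0x1c behind, while
the patched shape with a park requested drains everything. If the warning
still fires with the real patch applied, the leftover must have been raised
after parking.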
Thanks!