Re: [PATCH] kfence: Avoid stalling work queue task without allocations
From: Paul E. McKenney
Date: Wed Nov 11 2020 - 20:37:01 EST
On Wed, Nov 11, 2020 at 09:21:53PM +0100, Marco Elver wrote:
> On Wed, Nov 11, 2020 at 11:21AM -0800, Paul E. McKenney wrote:
> [...]
> > > > rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > >
> > > Sadly, no, next-20201110 already included that one, and that's what I
> > > tested and got me all those warnings above.
> >
> > Hey, I had to ask! The only uncertainty I see is the acquisition of
> > the lock in rcu_iw_handler(), for which I added a lockdep check in the
> > (untested) patch below. The other thing I could do is sprinkle such
> > checks through the stall-warning code, on the assumption that something
> > RCU is calling is re-enabling interrupts.
> >
> > Other thoughts?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 70d48c5..3d67650 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
> >
> > rdp = container_of(iwp, struct rcu_data, rcu_iw);
> > rnp = rdp->mynode;
> > + lockdep_assert_irqs_disabled();
> > raw_spin_lock_rcu_node(rnp);
> > if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
> > rdp->rcu_iw_gp_seq = rnp->gp_seq;
>
> This assert hasn't fired yet; I just get more of the warnings below.
> I'll keep rerunning, but am not too hopeful...
Is bisection a possibility?
Failing that, please see the updated patch below. This adds a few more
calls to lockdep_assert_irqs_disabled(), but, perhaps more helpfully,
also dumps the current stack of the CPU that the RCU grace-period
kthread wants to run on, in the case where that kthread has been
starved of CPU.
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 70d48c5..d203ea0 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -189,6 +189,7 @@ static void rcu_iw_handler(struct irq_work *iwp)

 rdp = container_of(iwp, struct rcu_data, rcu_iw);
rnp = rdp->mynode;
+ lockdep_assert_irqs_disabled();
raw_spin_lock_rcu_node(rnp);
if (!WARN_ON_ONCE(!rdp->rcu_iw_pending)) {
rdp->rcu_iw_gp_seq = rnp->gp_seq;
@@ -449,21 +450,32 @@ static void print_cpu_stall_info(int cpu)
/* Complain about starvation of grace-period kthread. */
static void rcu_check_gp_kthread_starvation(void)
{
+ int cpu;
struct task_struct *gpk = rcu_state.gp_kthread;
unsigned long j;

 if (rcu_is_gp_kthread_starving(&j)) {
+ cpu = gpk ? task_cpu(gpk) : -1;
pr_err("%s kthread starved for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n",
rcu_state.name, j,
(long)rcu_seq_current(&rcu_state.gp_seq),
data_race(rcu_state.gp_flags),
gp_state_getname(rcu_state.gp_state), rcu_state.gp_state,
- gpk ? gpk->state : ~0, gpk ? task_cpu(gpk) : -1);
+ gpk ? gpk->state : ~0, cpu);
if (gpk) {
pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
pr_err("RCU grace-period kthread stack dump:\n");
+ lockdep_assert_irqs_disabled();
sched_show_task(gpk);
+ lockdep_assert_irqs_disabled();
+ if (cpu >= 0) {
+ pr_err("Stack dump where RCU grace-period kthread last ran:\n");
+ if (!trigger_single_cpu_backtrace(cpu))
+ dump_cpu_task(cpu);
+ }
+ lockdep_assert_irqs_disabled();
wake_up_process(gpk);
+ lockdep_assert_irqs_disabled();
}
}
}