Re: Something is leaking RCU holds from interrupt context

From: Peter Zijlstra
Date: Tue Apr 06 2021 - 04:33:49 EST


On Sun, Apr 04, 2021 at 11:24:57AM +0100, Matthew Wilcox wrote:
> On Sat, Apr 03, 2021 at 09:15:17PM -0700, syzbot wrote:
> > HEAD commit: 2bb25b3a Merge tag 'mips-fixes_5.12_3' of git://git.kernel..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1284cc31d00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > dashboard link: https://syzkaller.appspot.com/bug?extid=dde0cc33951735441301
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+dde0cc33951735441301@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > WARNING: suspicious RCU usage
> > 5.12.0-rc5-syzkaller #0 Not tainted
> > -----------------------------
> > kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 2, debug_locks = 0
> > no locks held by systemd-udevd/4825.
>
> I think we have something that's taking the RCU read lock in
> (soft?) interrupt context and not releasing it properly in all
> situations. This thread doesn't have any locks recorded, but
> lock_is_held(&rcu_bh_lock_map) is true.
>
> Is there some debugging code that could find this? eg should
> lockdep_softirq_end() check that rcu_bh_lock_map is not held?
> (if it's taken in process context, then BHs can't run, so if it's
> held at softirq exit, then there's definitely a problem).

Hmm, I'm sure i've written something like that at least once, but I
can't seem to find it :/

Does something like the completely untested below work for you?

---

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 600c10da321a..d8aa1dc481b6 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -54,6 +54,8 @@ extern void trace_hardirqs_off_finish(void);
extern void trace_hardirqs_on(void);
extern void trace_hardirqs_off(void);

+extern void lockdep_validate_context_empty(void);
+
# define lockdep_hardirq_context() (raw_cpu_read(hardirq_context))
# define lockdep_softirq_context(p) ((p)->softirq_context)
# define lockdep_hardirqs_enabled() (this_cpu_read(hardirqs_enabled))
@@ -69,6 +71,7 @@ do { \
} while (0)
# define lockdep_hardirq_exit() \
do { \
+ lockdep_validate_context_empty(); \
__this_cpu_dec(hardirq_context); \
} while (0)
# define lockdep_softirq_enter() \
@@ -77,6 +80,7 @@ do { \
} while (0)
# define lockdep_softirq_exit() \
do { \
+ lockdep_validate_context_empty(); \
current->softirq_context--; \
} while (0)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 82db977eada8..09ac70d1b3a6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2697,6 +2697,37 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
return 0;
}

+void lockdep_validate_context_empty(void)
+{
+ struct task_struct *curr = current;
+ int depth, ctx = task_irq_context(curr);
+
+ if (debug_locks_silent)
+ return;
+
+ depth = curr->lockdep_depth;
+ if (!depth)
+ return;
+
+ if (curr->held_locks[depth-1].irq_context != ctx)
+ return;
+
+
+ pr_warn("\n");
+ pr_warn("====================================\n");
+ pr_warn("WARNING: Asymmetric locking detected\n");
+ print_kernel_ident();
+ pr_warn("------------------------------------\n");
+
+ pr_warn("%s/%d is leaving an IRQ context with extra locks on\n",
+ curr->comm, task_pid_nr(curr));
+
+ lockdep_printk_held_locks(curr);
+
+ printk("\nstack backtrace:\n");
+ dump_stack();
+}
+
#else

static inline int check_irq_usage(struct task_struct *curr,