Re: [PATCH] locking/lockdep: skip irq save/restore in hardirq context in lock_release()

From: Waiman Long

Date: Mon Jun 29 2026 - 01:27:40 EST

On 6/29/26 12:11 AM, Deepanshu Kartikey wrote:

lock_release() performs a raw_local_irq_save/restore dance around its
validation work. While safe in process and softirq context, this is
dangerous in hardirq context where IRQs must remain disabled for the
entire duration of the handler.

When lock_release() calls raw_local_irq_restore() inside a hardirq
handler, it briefly re-enables IRQs, creating a window where a new
interrupt can fire before the handler returns. This was observed with
taprio's advance_sched() hrtimer callback - the temporary IRQ
re-enablement inside lock_release() prevented CPU 0 from acknowledging
a pending TLB flush IPI sent by CPU 1 via smp_call_function_many().
CPU 1 then spun indefinitely in csd_lock_wait(), starving the RCU
grace-period kthread and triggering an RCU stall with eventual OOM.

Where exactly is the window in which interrupts are enabled during the raw_local_irq_restore() call? Interrupt handling is arch-specific. Is this specific to certain architectures?
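
For reference, the generic save/restore contract behind this question can be modeled in userspace as below. This is a toy sketch, not kernel code: the model_* names are made up for illustration, and it assumes only that restore puts back exactly the state that save recorded. Arch implementations may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU interrupt-enable flag (hypothetical, userspace only). */
static bool model_irqs_enabled;

/* Models raw_local_irq_save(): record the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	return flags;
}

/* Models raw_local_irq_restore(): put back exactly what was recorded. */
static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (flags != 0);
}

/* Runs one save/restore pair and returns the IRQ state afterwards. */
static bool model_roundtrip(bool enabled_on_entry)
{
	unsigned long flags;

	model_irqs_enabled = enabled_on_entry;
	flags = model_irq_save();
	assert(!model_irqs_enabled);	/* always off between save and restore */
	model_irq_restore(flags);
	return model_irqs_enabled;
}
```

Under this model, a pair entered with IRQs already off (as in a hardirq handler) leaves them off afterwards, so any re-enable window would have to come from somewhere the model does not capture.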

lock_acquire() already handles the NMI case specially via lockdep_nmi()
to avoid this class of problem. Mirror that pattern for hardirq context
in lock_release() by introducing lockdep_hardirq() and skipping the
irq save/restore dance when called from hardirq context.

Reported-by: syzbot+0635dc2e2c3c21a6aa04@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://syzkaller.appspot.com/bug?extid=0635dc2e2c3c21a6aa04
Signed-off-by: Deepanshu Kartikey <kartikey406@xxxxxxxxx>
---
kernel/locking/lockdep.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2d4c5bab5af8..17eb9590e751 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5872,6 +5872,15 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
}
EXPORT_SYMBOL_GPL(lock_acquire);
+static bool lockdep_hardirq(void)
+{
+ if (raw_cpu_read(lockdep_recursion))
+ return false;
+ if (!in_hardirq())
+ return false;
+ return true;
+}
+

lockdep_nmi() serves a different use case: we want to save some lockdep information in NMI context, so checking lockdep recursion makes sense there. If you want to always skip the irq save/restore in hardirq context, I doubt you want to check for lockdep recursion.

void lock_release(struct lockdep_map *lock, unsigned long ip)
{
unsigned long flags;
@@ -5882,6 +5891,14 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
lock->key == &__lockdep_no_track__))
return;
+ if (lockdep_hardirq()) {
+ lockdep_recursion_inc();
+ if (__lock_release(lock, ip))
+ check_chain_key(current);
+ lockdep_recursion_finish();
+ return;
+ }
+
raw_local_irq_save(flags);
check_flags(flags);

I believe the change will be easier to understand if you just conditionally skip the irq save/restore when in hardirq context, instead of duplicating the rest of the code without it.
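
A rough userspace sketch of that shape, to show the control flow only. All sketch_* names are hypothetical stand-ins, not the real lockdep helpers, and the check_flags(flags) call in the real function is left out here; it would need separate handling when flags is never saved.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (userspace model only). */
static bool sketch_in_hardirq;		/* models in_hardirq() */
static bool sketch_irqs_enabled;	/* models the CPU IRQ-enable flag */
static int sketch_save_calls;		/* counts save/restore pairs taken */
static int sketch_body_calls;		/* counts runs of the release work */

static unsigned long sketch_irq_save(void)
{
	unsigned long flags = sketch_irqs_enabled ? 1UL : 0UL;

	sketch_irqs_enabled = false;
	sketch_save_calls++;
	return flags;
}

static void sketch_irq_restore(unsigned long flags)
{
	sketch_irqs_enabled = (flags != 0);
}

/* Models the recursion_inc / __lock_release / recursion_finish sequence. */
static void sketch_release_body(void)
{
	sketch_body_calls++;
}

/* One copy of the body; only the IRQ toggling is conditional. */
static void sketch_lock_release(void)
{
	unsigned long flags = 0;
	bool toggle_irqs = !sketch_in_hardirq;

	if (toggle_irqs)
		flags = sketch_irq_save();

	sketch_release_body();

	if (toggle_irqs)
		sketch_irq_restore(flags);
}
```

The point of this layout is that the release work exists once, so a later change to it cannot leave the hardirq path out of sync.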

Cheers,
Longman