Re: [PATCH] locking/lockdep: skip irq save/restore in hardirq context in lock_release()

From: Waiman Long

Date: Tue Jun 30 2026 - 00:58:19 EST


On 6/29/26 7:44 PM, Deepanshu Kartikey wrote:
> On Mon, Jun 29, 2026 at 10:57 AM Waiman Long <longman@xxxxxxxxxx> wrote:
>>
>> On 6/29/26 12:11 AM, Deepanshu Kartikey wrote:
>>> lock_release() performs a raw_local_irq_save/restore dance around its
>>> validation work. While safe in process and softirq context, this is
>>> dangerous in hardirq context where IRQs must remain disabled for the
>>> entire duration of the handler.
>>>
>>> When lock_release() calls raw_local_irq_restore() inside a hardirq
>>> handler, it briefly re-enables IRQs, creating a window where a new
>>> interrupt can fire before the handler returns. This was observed with
>>> taprio's advance_sched() hrtimer callback - the temporary IRQ
>>> re-enablement inside lock_release() prevented CPU 0 from acknowledging
>>> a pending TLB flush IPI sent by CPU 1 via smp_call_function_many().
>>> CPU 1 then spun indefinitely in csd_lock_wait(), starving the RCU
>>> grace-period kthread and triggering an RCU stall with eventual OOM.
>>
>> Where exactly is the temporary window when interrupt is enabled during
>> the raw_local_irq_restore() call? Interrupt handling is arch specific.
>> Is it specific to certain architectures?
>
> On x86, raw_local_irq_restore() executes the 'sti' instruction which
> immediately re-enables IRQs. Inside lock_release(), after the validation
> work completes, calling raw_local_irq_restore() with the saved flags
> will execute 'sti' even when called from hardirq context.
>
> The window is between the 'sti' instruction (which re-enables IRQs) and
> the return from lock_release(). During this window, a new interrupt can
> fire and hijack the CPU before the hardirq handler can return and
> acknowledge pending IPIs.
>
> In the syzkaller trace, this window allowed a new IRQ to fire on CPU 0
> after lock_release()'s sti, preventing CPU 0 from ever acknowledging
> the TLB flush IPI sent by CPU 1, causing CPU 1 to spin indefinitely in
> csd_lock_wait(), which starved the RCU grace-period kthread.
>
> The fix is correct on all architectures - hardirq context must never
> restore IRQs mid-handler since the hardware manages IRQ state for
> interrupt entry/exit. This is why we conditionally skip the
> irq_save/restore dance when in_hardirq() is true.

I looked at the generated code of raw_local_irq_restore():

./arch/x86/include/asm/irqflags.h:
146        return !(flags & X86_EFLAGS_IF);
   0x00000000000082b9 <+9>:    test   $0x200,%edi
   0x00000000000082bf <+15>:    je     0x82c2 <cpuset_test+18>

42        asm volatile("sti": : :"memory");
   0x00000000000082c1 <+17>:    sti

kernel/cgroup/cpuset.c:
4553    }
   0x00000000000082c2 <+18>:    jmp    0x82c7

sti should only be executed if the saved flags have the IF bit set. In
hardirq context, the IF bit shouldn't be set at the time of the save, so
the restore should skip the sti. Is my interpretation correct?
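
To make the question concrete, here is a minimal userspace sketch of the
save/restore logic as I read it from the disassembly above. It is not
kernel code: every sim_* name is a hypothetical stand-in invented for
illustration, and the "CPU flags" are just a variable. It assumes that
restore tests the saved IF bit (0x200) and only then does the equivalent
of sti, which is what the test/je/sti sequence appears to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Interrupt-enable bit; matches the 0x200 mask in the disassembly. */
#define X86_EFLAGS_IF 0x200UL

/* Hypothetical stand-ins for CPU state; these are not kernel symbols. */
static unsigned long sim_eflags = X86_EFLAGS_IF;
static unsigned int sim_sti_count;

/* Models raw_local_irq_save(): snapshot the flags, then disable IRQs. */
static unsigned long sim_irq_save(void)
{
	unsigned long flags = sim_eflags;

	sim_eflags &= ~X86_EFLAGS_IF;
	return flags;
}

/* Models the check shown at irqflags.h line 146 in the listing. */
static bool sim_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & X86_EFLAGS_IF);
}

/* Models raw_local_irq_restore(): "sti" only if the saved IF bit was set. */
static void sim_irq_restore(unsigned long flags)
{
	if (!sim_irqs_disabled_flags(flags)) {
		sim_eflags |= X86_EFLAGS_IF;
		sim_sti_count++;
	}
}

/*
 * Runs one save/restore pair, starting with IRQs enabled or disabled,
 * and returns how many simulated "sti" instructions were executed.
 */
static unsigned int sim_sti_after_save_restore(bool irqs_enabled_on_entry)
{
	unsigned long flags;

	sim_eflags = irqs_enabled_on_entry ? X86_EFLAGS_IF : 0;
	sim_sti_count = 0;

	flags = sim_irq_save();
	/* ... validation work would run here with IRQs off ... */
	sim_irq_restore(flags);

	return sim_sti_count;
}
```

Under these assumptions, sim_sti_after_save_restore(false), which models
entry with IF already clear as expected in a hardirq handler, returns 0
and leaves the simulated IF bit clear. Only sim_sti_after_save_restore(true)
returns 1. If the real path behaves differently, the saved flags must
have had IF set on entry, and that would be the thing to track down.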

Regards,
Longman