Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
From: Mark Rutland
Date: Mon Mar 23 2026 - 13:46:57 EST
On Fri, Mar 20, 2026 at 04:50:03PM +0100, Thomas Gleixner wrote:
> On Fri, Mar 20 2026 at 14:57, Mark Rutland wrote:
> > On Fri, Mar 20, 2026 at 03:11:20PM +0100, Thomas Gleixner wrote:
> >> Yes. It's not an optimization. It's a correctness issue.
> >>
> >> If the interrupted context is RCU idle then you have to carefully go
> >> back to that context. So that the context can tell RCU it is done with
> >> the idle state and RCU has to pay attention again. Otherwise all of this
> >> becomes imbalanced.
> >>
> >> This is about context-level nesting:
> >>
> >> ...
> >> L1.A ct_cpuidle_enter();
> >>
> >> -> interrupt
> >> L2.A ct_irq_enter();
> >> ... // Set NEED_RESCHED
> >> L2.B ct_irq_exit();
> >>
> >> ...
> >> L1.B ct_cpuidle_exit();
> >>
> >> Scheduling between #L2.B and #L1.B makes RCU rightfully upset.
> >
> > I suspect I'm missing something obvious here:
> >
> > * Regardless of nesting, I see that scheduling between L2.B and L1.B is
> > broken because RCU isn't watching.
> >
> > * I'm not sure whether there's a problem with scheduling between L2.A
> > and L2.B, which is what arm64 used to do, and what arm64 would do
> > after this patch.
>
> The only reason why it "works" is that the idle task has preemption
> permanently disabled, so it won't really schedule even if need_resched()
> is set. So it "works" by chance and not by design.
Ah, I see.
Thanks -- that addresses my worry that we'd need to backport a fix to
stable kernels. Since the old behaviour is safe by accident, I think we
can leave stable as-is.
> Apply the patch below and watch the show.
Thanks for this too; I hadn't spotted rcu_irq_exit_check_preempt().
Info dump below, but this is just agreeing with what you said above. :)
Since rcu_irq_exit_check_preempt() doesn't dump the actual values, I
hacked up something similar and tested arm64's old logic (from v6.17).
CT_NESTING_IRQ_NONIDLE would be 0x4000000000000001, so the counter
would be off-by-one if we were to preempt here. However, as you say,
preemption is disabled in the idle task, and that happens to save us.
Thanks again!
Mark.
| ------------[ cut here ]------------
| HARK: arm64_preempt_schedule_irq() called with:
| CT nesting: 0x0000000000000001
| CT NMI nesting: 0x4000000000000002
| RCU watching: yes
| preempt_count: 0x00000001
| WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/entry-common.c:286 el1_interrupt+0xf8/0x100
| Modules linked in:
| CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.17.0-00001-gc02e86492f52-dirty #8 PREEMPT
| Hardware name: linux,dummy-virt (DT)
| pstate: 600000c9 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : el1_interrupt+0xf8/0x100
| lr : el1_interrupt+0xf8/0x100
| sp : ffffa1efd4333be0
| x29: ffffa1efd4333be0 x28: ffffa1efd434d280 x27: ffffa1efd4342360
| x26: ffffa1efd4345000 x25: 0000000000000000 x24: ffffa1efd434d280
| x23: 0000000060000009 x22: ffffa1efd31f0154 x21: ffffa1efd4333d70
| x20: 0000000000000000 x19: ffffa1efd4333c20 x18: 000000000000000a
| x17: 72702020200a7365 x16: 79203a676e696863 x15: 7461772055435220
| x14: 2020200a32303030 x13: 3130303030303030 x12: 7830203a746e756f
| x11: 0000000000000058 x10: 0000000000000018 x9 : fff000003c7e5000
| x8 : 00000000000affa8 x7 : 0000000000000084 x6 : fff000003fc7b6c0
| x5 : fff000003fc7b6c0 x4 : 0000000000000000 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffa1efd434d280
| Call trace:
| el1_interrupt+0xf8/0x100 (P)
| el1h_64_irq_handler+0x18/0x24
| el1h_64_irq+0x6c/0x70
| default_idle_call+0xb4/0x2a0 (P)
| do_idle+0x210/0x270
| cpu_startup_entry+0x34/0x40
| rest_init+0x174/0x180
| console_on_rootfs+0x0/0x6c
| __primary_switched+0x88/0x90
| irq event stamp: 848
| hardirqs last enabled at (846): [<ffffa1efd1fb0da8>] rcu_core+0xc88/0x1048
| hardirqs last disabled at (847): [<ffffa1efd1ee2444>] handle_softirqs+0x434/0x4a0
| softirqs last enabled at (848): [<ffffa1efd1ee245c>] handle_softirqs+0x44c/0x4a0
| softirqs last disabled at (841): [<ffffa1efd1e10794>] __do_softirq+0x14/0x20
| ---[ end trace 0000000000000000 ]---