Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking

From: Jinjie Ruan

Date: Mon Mar 23 2026 - 23:17:29 EST




On 2026/3/20 19:30, Mark Rutland wrote:
> On arm64, involuntary kernel preemption has been subtly broken since the
> move to the generic irq entry code. When preemption occurs, the new task
> may run with SError and Debug exceptions masked unexpectedly, leading to
> a loss of RAS events, breakpoints, watchpoints, and single-step
> exceptions.

We could also add a check in arch_irqentry_exit_need_resched() to prevent
scheduling out while the D or A bit is still set.

>
> We can fix this relatively simply by moving the preemption logic out of
> irqentry_exit(), which is desirable for a number of other reasons on
> arm64. Context and rationale below:
>
> 1) Architecturally, several groups of exceptions can be masked
> independently, including 'Debug', 'SError', 'IRQ', and 'FIQ', whose
> mask bits can be read/written via the 'DAIF' register.
>
> Other mask bits exist, including 'PM' and 'AllInt', which we will
> need to use in future (e.g. for architectural NMI support).
>
> The entry code needs to manipulate all of these, but the generic
> entry code only knows about interrupts (which means both IRQ and FIQ
> on arm64), and the other exception masks aren't generic.
>
> 2) Architecturally, all maskable exceptions MUST be masked during
> exception entry and exception return.
>
> Upon exception entry, hardware places exception context into
> exception registers (e.g. the PC is saved into ELR_ELx). Upon
> exception return, hardware restores exception context from those
> exception registers (e.g. the PC is restored from ELR_ELx).
>
> To ensure the exception registers aren't clobbered by recursive
> exceptions, all maskable exceptions must be masked early during entry
> and late during exit. Hardware masks all maskable exceptions
> automatically at exception entry. Software must unmask these as
> required, and must mask them prior to exception return.
>
> 3) Architecturally, hardware masks all maskable exceptions upon any
> exception entry. A synchronous exception (e.g. a fault on a memory
> access) can be taken from any context (e.g. where IRQ+FIQ might be
> masked), and the entry code must explicitly 'inherit' the unmasking
> from the original context by reading the exception registers (e.g.
> SPSR_ELx) and writing to DAIF, etc.
>
> 4) When 'pseudo-NMI' is used, Linux masks interrupts via a combination
> of DAIF and the 'PMR' priority mask register. At entry and exit,
> interrupts must be masked via DAIF, but most kernel code will
> mask/unmask regular interrupts using PMR (e.g. in local_irq_save()
> and local_irq_restore()).
>
> This requires more complicated transitions at entry and exit. Early
> during entry or late during return, interrupts are masked via DAIF,
> and kernel code which manipulates PMR to mask/unmask interrupts will
> not function correctly in this state.
>
> This also requires fairly complicated management of DAIF and PMR when
> handling interrupts, and arm64 has special logic to avoid preempting
> from pseudo-NMIs which currently lives in
> arch_irqentry_exit_need_resched().
>
> 5) Most kernel code runs with all exceptions unmasked. When scheduling,
> only interrupts should be masked (by PMR when pseudo-NMI is used, and by
> DAIF otherwise).
>
> For most exceptions, arm64's entry code has a sequence similar to that
> of el1_abort(), which is used for faults:
>
> | static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
> | {
> | unsigned long far = read_sysreg(far_el1);
> | irqentry_state_t state;
> |
> | state = enter_from_kernel_mode(regs);
> | local_daif_inherit(regs);
> | do_mem_abort(far, esr, regs);
> | local_daif_mask();
> | exit_to_kernel_mode(regs, state);
> | }
>
> ... where enter_from_kernel_mode() and exit_to_kernel_mode() are
> wrappers around irqentry_enter() and irqentry_exit() which perform
> additional arm64-specific entry/exit logic.
>
> Currently, the generic irq entry code will attempt to preempt from any
> exception under irqentry_exit() where interrupts were unmasked in the
> original context. As arm64's entry code will have already masked
> exceptions via DAIF, this results in the problems described above.
>
> Fix this by opting out of preemption in irqentry_exit(), and restoring
> arm64's old behaviour of explicitly preempting when returning from IRQ
> or FIQ, before calling exit_to_kernel_mode() / irqentry_exit(). This
> ensures that preemption occurs when only interrupts are masked, and
> where that masking is compatible with most kernel code (e.g. using PMR
> when pseudo-NMI is in use).
>
> Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
> Reported-by: Ada Couprie Diaz <ada.coupriediaz@xxxxxxx>
> Reported-by: Vladimir Murzin <vladimir.murzin@xxxxxxx>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Jinjie Ruan <ruanjinjie@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> ---
> arch/Kconfig | 3 +++
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/entry-common.c | 2 ++
> kernel/entry/common.c | 4 +++-
> 4 files changed, 9 insertions(+), 1 deletion(-)
>
> Thomas, Peter, I have a couple of things I'd like to check:
>
> (1) The generic irq entry code will preempt from any exception (e.g. a
> synchronous fault) where interrupts were unmasked in the original
> context. Is that intentional/necessary, or was that just the way the
> x86 code happened to be implemented?
>
> I assume that it'd be fine if arm64 only preempted from true
> interrupts, but if that was intentional/necessary I can go rework
> this.
>
> (2) The generic irq entry code only preempts when RCU was watching in
> the original context. IIUC that's just to avoid preempting from the
> idle thread. Is it functionally necessary to avoid that, or is that
> just an optimization?
>
> I'm asking because historically arm64 didn't check that, and I
> haven't bothered checking here. I don't know whether we have a
> latent functional bug.
>
> Mark.
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 102ddbd4298ef..c8c99cd955281 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -102,6 +102,9 @@ config HOTPLUG_PARALLEL
> bool
> select HOTPLUG_SPLIT_STARTUP
>
> +config ARCH_HAS_OWN_IRQ_PREEMPTION
> + bool
> +
> config GENERIC_IRQ_ENTRY
> bool
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 38dba5f7e4d2d..bf0ec8237de45 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -42,6 +42,7 @@ config ARM64
> select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
> + select ARCH_HAS_OWN_IRQ_PREEMPTION
> select ARCH_HAS_PREEMPT_LAZY
> select ARCH_HAS_PTDUMP
> select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3625797e9ee8f..1aedadf09eb4d 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -497,6 +497,8 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> do_interrupt_handler(regs, handler);
> irq_exit_rcu();
>
> + irqentry_exit_cond_resched();
> +
> exit_to_kernel_mode(regs, state);
> }
> static void noinstr el1_interrupt(struct pt_regs *regs,
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 9ef63e4147913..af9cae1f225e3 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -235,8 +235,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
> }
>
> instrumentation_begin();
> - if (IS_ENABLED(CONFIG_PREEMPTION))
> + if (IS_ENABLED(CONFIG_PREEMPTION) &&
> + !IS_ENABLED(CONFIG_ARCH_HAS_OWN_IRQ_PREEMPTION)) {
> irqentry_exit_cond_resched();
> + }
>
> /* Covers both tracing and lockdep */
> trace_hardirqs_on();