Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking

From: Jinjie Ruan

Date: Thu Mar 26 2026 - 05:02:27 EST

On 2026/3/25 19:03, Mark Rutland wrote:
> On Sun, Mar 22, 2026 at 12:25:06AM +0100, Thomas Gleixner wrote:
>> On Fri, Mar 20 2026 at 17:31, Mark Rutland wrote:
>>> On Fri, Mar 20, 2026 at 05:26:47PM +0100, Thomas Gleixner wrote:
>>> I *think* you're saying that because the arch code would manage DAIF
>>> early during entry and late during exit, that would all be in one place.
>>
>> That was my thought, see below.
>>
>>> However, that doubles the number of times we have to write to DAIF: at
>>> entry we'd have to poke it once to unmask everything except IRQs, then
>>> again to unmask IRQs, and exit would need the inverse. We'd also have to
>>> split the inheritance logic into inherit-everything-but-interrupt and
>>> inherit-only-interrupt, which is doable but not ideal. With pseudo-NMI
>>> it's even worse, but that's largely because pseudo-NMI is
>>> over-complicated today.
>>
>> Interrupts are not unmasked on interrupt/exception entry ever and I
>> don't understand your concerns at all, but as always I might be missing
>> something.
>
> I think one problem is that we're using the same words to describe
> distinct things, because the terminology is overloaded; I've tried to
> clarify some of that below.
>
> I think another is that there are a large number of interacting
> constraints, and it's easily possible to find something that for most
> but not all of those. I *think* there's an approach that satisfies all
> of our requirements; see below where I say "I *think* what would work
> for us ...".
>
> For context, when I said "at entry" and "at exit" I was including
> *everything* we have to do before/after the "real" logic to handle an
> exception, including parts that surround the generic entry code. I'm
> happy to use a different term for those periods, but I can't immediately
> think of something better.
>
> For example, for faults handled by arm64's el1_abort(), I was
> characterising this as:
>
> /* -------- "entry" begins here -------- */
>
> [[ entry asm ]]
> [[ early triage, branch to el1_abort() ]]
>
> static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
> {
> unsigned long far = read_sysreg(far_el1);
> irqentry_state_t state;
>
> state = enter_from_kernel_mode(regs) {
> ...
> irqentry_enter(regs);
> ...
> }
> local_daif_inherit(regs); // <----------- might unmask interrupts
> /* -------- "entry" ends here ---------- */
>
>
> /* -------- "real logic" begins here --- */
> do_mem_abort(far, esr, regs);
> /* -------- "real logic" ends here ----- */
>
>
> /* -------- "exit" begins here --------- */
> local_daif_mask(); // <------------------------- masks interrupts
> exit_to_kernel_mode(regs, state) {
> ...
> irqentry_exit(regs);
> ...
> }
> }
>
> [[ return from el1_abort() ]]
> [[ exit asm ]]
>
> /* -------- "exit" ends here ----------- */
>
> Please note, el1_abort() is indicative of what arm64 does for
> (most) synchronous exceptions, but there are slight differences for
> other exceptions. For anything maskable (including interupts) we DO NOT
> use local_daif_inherit() and instead unmask specific higher-priority
> maskable exceptions via other functions that write to DAIF, etc.
>
> Interrupts are an odd middle ground where we use irqentry_{enter,exit}()
> but do not use local_daif_inherit().
>
>> The current sequence on entry is:
>>
>> // interrupts are disabled by interrupt/exception entry
>> enter_from_kernel_mode()
>> irqentry_enter(regs);
>> mte_check_tfsr_entry();
>> mte_disable_tco_entry();
>> daif_inherit(regs);
>> // interrupts are still disabled
>
> That last comment isn't quite right: we CAN and WILL enable interrupts
> in local_daif_inherit(), if and only if they were enabled in the context
> the exception was taken from.
>
> As mentioned above, when handling an interrupt (rather than a
> synchronous exception), we don't use local_daif_inherit(), and instead
> use a different DAIF function to unmask everything except interrupts.
>
>> which then becomes:
>>
>> // interrupts are disabled by interrupt/exception entry
>> irqentry_enter(regs)
>> establish_state();
>> // RCU is watching
>> arch_irqentry_enter_rcu()
>> mte_check_tfsr_entry();
>> mte_disable_tco_entry();
>> daif_inherit(regs);
>> // interrupts are still disabled
>>
>> Which is equivalent versus the MTE/DAIF requirements, no?
>
> As above, we can't use local_daif_inherit() here because we want
> different DAIF masking behavior for entry to interrupts and entry to
> synchronous exceptions. While we could pass some token around to
> determine the behaviour dynamically, that's less clear, more
> complicated, and results in worse code being generated for something we
> know at compile time.
>
> If we can leave DAIF masked early on during irqentry_enter(), I don't
> see why we can't leave all DAIF exceptions masked until the end of
> irqentry_enter().
>
> I *think* what would work for us is we could split some of the exit
> handling (including involuntary preemption) into a "prepare" step, as we
> have for return to userspace. That way, arm64 could handle exiting
> something like:
>
> local_irq_disable();
> irqentry_exit_prepare(); // new, all generic logic
> local_daif_mask();
> arm64_exit_to_kernel_mode() {
> ...
> irqentry_exit(); // ideally irqentry_exit_to_kernel_mode().
> ...
> }

I agree. Having a symmetric 'prepare' step for kernel-mode exit as we do
for userspace would be much cleaner. It effectively addresses the DAIF
masking constraints on arm64.

arm64_exit_to_user_mode(struct pt_regs *regs)
-> local_irq_disable(); // only mask irqs
^^^^^^^^^^^^^^^^^^^^^^
-> exit_to_user_mode_prepare_legacy(regs);
-> schedule() // schedule if need resched

-> local_daif_mask(); // set daif to mask all exceptions
^^^^^^^^^^^^^^^^^^^^^
-> mte_check_tfsr_exit();
-> exit_to_user_mode();

This approach can also align with generic implementations like RISC-V.
We can split irqentry_exit() into two sub-functions:
irqentry_exit_prepare() to handle scheduling-related tasks, and the
remaining logic in a simplified irqentry_exit() (or a specific helper).
This way, arm64 can call these two helpers with the DAIF masking
operation inserted in between, while architectures like RISC-V can
continue to use the full irqentry_exit() functionality as they do now.

void do_page_fault(struct pt_regs *regs)
-> irqentry_state_t state = irqentry_enter(regs);
-> handle_page_fault(regs);
-> local_irq_disable();
-> irqentry_exit(regs, state);

>
> ... and other architectures can use a combined exit_to_kernel_mode() (or
> whatever we call that), which does both, e.g.
>
> // either noinstr, __always_inline, or a macro
> void irqentry_prepare_and_exit(void)
> {
> irqentry_exit_prepare();
> irqentry_exit();
> }
>
> That way:
>
> * There's a clear separation between the "prepare" and subsequent exit
> steps, which minimizes the risk that a change subtly breaks arm64's
> exception masking.
>
> * This would mirror the userspace return path, and so would be more
> consistent over all.
>
> * All of arm64's arch-specific exception masking can live in
> arch/arm64/kernel/entry-common.c, explicit in the straight line code
> rather than partially hidden behind arch_*() callbacks.
>
> * There's no unnecessary cost to other architectures.
>
> * There's no/minimal maintenance cost for the generic code. There are no
> arch_*() callbacks, and we'd have to enforce ordering between the
> prepare/exit steps anyhow...
>
> If you don't see an obvious problem with that, I will go and try that
> now.
>
>> The current broken sequence vs. preemption on exit is:
>>
>> // interrupts are disabled
>> exit_to_kernel_mode
>> daif_mask();
>> mte_check_tfsr_exit();
>> irqentry_exit(regs, state);
>>
>> which then becomes:
>>
>> // interrupts are disabled
>> irqentry_exit(regs, state)
>> // includes preemption
>> prepare_for_exit();
>>
>> // RCU is still watching
>> arch_irqentry_ecit_rcu()
>> daif_mask();
>> mte_check_tfsr_exit();
>>
>> if (state.exit_rcu)
>> ct_irq_exit();
>
> As above, I'd strongly prefer if we could pull the "prepare" step out of
> irqentry_exit(). Especially since for the entry path we can't push the
> DAIF masking into irqentry_enter(), and I'd very strongly prefer that
> the masking and unmasking occur in the same logical place, rather than
> having one of those hidden behind an arch_() callback.
>
>> Which is equivalent versus the MTE/DAIF requirements and fixes the
>> preempt on exit issue too, no?
>>
>> That change would be trivial enough for backporting, right?
>>
>> It also prevents you from staring at the bug reports which are going to
>> end up in your mailbox after I merged the patch which moves the
>> misplaced rcu_irq_exit_check_preempt() check _before_ the
>> preempt_count() check where it belongs.
>
> I intend to fix that issue, so hopefully I'm not staring at those for
> long.
>
> Just to check, do you mean that you've already queued that (I didn't
> spot it in tip), or that you intend to? I'll happily test/review/ack a
> patch adding that, but hopefully we can fix arm64 first.
>
>> I fully agree that ARM64 is special vs. CPU state handling, but it's not
>> special enough to justify it's own semantically broken preemption logic.
>
> Sure. To be clear, I'm not arguing for broken preemption logic. I'd
> asked those initial two questions because I suspected this approach
> wasn't quite right.
>
> As above, I think we can solve this in an actually generic way by
> splitting out a "prepare to exit" step, and still keep the bulk of the
> logic generic.
>
>> Looking at those details made me also look at this magic
>> arch_irqentry_exit_need_resched() inline function.
>
> I see per your other reply that you figured out this part was ok:
>
> https://lore.kernel.org/linux-arm-kernel/87se9ph129.ffs@tglx/
>
> ... though I agree we can clean that up further.
>
> Mark.
>