Re: [PATCH v15 07/11] arm64: syscall: Introduce syscall_exit_to_user_mode_work()

From: Jinjie Ruan

Date: Thu Jun 25 2026 - 05:23:50 EST




On 6/24/2026 10:37 PM, Ada Couprie Diaz wrote:
> On 11/05/2026 10:20, Jinjie Ruan wrote:
>> Refactor the system call exit path to align with the generic entry
>> framework. This consolidates thread flag checking, rseq handling, and
>> syscall tracing into a structure that mirrors the generic
>> syscall_exit_to_user_mode_work() implementation.
>>
>> [Rationale]
>> The generic entry code employs a hierarchical approach for
>> syscall exit work:
>>
>> 1. syscall_exit_to_user_mode_work(): The entry point that handles
>>     rseq and checks if further exit work (tracing/audit) is required.
>>
>> 2. syscall_exit_work(): Performs the actual tracing, auditing, and
>>     ptrace reporting.
>>
>> [Changes]
>> - Rename and Encapsulate: Rename syscall_trace_exit() to
>>    syscall_exit_work() and make it static, as it is now an internal
>>    helper for the exit path.
>>
>> - New Entry Point: Implement syscall_exit_to_user_mode_work() to
>>    replace the manual flag-reading logic in el0_svc_common(). This
>>    function now encapsulates the rseq_syscall() call and the
>>    conditional execution of syscall_exit_work().
>>
>> - Simplify el0_svc_common(): Remove the complex conditional checks
>>    for tracing and CONFIG_DEBUG_RSEQ at the end of the syscall path,
>>    delegating this responsibility to the new helper.
> It is indeed simpler, however to me there are two changes to the behaviour,
> which are not called out (apologies if I missed some prior discussion
> when I looked for some) :
> 1. As pointed by the removed comment, in mainline we *always* trace on exit
>    if we traced on entry. This is why there are two `has_syscall_work()`
> checks
>    on exit, with a re-read of the flags after syscall execution in between.
>    This change only checks once on exit after updating the flags, so if
>    there was work on entry but the flags got cleared, it *won't* trace
> on exit.
>    Is this desired ? Can this change of behaviour have an impact ?

Hi, Ada,

After the rework, `syscall_exit_to_user_mode_work()` is called
unconditionally, regardless of whether the condition below evaluates to
true or false. The finer-grained refactoring split coming in v16 will show
how this is handled.

if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ))
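
For illustration only, the two exit decisions under discussion can be
compared with a small user-space model. This is a sketch, not kernel code:
the flag bit values are made up, `TIF_SYSCALL_WORK` stands in for the whole
`_TIF_SYSCALL_WORK` mask, and the names `mainline_calls_exit_work()` and
`v15_calls_exit_work()` are hypothetical helpers that only mirror the
conditions visible in the quoted diff.

```c
#include <stdbool.h>

/*
 * Hypothetical bit values, for illustration only. The real definitions
 * live in arch/arm64/include/asm/thread_info.h.
 */
#define TIF_SYSCALL_WORK	(1UL << 0)	/* stands in for _TIF_SYSCALL_WORK */
#define TIF_SINGLESTEP		(1UL << 1)

bool has_syscall_work(unsigned long flags)
{
	return (flags & TIF_SYSCALL_WORK) != 0;
}

/*
 * Model of the removed tail of el0_svc_common(): exit work is skipped only
 * when entry had no work, CONFIG_DEBUG_RSEQ is off, and the re-read flags
 * show neither syscall work nor single-step.
 */
bool mainline_calls_exit_work(unsigned long entry_flags,
			      unsigned long exit_flags, bool debug_rseq)
{
	if (!has_syscall_work(entry_flags) && !debug_rseq) {
		if (!has_syscall_work(exit_flags) &&
		    !(exit_flags & TIF_SINGLESTEP))
			return false;
	}
	return true;
}

/* Model of the v15 helper: only the post-syscall flags are consulted. */
bool v15_calls_exit_work(unsigned long exit_flags)
{
	return has_syscall_work(exit_flags) || (exit_flags & TIF_SINGLESTEP);
}
```

In this model the two functions disagree only when the post-syscall flags
are clear and either the entry flags had work or `debug_rseq` is true,
which are the two cases raised in the review. Whether calling the exit work
in those cases has any observable effect is a separate question.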

>> - Helper Migration: Move has_syscall_work() to asm/syscall.h
>>    to allow its reuse across ptrace.c and syscall.c.
>>
>> - Clean up RSEQ: Remove the explicit IS_ENABLED(CONFIG_DEBUG_RSEQ)
>>    check in the caller, as rseq_syscall() is already a no-op when the
>>    config is disabled.
> 2. `rseq_syscall()` is indeed a no-op, but removing the explicit check here
>    does change the behaviour : in mainline we *always* trace on exit if
>    `CONFIG_DEBUG_RSEQ` is enabled, bypassing the `has_syscall_work()`
> check.
>    This change does not bypass the `has_syscall_work()` check if
>    `CONFIG_DEBUG_RSEQ` is enabled, so there might be a change of behaviour.
>    Same questions as above : is this change desired ? Can it have an
> impact ?

This should not introduce any functional changes.

Apart from audit, everything `syscall_trace_exit()` does is gated by the
`_TIF_SYSCALL_TRACEPOINT`, `_TIF_SYSCALL_TRACE`, or `_TIF_SINGLESTEP`
thread flags.

Gating audit_syscall_exit() behind `_TIF_SYSCALL_AUDIT` does not change
behaviour either.

The `_TIF_SYSCALL_AUDIT` flag and its audit context are set up by
audit_alloc() at fork and released only by audit_free() in do_exit(). Since
both stay fixed throughout syscall execution, checking the
`_TIF_SYSCALL_AUDIT` flag is equivalent to evaluating audit_context() in
audit_syscall_exit().
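
The argument above can be checked with a small user-space model. This is a
sketch under stated assumptions, not kernel code: the flag bit values and
the `DID_*` action bits are made up, the gate mask lists only the flags
named in this thread (the real `_TIF_SYSCALL_WORK` mask contains more), and
the audit context is assumed to exist exactly when `TIF_SYSCALL_AUDIT` is
set.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define TIF_SYSCALL_TRACE	(1UL << 0)
#define TIF_SYSCALL_AUDIT	(1UL << 1)
#define TIF_SYSCALL_TRACEPOINT	(1UL << 2)
#define TIF_SINGLESTEP		(1UL << 3)

/* Hypothetical markers for the actions the exit work can perform. */
enum { DID_AUDIT = 1, DID_TRACEPOINT = 2, DID_PTRACE_REPORT = 4 };

/* Model of the exit work body: returns the set of actions that ran. */
unsigned int exit_work_actions(unsigned long flags, bool has_audit_ctx)
{
	unsigned int done = 0;

	if (has_audit_ctx)	/* audit_syscall_exit() tests audit_context() */
		done |= DID_AUDIT;
	if (flags & TIF_SYSCALL_TRACEPOINT)
		done |= DID_TRACEPOINT;
	if (flags & (TIF_SYSCALL_TRACE | TIF_SINGLESTEP))
		done |= DID_PTRACE_REPORT;
	return done;
}

/* Same work, but only entered when one of the gating flags is set. */
unsigned int gated_exit_work_actions(unsigned long flags, bool has_audit_ctx)
{
	const unsigned long gate = TIF_SYSCALL_TRACE | TIF_SYSCALL_AUDIT |
				   TIF_SYSCALL_TRACEPOINT | TIF_SINGLESTEP;

	if (!(flags & gate))
		return 0;
	return exit_work_actions(flags, has_audit_ctx);
}

/*
 * Compare both variants for every flag combination, assuming the audit
 * context exists exactly when TIF_SYSCALL_AUDIT is set.
 */
bool gating_is_equivalent(void)
{
	for (unsigned long flags = 0; flags < 16; flags++) {
		bool ctx = (flags & TIF_SYSCALL_AUDIT) != 0;

		if (exit_work_actions(flags, ctx) !=
		    gated_exit_work_actions(flags, ctx))
			return false;
	}
	return true;
}
```

Under that assumption the gated and ungated variants perform the same
actions for every flag combination. The model also shows where the
equivalence would break: if an audit context could exist while
`TIF_SYSCALL_AUDIT` is clear, the ungated variant would still audit and the
gated one would not.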

I probably moved too fast with this refactoring. I'll split this into
smaller, more granular steps in v16 to make the logic clearer and easier
to follow.

>
> I understand that the change is to align with the generic entry, but it
> seems
> like this could have an impact that I do not really understand, so I prefer
> asking !
>
> Apart from the above everything looks OK to me, but I'd like
> some confirmation that the change of behaviours either do not exist or
> are OK !

Thank you for the review.

>
> Thanks,
> Ada
>
>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>> Cc: Will Deacon <will@xxxxxxxxxx>
>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>> Reviewed-by: Linus Walleij <linusw@xxxxxxxxxx>
>> Reviewed-by: Yeoreum Yun <yeoreum.yun@xxxxxxx>
>> Reviewed-by: Kevin Brodsky <kevin.brodsky@xxxxxxx>
>> Signed-off-by: Jinjie Ruan <ruanjinjie@xxxxxxxxxx>
>> ---
>> v15
>> - Make syscall_exit_to_user_mode_work() __always_inline to keep
>>    the fast-path performance as Sashiko pointed out.
>> ---
>>   arch/arm64/include/asm/syscall.h | 18 +++++++++++++++++-
>>   arch/arm64/kernel/ptrace.c       |  5 +----
>>   arch/arm64/kernel/syscall.c      | 20 +-------------------
>>   3 files changed, 19 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/
>> asm/syscall.h
>> index 30b203ef156b..b331e09b937f 100644
>> --- a/arch/arm64/include/asm/syscall.h
>> +++ b/arch/arm64/include/asm/syscall.h
>> @@ -8,6 +8,7 @@
>>   #include <uapi/linux/audit.h>
>>   #include <linux/compat.h>
>>   #include <linux/err.h>
>> +#include <linux/rseq.h>
>>     typedef long (*syscall_fn_t)(const struct pt_regs *regs);
>>   @@ -121,6 +122,21 @@ static inline int syscall_get_arch(struct
>> task_struct *task)
>>   }
>>     int syscall_trace_enter(struct pt_regs *regs, unsigned long flags);
>> -void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
>> +void syscall_exit_work(struct pt_regs *regs, unsigned long flags);
>> +
>> +static inline bool has_syscall_work(unsigned long flags)
>> +{
>> +    return unlikely(flags & _TIF_SYSCALL_WORK);
>> +}
>> +
>> +static __always_inline void syscall_exit_to_user_mode_work(struct
>> pt_regs *regs)
>> +{
>> +    unsigned long flags = read_thread_flags();
>
>             ^-- This only reflects the post-syscall flags
>
>> +
>> +    rseq_syscall(regs);
>> +
>> +    if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
>> +        syscall_exit_work(regs, flags);
>> +}
>>     #endif    /* __ASM_SYSCALL_H */
>> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
>> index 15a45eeb56da..256aa20377e1 100644
>> --- a/arch/arm64/kernel/ptrace.c
>> +++ b/arch/arm64/kernel/ptrace.c
>> @@ -28,7 +28,6 @@
>>   #include <linux/hw_breakpoint.h>
>>   #include <linux/regset.h>
>>   #include <linux/elf.h>
>> -#include <linux/rseq.h>
>>     #include <asm/compat.h>
>>   #include <asm/cpufeature.h>
>> @@ -2454,10 +2453,8 @@ int syscall_trace_enter(struct pt_regs *regs,
>> unsigned long flags)
>>       return syscall;
>>   }
>>   -void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
>> +void syscall_exit_work(struct pt_regs *regs, unsigned long flags)
>>   {
>> -    rseq_syscall(regs);
>> -
>>       audit_syscall_exit(regs);
>
>      ^-- This was always called if entry had work or CONFIG_DEBUG_RSEQ
> was enabled,
>          which is not the case anymore (same for the rest of the function)

As explained above, thank you!

>
>>         if (flags & _TIF_SYSCALL_TRACEPOINT)
>> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
>> index f6f87b042995..dac7bcc4bbdf 100644
>> --- a/arch/arm64/kernel/syscall.c
>> +++ b/arch/arm64/kernel/syscall.c
>> @@ -54,11 +54,6 @@ static void invoke_syscall(struct pt_regs *regs,
>> unsigned int scno,
>>       syscall_set_return_value(current, regs, 0, ret);
>>   }
>>   -static inline bool has_syscall_work(unsigned long flags)
>> -{
>> -    return unlikely(flags & _TIF_SYSCALL_WORK);
>> -}
>> -
>>   static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
>>                  const syscall_fn_t syscall_table[])
>>   {
>> @@ -120,21 +115,8 @@ static void el0_svc_common(struct pt_regs *regs,
>> int scno, int sc_nr,
>>       }
>>         invoke_syscall(regs, scno, sc_nr, syscall_table);
>> -
>> -    /*
>> -     * The tracing status may have changed under our feet, so we have to
>> -     * check again. However, if we were tracing entry, then we always
>> trace
>> -     * exit regardless, as the old entry assembly did.
>> -     */
>> -    if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
>
>                       ^-- We always traced exit if CONFIG_DEBUG_RSEQ is
> enabled
>          ^-- `flags` is unchanged since entry, and exit was always
> traced if there was work.

As explained above, thank you!

Best regards,
Jinjie

>
>> -        flags = read_thread_flags();
>> -        if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
>> -            return;
>> -    }
>> -
>>   trace_exit:
>> -    flags = read_thread_flags();
>> -    syscall_trace_exit(regs, flags);
>> +    syscall_exit_to_user_mode_work(regs);
>>   }
>>     void do_el0_svc(struct pt_regs *regs)
>