[PATCH RFC 00/28] ARM: Switch to generic entry

From: Linus Walleij
Date: Thu Oct 10 2024 - 07:33:52 EST


This patch series converts a slew of ARM assembly into the
corresponding C code, step by step moving the codebase
closer to the expectations of the generic entry code,
and as a last step switches ARM over to the generic
entry code.

This was inspired by Jinjie Ruan's similar work for ARM64.

The low-level assembly calls into arch/arm/kernel/syscall.c
to invoke syscalls from userspace, and to the functions listed
in arch/arm/kernel/entry.c for any other transitions to
and from userspace. Looking at these functions and their
call sites in the assembly in the final result should give
a pretty good idea about how this works, and what the
generic entry expects from an architecture.
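
As a rough illustration of that shape, here is a userspace toy model
(not the code in arch/arm/kernel/syscall.c). It assumes the usual
generic entry pattern of one C function that brackets the syscall
with an enter hook and an exit hook, loosely modelled on
syscall_enter_from_user_mode() and syscall_exit_to_user_mode().
All model_* names, the two-argument table and the struct layout are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs: just enough state for the model. */
struct model_regs {
    long nr;       /* syscall number requested by "userspace" */
    long args[2];  /* first two syscall arguments */
    long ret;      /* return value handed back to "userspace" */
};

/* Bookkeeping so the enter/exit pairing can be observed. */
int model_in_kernel;    /* 1 between enter and exit, 0 otherwise */
int model_transitions;  /* total number of enter + exit calls */

/* Models the entry hook: all entry work happens here, in C. */
long model_enter_from_user_mode(struct model_regs *regs, long nr)
{
    (void)regs;
    model_in_kernel = 1;
    model_transitions++;
    return nr; /* a tracer or seccomp filter could rewrite the number here */
}

/* Models the exit hook: pending exit work would be handled here. */
void model_exit_to_user_mode(struct model_regs *regs)
{
    (void)regs;
    model_in_kernel = 0;
    model_transitions++;
}

/* A single made-up syscall so the table has something to dispatch to. */
long model_sys_add(long a, long b)
{
    return a + b;
}

typedef long (*model_syscall_fn)(long, long);
const model_syscall_fn model_table[] = { model_sys_add };
#define MODEL_NR_SYSCALLS ((long)(sizeof(model_table) / sizeof(model_table[0])))

/* The single C function the low-level assembly would branch to. */
long model_invoke_syscall(struct model_regs *regs)
{
    long nr;

    assert(regs != NULL);
    nr = model_enter_from_user_mode(regs, regs->nr);
    if (nr >= 0 && nr < MODEL_NR_SYSCALLS)
        regs->ret = model_table[nr](regs->args[0], regs->args[1]);
    else
        regs->ret = -ENOSYS;
    model_exit_to_user_mode(regs);
    return regs->ret;
}
```

The point of the sketch is only the bracketing: every path through
model_invoke_syscall() passes through exactly one enter and one exit
call, which is the invariant the assembly call sites have to preserve.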

To test the code the following seccomp patch is needed
on older ARM systems:
https://lore.kernel.org/lkml/20241008-seccomp-compile-error-v1-1-f87de4007095@xxxxxxxxxx

There is a git branch you can pull in and test (v6.12-rc1
based):
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator.git/log/?h=b4/arm-generic-entry-v6.12-rc1

Upsides:
- Same code paths as x86, S390, RISCV, Loongarch and probably
soon ARM64 are used for the ARM systems. This includes some
instrumentation stubs helping out with things we haven't
even started to look at such as kmsan and live patching (!).

- By introducing the new callbacks to C, we can move away
from the deprecated (and I think partly unmaintained) context
tracking mechanism for RCU (user_exit_callable(),
user_enter_callable()) in favor of what everyone else
is using, i.e. calling rcu_irq_enter_check_tick() on
IRQ entry.

- I think lockdep is now also behaving more according to
expectations (the lockdep calls in ARM64 and generic entry
seem different and more fine-grained than in the ARM32 code)
but I am no expert in lockdep so I cannot really tell if
this is a real improvement.

Downsides:

- I had to remove the "fast syscall restart" from Al Viro.
I don't know how much it will affect performance, but
if this is something we must have, let's try to make
the solution generic, i.e. add fast syscall restart in
the generic entry code.

- The "superfast return to userspace" using just very
small assembly snippets to get back to userspace on
e.g. IRQs if and only if no instrumentation was compiled
in, is no longer possible, since we unconditionally
call into code written in C.

Testing:

- Booted into Versatile Express QEMU (ARMv7), Ux500 full
graphic UI (PostmarketOS Phosh, ARMv7 on hardware) and
Gemini (ARMv4 on hardware). No special issues.

- Tested some ptrace/strace obviously, such as issuing
several instances of "strace find /" and letting this scroll
by in the terminal over some 10 minutes or so.

- Turned on RCU torture tests and ran for a while. Seems
stable and the test outputs look normal.

- Ran stress-ng, which triggers the idle bug below that also
appears during boot.

- perf top doesn't give any output. I don't really know how
to enable interesting stuff in the kernel to run this
tool. Help needed.

Potential bugs:

- This comes up during boot and stress-ng runs:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit+0xf8/0x100
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc1+ #31
Hardware name: ARM-Versatile Express
(...)

It is emitted in kernel/context_tracking.c, ct_kernel_exit():
WARN_ON_ONCE(ct_nmi_nesting() != CT_NESTING_IRQ_NONIDLE);

I don't know exactly what's going on here, but it happens right
after CPU1 is brought online at boot, so there might be some unexpected
nesting of IPIs happening when CPU1 is brought up?
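
To make the suspected failure mode concrete, here is a userspace toy
model of the nesting counter (not kernel code, and not a diagnosis).
It assumes the counter sits at the non-idle value while in process
context, moves by 2 on each interrupt entry and exit, and is checked
when the CPU heads for idle, as in the WARN_ON_ONCE() quoted above.
The ct_model_* names and the struct are hypothetical, and the real
accounting in kernel/context_tracking.c has more cases than this.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed to mirror the kernel's CT_NESTING_IRQ_NONIDLE definition;
 * everything else below is a hypothetical, simplified model.
 */
#define MODEL_NESTING_IRQ_NONIDLE ((LONG_MAX / 2) + 1)

struct ct_model {
    long nmi_nesting; /* models the per-CPU IRQ/NMI nesting counter */
    bool warned;      /* set where the kernel would WARN_ON_ONCE() */
};

/* Process-level kernel context, no interrupt in flight. */
void ct_model_init(struct ct_model *ct)
{
    assert(ct != NULL);
    ct->nmi_nesting = MODEL_NESTING_IRQ_NONIDLE;
    ct->warned = false;
}

/* An interrupt (for example an IPI) arrives while in the kernel. */
void ct_model_irq_enter(struct ct_model *ct)
{
    ct->nmi_nesting += 2;
}

/* The matching interrupt exit. */
void ct_model_irq_exit(struct ct_model *ct)
{
    ct->nmi_nesting -= 2;
}

/* Entering idle: models the check quoted above from ct_kernel_exit(). */
void ct_model_idle_enter(struct ct_model *ct)
{
    if (ct->nmi_nesting != MODEL_NESTING_IRQ_NONIDLE)
        ct->warned = true;
    ct->nmi_nesting = 0;
}
```

In this model a balanced enter/exit pair followed by idle entry stays
quiet, while an interrupt entry that is never paired with an exit
trips the check. If something similar happens around CPU1 bring-up,
the missing or extra call would be the thing to look for.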

Open questions:

- Generic entry requires PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP
to be defined. I added them but don't even know what they
do or if generic entry magically adds support for them
(probably not) so I need help here.

- I need Al Viro's input on how to deal with the "fast syscall
restart" that I bluntly deleted, if we need to reincarnate it
in the generic entry or what we shall do here.

- I need to test with an OABI rootfs.

- Performance impact. If this is major I think it's a no-go.
However, we need to agree on metrics here, and I need
suggestions on what to test with.
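
One possible starting point for the metrics discussion, offered as an
assumption and not as an agreed benchmark: a tiny userspace loop that
times a cheap syscall, so that the entry/exit path dominates the
measurement. The bench_* names are made up for this sketch, and
getppid() is picked only because it does very little work in the
kernel. Running it on the same hardware before and after the series
would give a first number to argue about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
double bench_now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average cost in nanoseconds of one user -> kernel -> user round trip. */
double bench_syscall_ns(unsigned long iterations)
{
    double start, end;
    unsigned long i;

    assert(iterations > 0);
    start = bench_now_ns();
    for (i = 0; i < iterations; i++)
        (void)getppid(); /* cheap syscall, so entry/exit cost dominates */
    end = bench_now_ns();
    return (end - start) / (double)iterations;
}
```

Calling bench_syscall_ns() with a large iteration count (say a
million) and repeating the run a few times should smooth out noise.
It says nothing about the IRQ return path, which would need a
separate test.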

Signed-off-by: Linus Walleij <linus.walleij@xxxxxxxxxx>
---
Linus Walleij (28):
ARM: Prepare includes for generic entry
ARM: ptrace: Split report_syscall()
ARM: entry: Skip ret_slow_syscall label
ARM: process: Rewrite ret_from_fork in C
ARM: process: Remove local restart
ARM: entry: Invoke syscalls using C
ARM: entry: Rewrite two asm calls in C
ARM: entry: Move trace entry to C function
ARM: entry: save the syscall sp in thread_info
ARM: entry: move all tracing invocation to C
ARM: entry: Merge the common and trace entry code
ARM: entry: Rename syscall invocation
ARM: entry: Create user_mode_enter/exit
ARM: entry: Drop trace argument from usr_entry macro
ARM: entry: Separate call path for syscall SWI entry
ARM: entry: Drop argument to asm_irqentry macros
ARM: entry: Implement syscall_exit_to_user_mode()
ARM: entry: Drop the superfast ret_fast_syscall
ARM: entry: Remove fast and offset register restore
ARM: entry: Untangle ret_fast_syscall/to_user
ARM: entry: Do not double-call exit functions
ARM: entry: Move work processing to C
ARM: entry: Stop exiting syscalls like IRQs
ARM: entry: Complete syscall and IRQ transition to C
ARM: entry: Create irqentry calls from kernel mode
ARM: entry: Move in-kernel hardirq tracing to C
ARM: entry: Add FIQ/NMI C callbacks
ARM: entry: Convert to generic entry

 arch/arm/Kconfig                    |   1 +
 arch/arm/include/asm/entry-common.h |  66 ++++++++++++
 arch/arm/include/asm/entry.h        |  17 +++
 arch/arm/include/asm/ptrace.h       |   8 +-
 arch/arm/include/asm/signal.h       |   4 -
 arch/arm/include/asm/stacktrace.h   |   2 +-
 arch/arm/include/asm/switch_to.h    |   4 +
 arch/arm/include/asm/syscall.h      |   7 ++
 arch/arm/include/asm/thread_info.h  |  18 +---
 arch/arm/include/asm/traps.h        |   2 +-
 arch/arm/include/uapi/asm/ptrace.h  |   2 +
 arch/arm/kernel/Makefile            |   5 +-
 arch/arm/kernel/asm-offsets.c       |   1 +
 arch/arm/kernel/entry-armv.S        |  39 +++----
 arch/arm/kernel/entry-common.S      | 202 ++++++++++++----------------------------
 arch/arm/kernel/entry-header.S      | 108 +++++--------------
 arch/arm/kernel/entry.c             |  59 +++++++++++
 arch/arm/kernel/process.c           |  22 +++-
 arch/arm/kernel/ptrace.c            |  76 --------------
 arch/arm/kernel/signal.c            |  57 ++--------
 arch/arm/kernel/syscall.c           |  31 ++++++
 arch/arm/kernel/traps.c             |   2 +-
 22 files changed, 349 insertions(+), 384 deletions(-)
---
base-commit: e1dc5c87445c608a99e508fe4d3102e2b32858ef
change-id: 20240903-arm-generic-entry-ada145378bbe

Best regards,
--
Linus Walleij <linus.walleij@xxxxxxxxxx>