Re: [PATCH v2] unwind: Add sframe_(un)register() system calls

From: Andrii Nakryiko

Date: Thu May 28 2026 - 19:01:39 EST


On Thu, May 28, 2026 at 12:09 PM Steven Rostedt <rostedt@xxxxxxxxxx> wrote:
>
> From: Steven Rostedt <rostedt@xxxxxxxxxxx>
>
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.
>
> Both system calls take a pointer to a new structure:
>
> struct sframe_setup {
> __u64 sframe_start;
> __u64 sframe_size;
> __u64 text_start;
> __u64 text_size;
> };
>
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
>
> sframe_start - The virtual address of the sframe section
> sframe_size - The length of the sframe section
> text_start - the text section the sframe represents
> text_size - the length of the section
>
> If other stack tracing functionality is added, it will require a new
> system call.
>
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
>
> Also added a DEFINE_GUARD() for mmap_write_lock. There was one for
> mmap_read_lock but not for mmap_write_lock.
>
> Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> ---
>
> Changes since v1: https://patch.msgid.link/20260521183532.7a145c8a@xxxxxxxxxxxxxxxxxx
>
> - Use mmap_write_lock() instead of mmap_read_lock() for mutual
> exclusiveness. (Jens Remus)
>
> - Guard mtree_insert_range() with mmap_write_lock. (Jens Remus)
>
> - Added a guard for mmap_write_lock() similar to the one for mmap_read_lock.
>
> - Have syscall prototype use structure pointer instead of void (Thomas Weißschuh)
>
> - Use __u64 instead of unsigned long for struct members (Thomas Weißschuh)
>
> - Use size_t instead of int for structure size in syscall argument.
> (Thomas Weißschuh)
>
> arch/alpha/kernel/syscalls/syscall.tbl | 2 +
> arch/arm/tools/syscall.tbl | 2 +
> arch/arm64/tools/syscall_32.tbl | 2 +
> arch/m68k/kernel/syscalls/syscall.tbl | 2 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
> arch/parisc/kernel/syscalls/syscall.tbl | 2 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
> arch/s390/kernel/syscalls/syscall.tbl | 3 +
> arch/sh/kernel/syscalls/syscall.tbl | 2 +
> arch/sparc/kernel/syscalls/syscall.tbl | 2 +
> arch/x86/entry/syscalls/syscall_32.tbl | 2 +
> arch/x86/entry/syscalls/syscall_64.tbl | 2 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
> include/linux/mmap_lock.h | 3 +
> include/linux/syscalls.h | 3 +
> include/uapi/asm-generic/unistd.h | 7 ++-
> include/uapi/linux/sframe.h | 12 ++++
> kernel/sys_ni.c | 3 +
> kernel/unwind/sframe.c | 69 +++++++++++++++++++--
> scripts/syscall.tbl | 2 +
> 23 files changed, 126 insertions(+), 6 deletions(-)
> create mode 100644 include/uapi/linux/sframe.h
>

[...]

> * Architecture-specific system calls
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index a627acc8fb5f..17042d7e5e87 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -863,8 +863,13 @@ __SYSCALL(__NR_listns, sys_listns)
> #define __NR_rseq_slice_yield 471
> __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
>
> +#define __NR_sframe_register 472
> +__SYSCALL(__NR_sframe_register, sys_sframe_register)
> +#define __NR_sframe_unregister 473
> +__SYSCALL(__NR_sframe_unregister, sys_sframe_unregister)
> +
> #undef __NR_syscalls
> -#define __NR_syscalls 472
> +#define __NR_syscalls 474
>
> /*
> * 32 bit systems traditionally used different
> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> new file mode 100644
> index 000000000000..d3c9f88b024b
> --- /dev/null
> +++ b/include/uapi/linux/sframe.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {

I'd add a `__u64 flags;` field for easier and cleaner extensibility.
Have the kernel check that it is zero for now; future kernels can then
allow some of the bits to be set.
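
Roughly what I have in mind, as a userspace-compilable sketch rather than
a real patch. The struct name, the SFRAME_SETUP_VALID_FLAGS mask and the
helper are all made up for illustration, and I'm assuming -EINVAL is the
error we'd want for unknown bits:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout: the four fields from the patch plus a flags word. */
struct sframe_setup_flags {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
	uint64_t flags;		/* must be zero until bits are defined */
};

/* Hypothetical mask of accepted bits; empty today, grows in later kernels. */
#define SFRAME_SETUP_VALID_FLAGS 0ULL

/* Reject any flag bit this "kernel" does not know about. */
static int sframe_setup_check_flags(const struct sframe_setup_flags *s)
{
	if (s->flags & ~SFRAME_SETUP_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

A later kernel would only have to widen the mask, and user space can
probe for a feature by setting the bit and checking for an error.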

I also still think that prctl(), rather than a separate sframe-specific
syscall, is the way to go. I see no reason for an sframe-specific set of
syscalls just to attach a bit of extra metadata to the process. That
seems to be exactly the job of prctl().
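
From the dynamic linker's side that could look something like the sketch
below. To be clear, PR_SFRAME_REGISTER_EXAMPLE and its value are invented
here purely for illustration (no such prctl option exists), the argument
layout is just one possible choice, and the struct is copied locally
from the patch:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Local copy of the struct proposed in the patch. */
struct sframe_setup {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
};

/* HYPOTHETICAL option number, made up for this example only. */
#define PR_SFRAME_REGISTER_EXAMPLE 0x53465201

/* Returns 0 on success or a negative errno value on failure. */
static int sframe_register_via_prctl(const struct sframe_setup *s)
{
	/* arg2 = pointer to the struct, arg3 = its size, rest unused */
	if (prctl(PR_SFRAME_REGISTER_EXAMPLE, (unsigned long)s,
		  (unsigned long)sizeof(*s), 0UL, 0UL) < 0)
		return -errno;
	return 0;
}
```

On a kernel that doesn't know the option the call should just fail, so
the loader can fall back gracefully.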

> + __u64 sframe_start;
> + __u64 sframe_size;
> + __u64 text_start;
> + __u64 text_size;
> +};
> +

[...]

> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, struct sframe_setup __user *, data, size_t, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;

This seems overly strict. The usual pattern for extensible structs is
to accept sizes both smaller and larger than what the kernel knows:
- if the user-provided size is smaller than what the kernel knows
about, treat the missing fields as zeroes
- if the user-provided size is larger, check that the bytes after the
fields the kernel recognizes are all zeroes.

This allows extensibility without having to change user space code all
the time. Old code will pass a smaller struct without the new
(presumably optional) fields, while newer code can use the newer, larger
struct; as long as it zeroes the extra fields, an old kernel will be
fine with that.
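
A userspace model of those two rules, just to make the semantics
concrete. In the kernel I believe copy_struct_from_user() already does
this, including returning -E2BIG for a non-zero tail; copy_struct_model()
below is a made-up stand-in and not kernel code:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the struct proposed in the patch. */
struct sframe_setup {
	uint64_t sframe_start;
	uint64_t sframe_size;
	uint64_t text_start;
	uint64_t text_size;
};

/*
 * Hypothetical helper: copy a usize-byte "user" struct into a
 * ksize-byte "kernel" struct, tolerating both size mismatches.
 */
static int copy_struct_model(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	const unsigned char *p = src;
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	/* Larger struct from newer user space: the tail must be zero. */
	for (i = ksize; i < usize; i++)
		if (p[i])
			return -E2BIG;

	/* Smaller struct from older user space: missing fields are zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With this, a caller that only fills in the first two fields still
registers fine, and a caller with a larger struct only fails if it
actually sets something the kernel doesn't understand.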

> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + return sframe_add_section(sframe.sframe_start,
> + sframe.sframe_start + sframe.sframe_size,
> + sframe.text_start,
> + sframe.text_start + sframe.text_size);
> +}
> +

[...]