Re: [PATCH v2 1/3] syscall_user_dispatch: Allow allowed range wrap-around

From: Thomas Gleixner
Date: Sat Mar 08 2025 - 06:19:26 EST


On Mon, Feb 24 2025 at 09:45, Dmitry Vyukov wrote:
> There are two possible scenarios for syscall filtering:
> - having a trusted/allowed range of PCs, and intercepting everything else
> - or the opposite: a single untrusted/intercepted range and allowing
> everything else
> The current implementation only allows the former use case due to
> allowed range wrap-around check. Allow the latter use case as well
> by removing the wrap-around check.
> The latter use case is relevant for any kind of sandboxing scenario,
> or monitoring behavior of a single library. If a program wants to
> intercept syscalls for PC range [START, END) then it needs to call:
> prctl(..., END, -(END-START), ...);
> which sets a wrap-around range that excludes everything
> besides [START, END).

That's not really intuitive and the implementation changes the prctl()
behaviour in a non-backwards-compatible way.

Can we please keep the current behaviour and have a new mode? Something
like:

# define PR_SYS_DISPATCH_OFF 0
# define PR_SYS_DISPATCH_ON 1
# define PR_SYS_DISPATCH_EXCLUSIVE_ON PR_SYS_DISPATCH_ON
# define PR_SYS_DISPATCH_INCLUSIVE_ON 2

That keeps the current mode backwards compatible and avoids the oddity of

prctl(..., END, -(END-START), ...);

i.e. this is clearly and obviously distinguishable for user space:

prctl(..., PR_SYS_DISPATCH_EXCLUSIVE_ON, START, END - START, ...);
prctl(..., PR_SYS_DISPATCH_INCLUSIVE_ON, START, END - START, ...);

Which makes a lot of sense because these two modes are distinctly
different, no?

PR_SYS_DISPATCH_INCLUSIVE_ON will fail on older kernels and both modes
have a sanity check. PR_SYS_DISPATCH_INCLUSIVE_ON should at least check
for a zero length dispatcher region.
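
A rough user-space model of those sanity checks, as a sketch only. The
MODEL_* names and model_validate() are made up for illustration and are
not kernel identifiers. The exclusive check is assumed to mirror what the
existing code does for PR_SYS_DISPATCH_ON; the inclusive check is one
possible reading of "at least check for a zero length dispatcher region",
plus an overflow check. The default case models an older kernel rejecting
a mode it does not know.

```c
#include <errno.h>
#include <stdint.h>

/* Mode values as proposed above; MODEL_ prefix marks them as local to this sketch. */
enum {
	MODEL_DISPATCH_EXCLUSIVE_ON = 1,	/* range is allowed, rest is dispatched */
	MODEL_DISPATCH_INCLUSIVE_ON = 2,	/* range is dispatched, rest is allowed */
};

/*
 * Hypothetical helper: returns 0 when the (mode, offset, len) triple is
 * acceptable and -EINVAL otherwise. All arithmetic is unsigned, so
 * offset + len wraps modulo the pointer width instead of overflowing.
 */
static int model_validate(unsigned long mode, uintptr_t offset, uintptr_t len)
{
	switch (mode) {
	case MODEL_DISPATCH_EXCLUSIVE_ON:
		/* Assumed existing check: reject empty or wrapping ranges, unless offset is 0 */
		if (offset && offset + len <= offset)
			return -EINVAL;
		return 0;
	case MODEL_DISPATCH_INCLUSIVE_ON:
		/* An empty intercepted range is pointless, a wrapping one is ambiguous */
		if (len == 0 || offset + len <= offset)
			return -EINVAL;
		return 0;
	default:
		/* Unknown mode, e.g. the inclusive mode on a kernel predating it */
		return -EINVAL;
	}
}
```

With that, user space can probe for the new mode by checking whether the
prctl() fails with EINVAL, and fall back accordingly.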

Aside from the better user interface, this avoids the in_compat_syscall()
hack, because then set_syscall_user_dispatch() does the range inversion
and that works completely independently of compat.
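
To illustrate the inversion idea, a minimal user-space sketch. struct
model_range, model_invert() and model_pc_allowed() are hypothetical names,
and the single unsigned comparison is an assumption about how the allowed
range check can be expressed, not a copy of the kernel code. The point is
that the negation happens in native pointer-width arithmetic on the kernel
side, so user space never has to pass a negative length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allowed region [offset, offset + len), with wrap-around permitted */
struct model_range {
	uintptr_t offset;
	uintptr_t len;
};

/*
 * Turn the intercepted region [start, start + len) into its complement:
 * an allowed region which begins at the end of the intercepted one and
 * wraps around the address space back to start. The length is -len
 * modulo the pointer width.
 */
static struct model_range model_invert(uintptr_t start, uintptr_t len)
{
	struct model_range r = {
		.offset = start + len,
		.len = (uintptr_t)0 - len,
	};
	return r;
}

/*
 * True when pc is inside the allowed region, i.e. the syscall would not
 * be dispatched. The unsigned subtraction makes one comparison cover
 * both plain and wrap-around regions.
 */
static bool model_pc_allowed(struct model_range r, uintptr_t pc)
{
	return (uintptr_t)(pc - r.offset) < r.len;
}
```

For START = 0x1000 and END = 0x1100 the inverted region is offset 0x1100
with length -0x100, which is the same pair v2 asks user space to compute by
hand via prctl(..., END, -(END-START), ...).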

> kernel/entry/syscall_user_dispatch.c | 9 +++------
> kernel/sys.c | 6 ++++++
> 2 files changed, 9 insertions(+), 6 deletions(-)

This clearly lacks an update of

Documentation/admin-guide/syscall-user-dispatch.rst

Thanks,

tglx