Re: Cleaning up numbering for new x86 syscalls?

From: Arnd Bergmann
Date: Wed Nov 21 2018 - 12:14:42 EST


On Tue, Nov 20, 2018 at 1:25 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> Hi all-
>
> We currently have some giant turds in the way that syscalls are
> numbered. We have the x86_32 table, which is totally sane other than
> some legacy multiplexers. Then we have the x86_64 table, which is,
> um, demented:
>
> - The numbers don't match x86_32. I have no idea why.

I think it was an early attempt at cleaning up the table by only
adding the calls that were still in use. Back in the day, each
architecture had its own table, and of course x86_32 and x86_64
started out as separate top-level architectures.

> - We use bit 30, which triggers in_x32_syscall(). It should have
> been bit 31, but I digress.
>
> - We have this weird set of extra x32 syscalls that start at 512.
> Who wants to bet whether we have no bugs if someone does syscall with,
> say, nr == 512 (i.e. not 512 | BIT(30)) or nr == (16 | BIT(30))? The
> latter would be non-compat ioctl with in_x32_syscall() set and hence
> in_compat_syscall() set.

The comment in the table says it's purely for keeping the calls
in separate cache lines. I don't know if the cache lines make
a difference in the end, but it seems that once we run into the
x32 syscall numbers, we just treat them like any others; we simply
choose never to call them from a 64-bit glibc.
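
A minimal userspace sketch of the two cases Andy mentions, assuming
the dispatcher masks off the x32 bit before indexing the table. The
helper names are made up for illustration; only the bit 30 marker and
the 512 starting point come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit 30 marks an x32 call; the x32-specific entries start at 512. */
#define X32_SYSCALL_BIT 0x40000000u
#define X32_FIRST_NR    512u

static_assert(X32_SYSCALL_BIT == (1u << 30), "x32 marker is bit 30");

/* Hypothetical split of a raw syscall number into its two parts. */
struct decoded_nr {
	unsigned int index;   /* table index with the marker bit removed */
	bool x32_bit;         /* what in_x32_syscall() would key off */
};

static struct decoded_nr decode_nr(unsigned int nr)
{
	struct decoded_nr d = {
		.index = nr & ~X32_SYSCALL_BIT,
		.x32_bit = (nr & X32_SYSCALL_BIT) != 0,
	};
	return d;
}

/*
 * True for the "nr == 512" case: an x32-specific entry reached
 * without the marker bit. Only valid while nothing else lives at
 * or above 512.
 */
static bool x32_entry_without_bit(unsigned int nr)
{
	struct decoded_nr d = decode_nr(nr);

	return d.index >= X32_FIRST_NR && !d.x32_bit;
}
```

With this model, nr == 512 decodes to index 512 with the bit clear,
and nr == (16 | BIT(30)) decodes to index 16 with the bit set, so both
reach an ordinary table slot with a marker that does not match the
entry.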

> I propose we consider some subset of the following:
>
> 1. Introduce restart_syscall_2(). Make its number be 1024. Maybe
> someday we could start using it instead of restart_syscall(). The
> only issue I can see is programs that allow restart_syscall() using
> seccomp but don't allow the new variant.
>
> 2. Introduce an outright ban on new syscalls with nr < 1024.

This would leave a hole of several hundred numbers if we do it
for all architectures. Wasting multiple kilobytes for a cosmetic
cleanup might be considered excessive.
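
To put a rough number on that hole, here is a small sketch, assuming
the cost is one pointer-sized table entry per unused number. The
function names and the example inputs in the note below are my
assumptions, not figures taken from any table.

```c
#include <assert.h>
#include <stddef.h>

/* Unused table slots if the next syscall skips ahead to new_base. */
static size_t hole_entries(size_t first_free, size_t new_base)
{
	assert(new_base >= first_free);
	return new_base - first_free;
}

/* Bytes spent on those slots, at one function pointer per slot. */
static size_t hole_bytes(size_t first_free, size_t new_base,
			 size_t ptr_size)
{
	return hole_entries(first_free, new_base) * ptr_size;
}
```

For example, if the first free number were 548 and pointers are
8 bytes, hole_bytes(548, 1024, 8) comes to 3808 bytes for a single
table, and a table that currently ends near 294 would pay 5840 bytes.
Architectures with a separate compat table would pay that again.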

> 3. Introduce an outright ban on the addition of new __x32_compat
> syscalls. If new compat hacks are needed, they can use
> in_compat_syscall(), thank you very much.

I would definitely want to keep anything regarding x32 out of the
common syscall implementation. If you want to add on to that
pile, please do it in arch/x86, not in kernel/ or fs/.

If we decide that x32 is a failed experiment and we don't keep
it working in the future, let's just kill it off right away. I'm fairly
sure nobody depends on it for anything real; the only uses I
could find are either showing off benchmark results or
playing around with it for fun. Most of that fun part apparently
ended many years ago, but there is still some work going into
debian/x32. We probably need to coordinate with them and see
if they know of actual users before removing it. Popcon lists
5 active users [1] and a sharp downward trend.

> 4. Modify the wrappers of the __x32_compat entries so that they will
> return -ENOSYS if in_x32_syscall() returns false.

No objection here, but what would that help?

> 5. Adjust the scripts so that we only have to wire up new syscalls
> once. They'll have a nr above 1024, and they'll have the same nr on
> all x86 variants.
>
> Thoughts?

I would definitely welcome assigning the same syscall numbers across
all architectures. It is a needless burden for the libc developers to
figure out for each syscall which kernel is known to support it.
When a call gets added, they typically add logic to check for the
system call at runtime, but for older syscalls it helps to know that
all architectures support a call once the minimum kernel version for
a libc has been raised beyond the release that added it.
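
The runtime check I mean looks roughly like the sketch below. It uses
getrandom purely as an example; have_getrandom() and
PROBE_GRND_NONBLOCK are names made up here, and a real libc differs
in the details.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() in strict ISO modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value as GRND_NONBLOCK in the uapi header; avoids blocking. */
#define PROBE_GRND_NONBLOCK 0x0001

/*
 * Probe once whether the running kernel implements getrandom and
 * cache the answer. Only ENOSYS counts as "not implemented"; any
 * other result (success, EAGAIN, ...) means the call exists.
 */
static bool have_getrandom(void)
{
	static int cached = -1;   /* -1: not probed yet */

	if (cached < 0) {
		char dummy;
		long ret = syscall(SYS_getrandom, &dummy, 0,
				   PROBE_GRND_NONBLOCK);

		cached = !(ret == -1 && errno == ENOSYS);
	}
	return cached != 0;
}
```

A caller would use the syscall when have_getrandom() is true and fall
back to an older interface otherwise. With uniform numbers and a known
minimum kernel version, the probe and the fallback could be dropped on
every architecture at the same time.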

Please see also the work that Firoz Khan has been posting
for generalizing the tables on all architectures to use the
format we have on x86, arm and s390. I hope we can merge it
all for 4.21, and then build on top of that for generalization and
cleanups.

Arnd

[1] https://popcon.debian.org/stat/sub-x32.png