[PATCH v4 0/6] x86/syscall: use int for x86-64 system calls
From: H. Peter Anvin
Date: Tue May 18 2021 - 15:13:55 EST
From: "H. Peter Anvin (Intel)" <hpa@xxxxxxxxx>
This patchset addresses several inconsistencies in the handling of
system call numbers in x86-64 (and x32).
Right now, *some* code will treat e.g. 0x00000001_00000001 as a system
call and some will not. Some of the code, notably in ptrace and
seccomp, will treat 0x00000001_ffffffff as a system call and some will
not.
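For illustration only, the two readings of the same register value can
be sketched in user-space C. This is not the kernel code: the names and
the SKETCH_NR_SYSCALLS bound below are hypothetical, picked just to
show how a full 64-bit comparison and an "int" interpretation disagree
on the two values mentioned above.

```c
#include <stdint.h>

/* Hypothetical upper bound on system call numbers, for illustration only. */
#define SKETCH_NR_SYSCALLS 548

/* Reading A: compare the full 64-bit register value against the bound. */
static int sketch_valid_as_u64(uint64_t rax)
{
	return rax < SKETCH_NR_SYSCALLS;
}

/*
 * Keep only the low 32 bits and interpret them as a signed int.
 * (The narrowing conversion is implementation-defined in ISO C;
 * gcc and clang wrap modulo 2^32, which is assumed here.)
 */
static int sketch_low32_as_int(uint64_t rax)
{
	return (int)(uint32_t)rax;
}

/* Reading B: range-check the signed 32-bit interpretation instead. */
static int sketch_valid_as_int(uint64_t rax)
{
	int nr = sketch_low32_as_int(rax);

	return nr >= 0 && nr < SKETCH_NR_SYSCALLS;
}
```

Under these assumptions 0x00000001_00000001 is rejected by reading A
but accepted as system call 1 by reading B, and 0x00000001_ffffffff
becomes -1 once truncated to int.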
Furthermore, right now, e.g. 335 for x86-64 will force the exit code
to be set to -ENOSYS even if poked by ptrace, but 548 will not,
because there is an observable difference between a system call
number that corresponds to a hole in the table and one that falls
outside the range of the tables.
Both of these issues are visible to the user; for example, the
syscall_numbering_64 kernel selftest fails if run under ptrace for
this reason (system calls succeed with the high bits set, whereas they
fail when not being traced).
The architecture-independent code in Linux expects "int" for the
system call number, per the API documented, but not implemented, in
<asm-generic/syscall.h>: system call numbers are expected to be
"int", with -1 as the only non-system-call sentinel.
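As a small sketch of what "-1 as the only sentinel" means in practice,
compare two possible checks on an int system call number. The helper
names are hypothetical and exist only for this example; they are not
kernel functions.

```c
/* Treats every negative number as "not a system call". */
static int sketch_is_syscall_ge0(int nr)
{
	return nr >= 0;
}

/* Treats only the sentinel -1 as "not a system call". */
static int sketch_is_syscall_ne_m1(int nr)
{
	return nr != -1;
}
```

The two agree on -1 and on all non-negative numbers, but differ on
other negative values such as -2: the first check silently classifies
it as a non-system call, while the second still treats it as an
(invalid) system call number.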
Treating the same data in multiple ways in different contexts is at
the very best confusing, but it also has the potential to cause
security problems (no such security problems are known at this time,
however).
This is an ABI change, but it is in fact a return to the original
x86-64 ABI: the original assembly entry code would zero-extend the
system call number passed and only the bottom 32 bits were examined.
1. Consistently treat the system call number as a signed int. This is
what syscall_get_nr() already does, and therefore what all
architecture-independent code (e.g. seccomp) already expects.
2. As per the defined semantics of syscall_get_nr(), only the value -1
is defined as a non-system call, so comparing >= 0 is
incorrect. Change to != -1.
3. Call sys_ni_syscall() for system calls which are out of range
(except for -1, which is used by ptrace and seccomp as a "skip
system call" marker) just as for system call numbers that
correspond to holes in the table.
4. Update and extend the syscall_numbering_64 selftest, including
testing the system call numbering when running under ptrace.
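The dispatch rule described in items 1-3 can be sketched roughly as
follows. This is a user-space model, not the code in
arch/x86/entry/common.c: the table, the handlers and all sketch_*
names are hypothetical, and leaving the previous return value in place
for -1 is an assumption about what "skip system call" amounts to.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*sketch_syscall_fn)(void);

/* Stand-in for an implemented system call (hypothetical). */
static long sketch_implemented(void)
{
	return 1234;
}

/* Stand-in for sys_ni_syscall(): always reports -ENOSYS. */
static long sketch_ni_syscall(void)
{
	return -ENOSYS;
}

/* Tiny hypothetical table; slot 1 is a hole. */
static const sketch_syscall_fn sketch_table[] = {
	sketch_implemented,	/* 0: implemented */
	sketch_ni_syscall,	/* 1: hole in the table */
	sketch_implemented,	/* 2: implemented */
};

#define SKETCH_TABLE_SIZE (sizeof(sketch_table) / sizeof(sketch_table[0]))

/*
 * nr is the system call number as a signed int; prev_ret is whatever
 * the return register held before dispatch (e.g. set by a tracer).
 */
static long sketch_dispatch(int nr, long prev_ret)
{
	if (nr == -1)
		return prev_ret;	/* skip marker: run nothing */

	/* Unsigned compare rejects both negative and too-large numbers. */
	if ((unsigned int)nr < SKETCH_TABLE_SIZE)
		return sketch_table[nr]();

	/* Out of range: behave exactly like a hole in the table. */
	return sketch_ni_syscall();
}
```

In this model a hole (1), a too-large number (548) and a negative
number other than -1 all yield -ENOSYS, while -1 leaves the previous
return value alone.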
Changes from v3:
* Reorganize the patchset to have the selftest change first.
* Add tests running under ptrace to selftest.
Changes from v2:
* Factor out and split what was a single patch in the v2 patchset; the
rest of the patches have already been applied.
* Fix the syscall_numbering_64 selftest to match the definition
changes, make its output more informative, and extend it to more
tests. Avoid using the glibc syscall() wrapper to make sure we test
what we think we are testing.
* Better documentation of the changes.
Changes from v1:
* Only -1 should be a non-system call per the cross-architectural
definition of syscall_get_nr().
* Fix/improve patch descriptions.
---
arch/x86/entry/common.c | 93 +++--
arch/x86/entry/entry_64.S | 2 +-
arch/x86/include/asm/syscall.h | 2 +-
tools/testing/selftests/x86/syscall_numbering.c | 488 +++++++++++++++++++++---
4 files changed, 508 insertions(+), 77 deletions(-)