[PATCH v4 4/6] x86/syscall: sign-extend system calls on entry to int

From: H. Peter Anvin
Date: Tue May 18 2021 - 15:20:02 EST


From: "H. Peter Anvin (Intel)" <hpa@xxxxxxxxx>

Right now, *some* code will treat e.g. 0x0000000100000001 as a system
call and some will not. Some of the code, notably in ptrace, will
treat 0x000000018000000 as a system call and some will not. Finally,
right now, e.g. 335 for x86-64 will force the exit code to be set to
-ENOSYS even if poked by ptrace, but 548 will not, because there is an
observable difference between a system call number that falls in a gap
in the table and one that falls outside the range of the table.
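
As a rough illustration of the first inconsistency, the sketch below
contrasts two validity checks. It is not kernel code: the function names
and EXAMPLE_NR_SYSCALLS are hypothetical, and the narrowing cast assumes
the usual two's complement wraparound that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size, for illustration only; not the real kernel value. */
#define EXAMPLE_NR_SYSCALLS 512

/* Check A: compares the full 64-bit register value against the table size. */
static bool valid_as_u64(uint64_t rax)
{
	return rax < EXAMPLE_NR_SYSCALLS;
}

/* Check B: looks only at the low 32 bits, interpreted as a signed int. */
static bool valid_as_int(uint64_t rax)
{
	int nr = (int)(uint32_t)rax;

	return nr >= 0 && nr < EXAMPLE_NR_SYSCALLS;
}
```

For 0x0000000100000001, check A rejects the value while check B accepts
it as system call 1, so the outcome depends on which style of check a
given code path happens to use.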

This is visible to the user: for example, the syscall_numbering_64
test fails if run under strace, because as strace uses ptrace, it ends
up clobbering the upper half of the 64-bit system call number.

The arch-independent code all assumes that a system call is an "int"
and that the value -1 specifically, and not just any negative value, is
used for a non-system call. This is the case on x86 as well when
arch-independent code is involved. The arch-independent API is
defined/documented (but not *implemented*!) in
<asm-generic/syscall.h>.

This is an ABI change, but is in fact a revert to the original x86-64
ABI. The original assembly entry code would zero-extend the system
call number; this patch uses sign extension to be explicit that this is
treated as a signed number (although in practice it makes no
difference, of course) and to avoid people getting the idea of
"optimizing" it, as has happened on at least two(!) separate
occasions.
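
The difference between the two extensions can be sketched in C as
follows. The helper names are hypothetical and exist only for this
illustration; the casts assume two's complement wraparound on narrowing,
as GCC and Clang provide.

```c
#include <stdint.h>

/* Mirrors "movslq %eax, %rsi": take the low 32 bits, sign-extend to 64. */
static int64_t sign_extend_nr(uint64_t rax)
{
	return (int64_t)(int32_t)(uint32_t)rax;
}

/* Zero extension of the low 32 bits, as a 32-bit register move would do. */
static uint64_t zero_extend_nr(uint64_t rax)
{
	return (uint64_t)(uint32_t)rax;
}

/* What a callee taking an "int" sees after narrowing either result. */
static int as_int(uint64_t extended)
{
	return (int)(uint32_t)extended;
}
```

The two 64-bit results differ whenever bit 31 is set, but a callee that
only looks at the low 32 bits as an int sees the same number either way.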

Do not store the extended value into regs->orig_ax, however: on
x86-64, the ABI is that the callee is responsible for extending
parameters, so only examining the lower 32 bits is fully consistent
with any "int" argument to any system call, e.g. regs->di for
write(2). The full value of %rax on entry to the kernel is thus still
available.
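
A minimal sketch of the callee-extends convention, using a hypothetical
struct example_regs as a stand-in for the saved register frame (it is
not the kernel's struct pt_regs, and the helpers are illustrative only):

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved register frame. */
struct example_regs {
	uint64_t orig_ax;	/* raw %rax as saved on entry */
	uint64_t di;		/* raw %rdi, first argument */
};

/* The callee narrows the raw register itself, as it would for an int fd. */
static int example_fd_arg(const struct example_regs *regs)
{
	return (int)(uint32_t)regs->di;
}

/* The system call number is narrowed the same way; regs is not modified. */
static int example_syscall_nr(const struct example_regs *regs)
{
	return (int)(uint32_t)regs->orig_ax;
}
```

Both helpers ignore the upper 32 bits of the register, and the saved
64-bit values remain intact for anything that wants to inspect them.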

Signed-off-by: H. Peter Anvin (Intel) <hpa@xxxxxxxxx>
---
arch/x86/entry/entry_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1d9db15fdc69..85f04ea0e368 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -108,7 +108,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)

 	/* IRQs are off. */
 	movq	%rsp, %rdi
-	movq	%rax, %rsi
+	movslq	%eax, %rsi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
 	/*
--
2.31.1