Re: [PATCH] x86: execve and sigreturn syscalls must return via iret

From: Andy Lutomirski
Date: Mon Mar 23 2015 - 14:42:39 EST


On Mon, Mar 23, 2015 at 8:24 AM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
> On Mon, Mar 23, 2015 at 3:56 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>
>> * Brian Gerst <brgerst@xxxxxxxxx> wrote:
>>
>>> Both the execve and sigreturn family of syscalls have the ability to change
>>> registers in ways that may not be compatabile with the syscall path they
>>> were called from. In particular, sysret and sysexit can't handle non-default
>>> %cs and %ss, and some bits in eflags. These syscalls have stubs that are
>>> hardcoded to jump to the iret path, and not return to the original syscall
>>> path. Commit 76f5df43cab5e765c0bd42289103e8f625813ae1 (Always allocate a
>>> complete "struct pt_regs" on the kernel stack) recently changed this for
>>> some 32-bit compat syscalls, but introduced a bug where execve from a 32-bit
>>> program to a 64-bit program would fail because it still returned via sysretl.
>>> This caused Wine to fail when built for both 32-bit and 64-bit.
>>>
>>> This patch sets TIF_NOTIFY_RESUME for execve and sigreturn so that the iret
>>> path is always taken on exit to userspace.
>>>
>>> Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>> Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
>>> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>>> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
>>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> ---
>>> arch/x86/ia32/ia32_signal.c | 2 ++
>>> arch/x86/include/asm/ptrace.h | 2 +-
>>> arch/x86/include/asm/thread_info.h | 7 +++++++
>>> arch/x86/kernel/process_32.c | 6 +-----
>>> arch/x86/kernel/process_64.c | 1 +
>>> arch/x86/kernel/signal.c | 2 ++
>>> 6 files changed, 14 insertions(+), 6 deletions(-)
>>
>> Applied the fix to tip:x86/asm, thanks Brian!
>>
>>> +
>>> +/*
>>> + * force syscall return via iret by making it look as if there was
>>> + * some work pending.
>>> +*/
>>> +#define force_iret() set_thread_flag(TIF_NOTIFY_RESUME)
>>
>> I extended this comment to:
>>
>> /*
>> * Force syscall return via IRET by making it look as if there was
>> * some work pending. IRET is our most capable (but slowest) syscall
>> * return path, which is able to restore modified SS, CS and certain
>> * EFLAGS values that other (fast) syscall return instructions
>> * are not able to restore properly.
>> */
>> #define force_iret() set_thread_flag(TIF_NOTIFY_RESUME)
>>
>> Just to preserve the underlying reason for force_iret() for the future
>> and such.
>>
>> Btw., it might be a worthwile optimization to detect non-standard SS,
>> CS and EFLAGS values and only force_iret() in that case, that will
>> speed up 99.9999% of execve() and sigreturn() syscalls and only force
>> the 'weird' process startup modes into the slow return path.
>
> sysret/sysexit also can't restore rcx and r11/rdx. This would not
> work for execve, since it sets those registers to zero. It could
> possibly work for sigreturn if the signal interrupted a syscall. We
> already have the opportunistic sysret code for 64-bit returns.

I'd like to extend opportunistic sysret to 32-bit, as either only
opportunistic sysretl or perhaps opportunistic sysexitl *and* sysretl
depending on regs.

Why only sysretl as a possibility? I think that Intel CPUs allow
sysretl despite not having the matching syscall instruction. If we
could make everything work without using sysexit, the code would be
simplified.

We could add helpers like opportunistic_clobber_rcx_r11(regs) and
opportunistic_clobber_ecx_edx(regs) that would set us use it
reasonably cleanly.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/