Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

From: Brian Gerst
Date: Tue Dec 08 2015 - 08:07:57 EST

On Mon, Dec 7, 2015 at 8:12 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Mon, Dec 7, 2015 at 4:54 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>> On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst <brgerst@xxxxxxxxx> wrote:
>>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>>> 64-bit syscalls currently have an optimization in which they are
>>>> called with partial pt_regs. A small handful require full pt_regs.
>>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>>> pt_regs for all syscalls. The performance hit doesn't really matter.
>>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>>> fast path performance. To do that, I want to force the syscalls
>>>> that use pt_regs onto the slow path. This will enable us to make
>>>> slow path syscalls be real ABI-compliant C functions.
>>>> Use the new syscall entry qualification machinery for this.
>>>> stub_clone is now stub_clone/ptregs.
>>>> The next patch will eliminate the stubs, and we'll just have
>>>> sys_clone/ptregs.
>> [Resend after gmail web interface fail]
>> I've got an idea on how to do this without the duplicate syscall table.
>> ptregs_foo:
>>     leaq sys_foo(%rip), %rax
>>     jmp stub_ptregs_64
>>
>> stub_ptregs_64:
>>     testl $TS_EXTRAREGS, <current->ti_status>
>>     jnz 1f
>>     SAVE_EXTRA_REGS
>>     call *%rax
>>     RESTORE_EXTRA_REGS
>>     ret
>> 1:
>>     call *%rax
>>     ret
>> This makes sure that the extra regs don't get saved a second time if
>> coming in from the slow path, but preserves the fast path if not
>> tracing.
> I think there's value in having the entries in the table be genuine C
> ABI-compliant function pointers. In your example, it only barely
> works -- you can call them from C only if you have TS_EXTRAREGS set
> appropriately -- otherwise you crash and burn. That will break the
> rest of the series.

I'm working on a full patch. It will set the flag (renamed
TS_SLOWPATH) in do_syscall_64(), which is the only place these
functions can get called from C code. Your changes already have it
set up so that the slow path saves these registers before calling any
C code. Where else do you expect them to be called from?
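
As a rough illustration of the flag check described above, here is a
user-space C model, not kernel code. The struct, the save counter, and
the fast_path()/slow_path() helpers are hypothetical stand-ins made up
for this sketch; only the names TS_SLOWPATH, sys_foo and the idea of
setting the flag in do_syscall_64() come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; the thread only gives the name. */
#define TS_SLOWPATH 0x1u

/* Stand-in for the per-thread state consulted by the stub. */
struct thread_model {
    unsigned int status;   /* models current->ti_status */
    int extra_regs_saves;  /* counts simulated saves of the extra regs */
};

typedef long (*syscall_fn)(struct thread_model *);

/* A syscall body that would need full pt_regs. */
static long sys_foo(struct thread_model *t)
{
    (void)t;
    return 42;
}

/* Models stub_ptregs_64: save the extra regs only if nobody has yet. */
static long stub_ptregs(struct thread_model *t, syscall_fn fn)
{
    long ret;

    assert(fn != NULL);
    if (t->status & TS_SLOWPATH)
        return fn(t);          /* slow path already saved them */

    t->extra_regs_saves++;     /* stand-in for saving the extra regs */
    ret = fn(t);
    /* stand-in for restoring the extra regs */
    return ret;
}

/* Models the fast path: enters the stub with the flag clear. */
static long fast_path(struct thread_model *t, syscall_fn fn)
{
    return stub_ptregs(t, fn);
}

/* Models do_syscall_64(): full regs saved on entry, flag set around the call. */
static long slow_path(struct thread_model *t, syscall_fn fn)
{
    long ret;

    t->extra_regs_saves++;     /* slow path saves full pt_regs itself */
    t->status |= TS_SLOWPATH;
    ret = stub_ptregs(t, fn);
    t->status &= ~TS_SLOWPATH;
    return ret;
}
```

In this model the extra registers are saved exactly once on either
path, which is the property the flag is meant to guarantee.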

> We could adjust it a bit and check whether we're in C land (by
> checking rsp for ts) and jump into the slow path if we aren't, but I'm
> not sure this is a huge win. It does save some rodata space by
> avoiding duplicating the table.

The syscall table is huge: 545*8 = 4360 bytes, over a full 4 KiB page.
Duplicating it for just a few different entries is wasteful.

Brian Gerst