Re: Proposal for finishing the 64-bit x86 syscall cleanup

From: Brian Gerst
Date: Wed Aug 26 2015 - 01:21:24 EST

>>> Thing 2: vdso compilation with binutils that doesn't support .cfi directives
>>> Userspace debuggers really like having the vdso properly
>>> CFI-annotated, and the 32-bit fast syscall entries are annotated
>>> manually in hexadecimal. AFAIK Jan Beulich is the only person who
>>> understands it.
>>> I want to be able to change the entries a little bit to clean them up
>>> (and possibly rework the SYSCALL32 and SYSENTER register tricks, which
>>> currently suck), but it's really, really messy right now because of
>>> the hex CFI stuff. Could we just drop the CFI annotations if the
>>> binutils version is too old or even just require new enough binutils
>>> to build 32-bit and compat kernels?
>> One thing I want to do is rework the 32-bit VDSO into a single image,
>> using alternatives to handle the selection of entry method. The
>> open-coded CFI crap has made that near impossible to do.
> Yes please!
> But please don't change the actual instruction ordering at all yet,
> since the SYSCALL case seems to be buggy right now.
> (If you want to be really fancy, don't use alternatives. Instead
> teach vdso2c to annotate the actual dynamic table function pointers so
> we can rewrite the pointers at boot time. That will save a cycle or
> two.)

The easiest way to select the right entry code is by changing the ELF
AUX vector. That covers the normal usage, but there are two additional
cases that need addressing.

1) Some code could possibly look up the __kernel_vsyscall symbol
directly and call it, but that's non-standard. If there is code out
there that does this, we could update the ELF symbol table to point
__kernel_vsyscall to the chosen entry point, or just remove the symbol
and let the caller fall back to INT80.

2) The sigreturn trampolines. These are tricky because the sigreturn
syscalls implicitly use regs->sp to find the signal frame. That
interacts badly with the SYSENTER/SYSCALL entries, which save
registers on the stack. The trampoline currently uses a bare SYSCALL
instruction (no pushes to the stack), but falls back to INT80 for
SYSENTER. One option is to create new syscalls that take a pointer to
the signal frame as arg1.

Brian Gerst