Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64

From: Andy Lutomirski
Date: Thu Nov 29 2018 - 14:27:17 EST


On Thu, Nov 29, 2018 at 11:08 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 29, 2018 at 10:58 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > In contrast, if the call was wrapped in an inline asm, we'd *know* the
> > compiler couldn't turn a "call wrapper(%rip)" into anything else.
>
> Actually, I think I have a better model - if the caller is done with inline asm.
>
> What you can do then is basically add a single-byte prefix to the
> "call" instruction that does nothing (say, cs override), and then
> replace *that* with a 'int3' instruction.
>
> Boom. Done.
>
> Now, the "int3" handler can just update the instruction in-place, but
> leave the "int3" in place, and then return to the next instruction
> byte (which is just the normal branch instruction without the prefix
> byte).
>
> The cross-CPU case continues to work, because the 'int3' remains in
> place until after the IPI.
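
For reference, the sequence being described would look something like
this (helper names are hypothetical, not from any existing patch):

/*
 * Sketch: the call site is emitted as a one-byte CS-override prefix
 * (0x2e) followed by a normal 5-byte "call rel32".  The prefix byte
 * doubles as the int3 landing pad while the rel32 is rewritten.
 */
static void patch_prefixed_call(void *site, void *new_target)
{
	unsigned char int3 = 0xcc, cs = 0x2e;
	s32 rel = (long)new_target - ((long)site + 6); /* end of prefix+call */

	/* Turn the harmless prefix byte into an int3 trap. */
	text_poke(site, &int3, 1);

	/* Make sure every CPU sees the int3 before touching the rest. */
	on_each_cpu(do_sync_core, NULL, 1);

	/*
	 * Rewrite the rel32 behind the int3.  Anything racing into the
	 * site traps; the int3 handler (not shown) updates the rel32
	 * itself if needed and returns to the call opcode at site + 1,
	 * exactly as described above.
	 */
	text_poke(site + 2, &rel, 4);
	on_each_cpu(do_sync_core, NULL, 1);

	/* Put the prefix byte back. */
	text_poke(site, &cs, 1);
	on_each_cpu(do_sync_core, NULL, 1);
}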

Hmm, cute. But then the calls are in inline asm, which results in
giant turds like we have for the pvop vcalls. And, if they start
being used more generally, we potentially have ABI issues where the
calling convention isn't quite what the asm expects, and we explode.
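
For concreteness, every call site would need something along these
lines (entirely hypothetical names, and only a sketch of the shape,
not a working macro), which is where the pvop-style ugliness and the
ABI fragility come from:

/*
 * A two-argument call site wrapped in inline asm: the SysV calling
 * convention has to be open-coded.  The argument registers become
 * in/out operands because the callee may clobber them, and every
 * other caller-saved register goes in the clobber list.  Anything
 * this gets wrong (FP/vector clobbers, future calling-convention
 * tweaks) turns into silent corruption.
 */
#define static_call_2(name, a1, a2)					\
({									\
	long __ret;							\
	register long __a1 asm("rdi") = (long)(a1);			\
	register long __a2 asm("rsi") = (long)(a2);			\
									\
	asm volatile("call " #name "_tramp"				\
		     : "=a" (__ret), "+r" (__a1), "+r" (__a2)		\
		     :							\
		     : "rdx", "rcx", "r8", "r9", "r10", "r11",		\
		       "memory", "cc");					\
	__ret;								\
})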

I propose a different solution:

As in this patch set, we have a direct and an indirect version. The
indirect version remains exactly the same as in this patch set. The
direct version only does the patching when all seems well: the call
instruction needs to be 0xe8, and we only patch it when the
instruction doesn't cross a cache line. In the rare case where the
compiler generates something other than 0xe8, or the call does cross
a cache line, the site just remains a call to the out-of-line jmp
trampoline. Does that seem reasonable? It's a very minor change to
the patch set.
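
Roughly, with hypothetical names (this is not what the patches do
today, just the shape of the check):

#define CALL_INSN_SIZE	5

/*
 * Patch the site in place only if it really is a plain 5-byte 0xe8
 * rel32 call that doesn't straddle a cache line; otherwise it keeps
 * calling the out-of-line trampoline forever.
 */
static bool static_call_safe_to_patch(void *insn)
{
	unsigned long start = (unsigned long)insn;
	unsigned long end = start + CALL_INSN_SIZE - 1;

	/* Must be a direct rel32 call... */
	if (*(u8 *)insn != 0xe8)
		return false;

	/* ...that fits entirely within one cache line. */
	return (start & ~(L1_CACHE_BYTES - 1)) ==
	       (end & ~(L1_CACHE_BYTES - 1));
}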

Alternatively, we could actually emulate call instructions like this:

void __noreturn jump_to_kernel_pt_regs(struct pt_regs *regs, ...)
{
	struct pt_regs ptregs_copy = *regs;

	barrier();
	/* may clobber old regs, but so what? */
	*(unsigned long *)(regs->sp - 8) = whatever;
	asm volatile ("jmp return_to_alternate_ptregs");
}

where return_to_alternate_ptregs points rsp at the pt_regs copy and
goes through the normal return path. It's ugly, but we could have a
test case for it, and it should work fine.
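
To flesh that out a bit (everything below is made up for
illustration; switch_to_pt_regs() stands in for the
jump_to_kernel_pt_regs() / return_to_alternate_ptregs pair above,
with the "whatever" being the emulated return address), the int3
handler side could look like:

/* Make *regs the live machine state, storing @retaddr at regs->sp on
 * the way out (the jump_to_kernel_pt_regs() trick above). */
extern void __noreturn switch_to_pt_regs(struct pt_regs *regs,
					 unsigned long retaddr);

/*
 * Emulate the patched call.  Assumes regs->ip points just past the
 * int3 that temporarily replaced the first byte of the 5-byte call.
 */
static void __noreturn emulate_static_call(struct pt_regs *regs,
					   unsigned long target)
{
	struct pt_regs new_regs = *regs;
	unsigned long retaddr = regs->ip - 1 + 5; /* insn after the call */

	/* Pretend the call already happened: push the return address... */
	new_regs.sp -= sizeof(unsigned long);
	/* ...and resume at the new target. */
	new_regs.ip = target;

	switch_to_pt_regs(&new_regs, retaddr);
}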