Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

From: Andy Lutomirski
Date: Tue Feb 06 2018 - 18:06:11 EST


On Tue, Feb 6, 2018 at 10:48 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 6, 2018 at 1:32 PM, Dominik Brodowski
> <linux@xxxxxxxxxxxxxxxxxxxx> wrote:
>> Same as is done for syscalls, interleave XOR with PUSH or MOV
>> instructions for exceptions/interrupts, in order to minimize
>> the cost of the additional instructions required for register
>> clearing.
>
> Side note: I would _really_ like to see
>
> (a) SAVE_{C,EXTRA}_REGS go away entirely, to be replaced by just SAVE_REGS.
>
> We never use them independently of each other any more.
>
> (b) Get rid of ALLOC_PT_GPREGS_ON_STACK entirely, and make SAVE_REGS
> use pushq's instead of movs.

Agreed.

However, big fat NAK to the patch as is. There's no way I'm okay with
a macro called SAVE_C_REGS that actually saves *and clears* C regs.
Call it SAVE_AND_CLEAR_C_REGS.

>
> Doing (a) should be completely trivial.
>
> Doing (b) looks like it needs _some_ care, because
> ALLOC_PT_GPREGS_ON_STACK is not always done just before the SAVE_REGS,
> the error entry code does it in the early entry code. But honestly,
> that seems mainly so that it can do
>
> testb $3, CS(%rsp) /* If coming from userspace, switch stacks */
>
> before registers are saved, yet use the same CS offset as if they had
> already been saved. So that _one_ stack offset in the 'idtentry' macro
> would need to be fixed up.
>
> There might be others that I don't see from just eyeballing, so it
> does need some care, but wouldn't it be nice if *all* the entry code
> could just use the same pushq sequences, and then put the xor's in
> there?
>
> The reason for that complexity is purely the system call fastpath case
> that no longer exists, I think.
>
> Am I missing something?

I don't think so.
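
For the CS offset fixup mentioned above, the arithmetic can be checked
outside the kernel. A minimal sketch in plain C, assuming the usual
x86-64 pt_regs field order and that only the error code / orig_ax slot
sits below the hardware frame at that point. The struct and function
names are made up for illustration; this is not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative mirror of the x86-64 pt_regs layout, lowest stack
 * address first. Assumption: field order matches the kernel's struct.
 */
struct pt_regs_sketch {
	/* "extra" regs, saved by SAVE_EXTRA_REGS */
	uint64_t r15, r14, r13, r12, bp, bx;
	/* "C" regs, saved by SAVE_C_REGS */
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
	/* error code or -1, pushed before the GP regs */
	uint64_t orig_ax;
	/* hardware exception frame */
	uint64_t ip, cs, flags, sp, ss;
};

static_assert(sizeof(struct pt_regs_sketch) == 21 * 8,
	      "expected 21 eight-byte slots");

/* Offset of CS from %rsp once the full frame has been allocated. */
static size_t cs_offset_full_frame(void)
{
	return offsetof(struct pt_regs_sketch, cs);
}

/*
 * Offset of CS from %rsp when only orig_ax and the hardware frame
 * are on the stack, i.e. before any GP register has been pushed.
 */
static size_t cs_offset_before_gpregs(void)
{
	return offsetof(struct pt_regs_sketch, cs) -
	       offsetof(struct pt_regs_sketch, orig_ax);
}
```

So if ALLOC_PT_GPREGS_ON_STACK goes away, the early testb in idtentry
would presumably need the smaller offset (something along the lines of
CS-ORIG_RAX(%rsp)) instead of CS(%rsp).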

idtentry could use some massive cleanups IMO. At some point I'll find
time to do it. We've added features to it piecemeal over time, and
the net result, including the stack switch for PTI, is a mess.

>
> Linus