Re: [PATCH 02/16] x86/entry/32: Enter the kernel via trampoline stack

From: Andy Lutomirski
Date: Fri Jan 19 2018 - 11:31:12 EST


On Fri, Jan 19, 2018 at 1:55 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> Hey Andy,
>
> On Wed, Jan 17, 2018 at 10:10:23AM -0800, Andy Lutomirski wrote:
>> On Wed, Jan 17, 2018 at 1:18 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
>
>> > Just read up on vm86 mode control transfers and the stack layout then.
>> > Looks like I need to check for eflags.vm=1 and copy four more registers
>> > from/to the entry stack. Thanks for pointing that out.
>>
>> You could just copy those slots unconditionally. After all, you're
>> slowing down entries by an epic amount due to writing CR3 with PCID
>> off, so four words copied should be entirely lost in the noise. OTOH,
>> checking for VM86 mode is just a single bt against EFLAGS.
>>
>> With the modern (rewritten a year or two ago by Brian Gerst) vm86
>> code, all the slots (those actually in pt_regs) are in the same
>> location regardless of whether we're in VM86 mode or not, but we're
>> still fiddling with the bottom of the stack. Since you're controlling
>> the switch to the kernel thread stack, you can easily just write the
>> frame to the correct location, so you should not need to context
>> switch sp1 -- you can do it sanely and leave sp1 as the actual bottom
>> of the kernel stack no matter what. In fact, you could probably avoid
>> context switching sp0 too, which would be a nice cleanup.
>
> I am not sure what you mean by "not context switching sp0/sp1" ...

You're supposed to read what I meant, not what I said...

I meant that we could have sp0 hold a genuinely constant value per
CPU. That means the entry trampoline ends up with EIP, etc. in a
different place depending on whether VM86 mode was in use, but the
entry trampoline code should be able to handle that. sp1 would have a
value that varies by task, but it could just point to the top of the
stack instead of being changed depending on whether VM86 mode is in
use. Instead, the entry trampoline would offset the registers as
needed to keep pt_regs in the right place.
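
To make the offsetting concrete, here is a rough, untested sketch in
plain userspace C, not kernel code. All helper names are made up for
illustration. The only facts it relies on are that EFLAGS.VM is bit 17
and that an entry from VM86 mode pushes four extra segment slots (GS,
FS, DS, ES) above the usual SS, ESP, EFLAGS, CS, EIP. It assumes an
entry from user mode with no error code pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EFLAGS.VM is bit 17 of the saved EFLAGS image. */
#define EFLAGS_VM (UINT32_C(1) << 17)

/* Words the CPU pushes on a user -> kernel entry (no error code). */
enum {
	IRET_WORDS       = 5,	/* SS, ESP, EFLAGS, CS, EIP */
	VM86_EXTRA_WORDS = 4	/* GS, FS, DS, ES (VM86 mode only) */
};

/* Hypothetical helper: did the entry come from VM86 mode? */
static int entered_from_vm86(uint32_t saved_eflags)
{
	return (saved_eflags & EFLAGS_VM) != 0;
}

/* Hypothetical helper: size in bytes of the hardware-pushed frame. */
static size_t hw_frame_bytes(uint32_t saved_eflags)
{
	size_t words = IRET_WORDS;

	if (entered_from_vm86(saved_eflags))
		words += VM86_EXTRA_WORDS;
	return words * sizeof(uint32_t);
}

/*
 * With a constant per-CPU sp0, the frame grows down from sp0, so the
 * saved EIP sits at a different address depending on VM86 mode.
 */
static uintptr_t saved_eip_addr(uintptr_t sp0, uint32_t saved_eflags)
{
	return sp0 - hw_frame_bytes(saved_eflags);
}

/* Small self-check of the arithmetic above, using a made-up sp0. */
static void frame_layout_selfcheck(void)
{
	const uintptr_t sp0 = 0x1000;

	assert(saved_eip_addr(sp0, 0) == sp0 - 20);
	assert(saved_eip_addr(sp0, EFLAGS_VM) == sp0 - 36);
}
```

The 16-byte difference between the two cases is what the trampoline
would have to account for when copying the frame so that pt_regs lands
at the same spot on the task stack either way.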

I think you already figured all of that out, though :)