Re: Should SEV-ES #VC use IST? (Re: [PATCH] Allow RDTSC and RDTSCP from userspace)

From: Peter Zijlstra
Date: Tue Jun 23 2020 - 06:46:42 EST


On Tue, Jun 23, 2020 at 11:45:19AM +0200, Joerg Roedel wrote:
> Hi Andy,
>
> On Mon, Apr 27, 2020 at 10:37:41AM -0700, Andy Lutomirski wrote:
> > 1. Use IST for #VC and deal with all the mess that entails.
>
> With the removal of IST shifting I wonder what you would suggest on how
> to best implement an NMI-safe IST handler with nesting support.
>
> My current plan is to implement an IST handler which switches itself off
> the IST stack as soon as possible, freeing it for re-use.
>
> The flow would be roughly like this upon entering the handler:
>
>     build_pt_regs();
>
>     RSP = pt_regs->sp;
>
>     if (RSP in VC_IST_stack)
>         error("unallowed nesting")
>
>     if (RSP in current_kernel_stack)
>         RSP = round_down_to_8(RSP)
>     else
>         RSP = current_top_of_stack() // non-ist kernel stack
>
>     copy_pt_regs(pt_regs, RSP);
>     switch_stack_to(RSP);
>
> To make this NMI safe, the NMI handler needs some logic too. Upon
> entering NMI, it needs to check the return RSP, and if it is in the #VC
> IST stack, it must do the above flow by itself and update the return RSP
> and RIP. It needs to take into account the case when PT_REGS is not
> fully populated on the return side.
>
> Alternatively the NMI handler could save/restore the contents of the #VC
> IST stack or just switch to a special #VC-in-NMI IST stack.
>
> All in all it could get complicated, and imho shift_ist would have been
> simpler, but who am I anyway...
>
> Or maybe you have a better idea how to implement this, so I'd like to
> hear your opinion first before I spend too many days implementing
> something.
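
For reference, the stack-selection step of the quoted flow can be sketched
in plain user-space C. This is only an illustration of the three-way
decision, not kernel code: struct stack_range, on_stack() and
vc_pick_stack() are hypothetical names, stacks are modelled as half-open
address ranges, and a return value of 0 stands in for the "unallowed
nesting" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a stack as the half-open address range [lo, hi). */
struct stack_range {
	uintptr_t lo;
	uintptr_t hi;
};

static bool on_stack(const struct stack_range *s, uintptr_t sp)
{
	return sp >= s->lo && sp < s->hi;
}

/*
 * Choose the stack pointer the #VC handler should continue on, given the
 * RSP saved in pt_regs. Returns 0 to signal "unallowed nesting" (an
 * assumption of this sketch; the quoted flow just calls error()).
 */
static uintptr_t vc_pick_stack(const struct stack_range *vc_ist,
			       const struct stack_range *task,
			       uintptr_t interrupted_sp)
{
	assert(task->lo < task->hi);

	/* Interrupted while still on the #VC IST stack: nesting error. */
	if (on_stack(vc_ist, interrupted_sp))
		return 0;

	/* Already on the task's kernel stack: continue just below it. */
	if (on_stack(task, interrupted_sp))
		return interrupted_sp & ~(uintptr_t)7;	/* round down to 8 */

	/* Anything else: start at the top of the non-IST kernel stack. */
	return task->hi;
}
```

Everything before this decision still runs on the IST stack, so the sketch
says nothing about an NMI that arrives before the switch has happened.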

OK, excuse my ignorance, but I'm not seeing how that IST shifting
nonsense would've helped in the first place.

If I understand correctly the problem is:

	<#VC>
	  shift IST
	  <NMI>
	    ... does stuff
	    <#VC>	# again, safe because the shift

But what happens if you get the NMI before your IST adjustment?

	<#VC>
	  <NMI>
	    ... does stuff
	    <#VC>	# again, happily wrecks your earlier #VC
	  shift IST	# whoopsy, too late

Either way around we get to fix this up in NMI (and any other IST
exception that can happen while in #VC, hello #MC). And more complexity
there is the very last thing we need :-(
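
The two orderings above can be modelled with a toy user-space C sketch.
It is a deliberately simplified model, not kernel code: the IST stack is
an array of slots, ist_top plays the role of the IST pointer in the TSS,
and all names (struct vc_model, vc_enter(), vc_shift_ist(),
outer_frame_survives()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define VC_SLOTS 4

/* Toy model of the #VC IST stack; every name here is hypothetical. */
struct vc_model {
	int ist_top;		/* slot the next #VC entry will use */
	int frame[VC_SLOTS];	/* id of the #VC owning each slot, 0 = empty */
};

static void vc_model_init(struct vc_model *m)
{
	m->ist_top = VC_SLOTS - 1;
	for (int i = 0; i < VC_SLOTS; i++)
		m->frame[i] = 0;
}

/* Hardware delivers #VC: the frame lands wherever the IST points now. */
static int vc_enter(struct vc_model *m, int id)
{
	assert(m->ist_top >= 0 && m->ist_top < VC_SLOTS);
	m->frame[m->ist_top] = id;
	return m->ist_top;
}

/* Software "shift IST": make the next #VC entry use a lower slot. */
static void vc_shift_ist(struct vc_model *m)
{
	m->ist_top--;
}

/* Does the outer #VC frame survive a nested #VC raised from an NMI? */
static bool outer_frame_survives(bool nmi_before_shift)
{
	struct vc_model m;
	int outer;

	vc_model_init(&m);
	outer = vc_enter(&m, 1);	/* first #VC */

	if (!nmi_before_shift)
		vc_shift_ist(&m);	/* shift completes before the NMI */

	vc_enter(&m, 2);		/* NMI handler triggers a second #VC */

	if (nmi_before_shift)
		vc_shift_ist(&m);	/* whoopsy, too late */

	return m.frame[outer] == 1;
}
```

In this model outer_frame_survives(false) holds and
outer_frame_survives(true) does not: the shift only protects the outer
frame if it completes before the NMI arrives.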

There's no way you can fix up the IDT without getting an NMI first.

This entire exception model is fundamentally buggered :-/