Re: [PATCH Part1 RFC v3 11/22] x86/sev: Add helper for validating pages in early enc attribute changes

From: Brijesh Singh
Date: Wed Jun 16 2021 - 08:49:38 EST



On 6/16/21 7:03 AM, Borislav Petkov wrote:
> On Wed, Jun 16, 2021 at 06:00:09AM -0500, Brijesh Singh wrote:
>> I am trying to be consistent with previous VMGEXIT implementations. If
>> the command itself failed then use the command specific error code to
>> tell hypervisor why we terminated but if the hypervisor violated the
>> GHCB specification then use the "general request termination".
> I feel like we're running in circles here: I ask about debuggability
> and telling the user what exactly failed and you're giving me some
> explanation about what the error codes mean. I can see what they mean.
>
> So let me try again:
>
> Imagine you're a guest owner and you haven't written the SNP code and
> you don't know how it works.
>
> You start a guest in the public cloud and it fails because the
> hypervisor violates the GHCB protocol and all that guest prints before
> it dies is
>
> "general request termination"


The GHCB specification does not define a unique error code for every
possible condition. Now that we have reserved reason set 1 for the
Linux-specific error code, we could add a new error code to cover the
cases for the protocol violation. I was highlighting that we should not
overload the meaning of GHCB_TERM_PSC. In my mind, the GHCB_TERM_PSC
error code is used when the guest sees that the hypervisor failed to
change the state . The failure maybe because the guest provided a bogus
GPA or invalid operation code, or RMPUPDATE failure or HV does not
support SNP feature etc etc. But in this case, the failure was due to
the protocol error, and IMO we should not use the GHCB_TERM_PSC.
Additionally, we should also update CPUID and other VMGEXITs to use the
new error code instead of "general request termination" so that its
consistent.


If you still think that GHCB_TERM_PSC is valid here, then I am okay with it.

-Brijesh