Re: [PATCH] KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure

From: Tom Lendacky
Date: Thu May 20 2021 - 16:57:38 EST


On 5/20/21 2:16 PM, Sean Christopherson wrote:
> On Mon, May 17, 2021, Tom Lendacky wrote:
>> On 5/14/21 6:06 PM, Peter Gonda wrote:
>>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
>>>>
>>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
>>>> exit code and parameters fail. Since the VMGEXIT instruction can be issued
>>>> from userspace, even though userspace (likely) can't update the GHCB,
>>>> don't allow userspace to be able to kill the guest.
>>>>
>>>> Return a #GP request through the GHCB when validation fails, rather than
>>>> terminating the guest.
>>>
>>> Is this a gap in the spec? I don't see anything that details what
>>> should happen if the correct fields for NAE are not set in the first
>>> couple paragraphs of section 4 'GHCB Protocol'.
>>
>> No, I don't think the spec needs to spell out everything like this. The
>> hypervisor is free to determine its course of action in this case.
>
> The hypervisor can decide whether to inject/return an error or kill the guest,
> but what errors can be returned and how they're returned absolutely needs to be
> ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec
> is the logical place to define said ABI.

For now, that is all we have for versions 1 and 2 of the spec. We can
certainly extend it in future versions if that is desired.

I would suggest starting a thread on what we would like to see in the next
version of the GHCB spec on the amd-sev-snp mailing list:

amd-sev-snp@xxxxxxxxxxxxxx

>
> For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is
> completely nonsensical. As is, a Linux guest appears to blindly forward the #GP,
> which means if something does go awry KVM has just made debugging the guest that
> much harder, e.g. imagine the confusion that will ensue if the end result is a
> SIGBUS to userspace on CPUID.

I see the point you're making, but I would also say that we probably
wouldn't even boot successfully if the kernel can't handle, e.g., a CPUID
#VC properly. A lot of what could go wrong with required inputs, not the
values, but the required state being communicated, should have already
been ironed out during development of whichever OS is providing the SEV-ES
support.

>
> There needs to be an explicit error code for "you gave me bad data", otherwise
> we're signing ourselves up for future pain.

I'll make note of that for the next update to the spec and we can work on
it further during the spec review.

Thanks,
Tom

>
>> I suppose the spec could suggest a course of action, but I don't think the
>> spec should require a specific course of action.
>>
>> Thanks,
>> Tom
>>
>>>