Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed

From: Tom Lendacky
Date: Thu May 13 2021 - 09:50:21 EST


On 5/12/21 10:51 AM, Sean Christopherson wrote:
> On Wed, May 12, 2021, Borislav Petkov wrote:
>> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
>>> +static inline void notify_page_enc_status_changed(unsigned long pfn,
>>> +						  int npages, bool enc)
>>> +{
>>> +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
>>> +}
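
For reference, a rough sketch of the two sides of such a paravirt hook: a
no-op on bare metal, and a hypercall in the KVM guest case. The
kvm_sev_hypercall3()/KVM_HC_PAGE_ENC_STATUS names are assumed from elsewhere
in this series; treat the plumbing below as illustrative only:

/* Bare metal: there is no hypervisor to notify, so do nothing. */
static void native_notify_page_enc_status_changed(unsigned long pfn,
						  int npages, bool enc)
{
}

/* KVM guest: forward the range and its new encryption status to the host. */
static void kvm_notify_page_enc_status_changed(unsigned long pfn,
					       int npages, bool enc)
{
	kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, pfn << PAGE_SHIFT,
			   npages, enc);
}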
>>
>> Now the question is whether something like that is needed for TDX, and,
>> if so, could it be shared by both.
>
> Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatim. Ditto
> for SEV-SNP. I wanted to squish everything into a single common hypercall, but
> that didn't pan out.
>
> The problem is that both TDX and SNP define their own versions of this so that
> any guest kernel that complies with the TDX|SNP specification will run cleanly
> on a hypervisor that also complies with the spec. This KVM-specific hook doesn't
> meet those requirements because non-Linux guest support will be sketchy at best, and
> non-KVM hypervisor support will be non-existent.
>
> The best we can do, short of refusing to support TDX or SNP, is to make this
> KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
> control logic is identical. The mechanics of actually invoking the hypercall
> will differ, but if done right, everything else should be reusable without
> modification.
>
> I had an in-depth analysis of this, but it was all off-list. Pasted below.
>
> TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
> from time zero. SNP could technically do the same (with a revised GHCB spec),
> but it'd be butt ugly. And of course trying to go that route for either TDX or
> SNP would run into the problem of having to coordinate the ABI for the "legacy"
> hypercall across all guests and hosts. So yeah, trying to remove any of the
> three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.
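
As a concrete illustration of the tunneling, a TDX guest could stuff the KVM
hypercall number and arguments into GPRs and issue a TDVMCALL, along the
lines of the sketch below. __tdx_hypercall() and struct tdx_hypercall_args
are assumed helpers for the eventual TDX guest support, not anything that
exists in this series:

/*
 * Sketch: tunnel a "legacy" KVM hypercall through TDG.VP.VMCALL.  Putting
 * the KVM hypercall number in R10 (which is 0 for spec-defined TDVMCALLs)
 * lets the host distinguish the two.
 */
static long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
			      unsigned long p2, unsigned long p3,
			      unsigned long p4)
{
	struct tdx_hypercall_args args = {
		.r10 = nr,	/* KVM hypercall number, 0 = spec-defined */
		.r11 = p1,
		.r12 = p2,
		.r13 = p3,
		.r14 = p4,
	};

	return __tdx_hypercall(&args);
}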
>
> That being said, I do think we can reuse the KVM specific hypercall for TDX and
> SNP. Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
> vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
> KVM enlightened from switching to the KVM specific hypercall once it can do so.
> More thoughts later on.
>
> > I guess a common structure could be used along the lines of what is in the
> > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
> > only ever really do a single page range at a time (through
> > set_memory_encrypted() and set_memory_decrypted()). The reason for the
> > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
> > page ranges in advance. Will TDX need to do something similar?
>
> Yes, TDX needs the exact same thing. All three (SEV, SNP, and TDX) have more or
> less the exact same hook in the guest (Linux, obviously) kernel.
>
> > If so, the only real common piece in KVM is a function to track what pages
> > are shared vs private, which would only require a simple interface.
>
> It's not just KVM, it's also the relevant code in the guest kernel(s) and other
> hypervisors. And the MMU side of KVM will likely be able to share code, e.g. to
> act on the page size hint.
>
> > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
> > range is really all that is needed, but it must be common across
> > hypervisors. I think that was one of Sean's points originally: we don't want
> > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.
>
> For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
> to make it a superset of the SNP and TDX protocols so that it _can_ be used in
> lieu of the SNP/TDX specific protocol. I don't know for sure whether or not
> that will actually yield better code and/or performance, but it costs us almost
> nothing and at least gives us the option of further optimizing the Linux+KVM
> combination.

Right, for SEV-SNP, as long as we know that the KVM interface is
available, it can be used. But we would have to fall back to the GHCB
specification if its availability can't be determined.
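
Something along these lines, where the feature bit and the GHCB-path helper
are placeholders for whatever the final names turn out to be:

/*
 * Prefer the KVM hypercall when the host advertises it, otherwise fall
 * back to the spec-defined GHCB protocol.  KVM_FEATURE_HC_MAP_GPA_RANGE,
 * KVM_MAP_GPA_ENCRYPTED and snp_set_page_state_ghcb() are placeholders,
 * not names from this series.
 */
static void set_pages_state(unsigned long gpa, unsigned long npages, bool enc)
{
	if (kvm_para_has_feature(KVM_FEATURE_HC_MAP_GPA_RANGE))
		kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa, npages,
			       enc ? KVM_MAP_GPA_ENCRYPTED : 0);
	else
		snp_set_page_state_ghcb(gpa, npages, enc);
}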

>
> It probably shouldn't be a strict superset, as in practice I don't think the SNP
> approach of having individual entries when batching multiple pages will yield
> the best performance. E.g. the vast majority (maybe all?) of conversions for a
> Linux guest will be physically contiguous and will have the same preferred page
> size, at which point there will be less overhead if the guest specifies a
> massive range as opposed to having to sanitize and fill a large buffer.

Originally, the plan was to use ranges, but based on feedback during the
GHCB spec review it was updated to its current definition.

A concern from other hypervisor vendors was being able to return to the
guest in the middle of the hypercall, e.g. to deliver an interrupt or
the like, and then allow the guest to resume the hypercall where it left
off. Not sure if you would want to build that into this hypercall, since,
depending on the range, the operation could take a while.
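
One way to accommodate that with a range-based interface would be to have
the hypercall return how many pages it actually processed, so the host can
bail out to deliver an interrupt and the guest simply retries with the
remainder. The return-value convention here is invented for the sake of the
sketch:

/*
 * Sketch: resumable range conversion.  Assume (for this example only)
 * that the hypercall returns the number of pages processed before the
 * host had to return to the guest, or a negative error code.
 */
static long map_gpa_range(unsigned long gpa, unsigned long npages,
			  unsigned long attrs)
{
	while (npages) {
		long done = kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa,
					   npages, attrs);

		if (done <= 0)
			return done ? done : -EIO;

		/* Advance past what the host already handled and retry. */
		gpa    += done * PAGE_SIZE;
		npages -= done;
	}

	return 0;
}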

Thanks,
Tom

>
> TL;DR: I think the KVM hypercall should be something like this, so that it can
> be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
> performance enhancements or something.
>
> 8. KVM_HC_MAP_GPA_RANGE
> -----------------------
> :Architecture: x86
> :Status: active
> :Purpose: Request KVM to map a GPA range with the specified attributes.
>
> a0: the guest physical address of the start page
> a1: the number of (4KB) pages (must be contiguous in GPA space)
> a2: attributes
>
> where 'attributes' could be something like:
>
> bits 3:0 - preferred page size encoding 0 = 4KB, 1 = 2MB, 2 = 1GB, etc...
> bit 4 - plaintext = 0, encrypted = 1
> bits 63:5 - reserved (must be zero)
>
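
Expressed as code, the proposed encoding and a call site could look like the
below. The constant names are invented here; only the bit layout comes from
the proposal above:

/* Attribute encoding for the proposed KVM_HC_MAP_GPA_RANGE (illustrative). */
#define KVM_MAP_GPA_PAGE_SZ_4K		0ULL		/* bits 3:0 */
#define KVM_MAP_GPA_PAGE_SZ_2M		1ULL
#define KVM_MAP_GPA_PAGE_SZ_1G		2ULL
#define KVM_MAP_GPA_ENCRYPTED		BIT_ULL(4)	/* bit 4 */
/* bits 63:5 are reserved and must be zero */

/* Convert an encrypted range of @npages 4KB pages, hinting 2MB mappings. */
static long map_gpa_encrypted_2m(unsigned long gpa, unsigned long npages)
{
	return kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, gpa, npages,
			      KVM_MAP_GPA_PAGE_SZ_2M | KVM_MAP_GPA_ENCRYPTED);
}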