RE: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl

From: Kalra, Ashish
Date: Fri Apr 10 2020 - 17:10:28 EST


Hello Steve,

-----Original Message-----
From: Steve Rutherford <srutherford@xxxxxxxxxx>
Sent: Friday, April 10, 2020 3:19 PM
To: Kalra, Ashish <Ashish.Kalra@xxxxxxx>
Cc: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; H. Peter Anvin <hpa@xxxxxxxxx>; Joerg Roedel <joro@xxxxxxxxxx>; Borislav Petkov <bp@xxxxxxx>; Lendacky, Thomas <Thomas.Lendacky@xxxxxxx>; X86 ML <x86@xxxxxxxxxx>; KVM list <kvm@xxxxxxxxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; David Rientjes <rientjes@xxxxxxxxxx>; Andy Lutomirski <luto@xxxxxxxxxx>; Singh, Brijesh <brijesh.singh@xxxxxxx>
Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl

On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <srutherford@xxxxxxxxxx> wrote:
>
> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
> <srutherford@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > > > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > > > >
> > > > > Hello Steve,
> > > > >
> > > > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > > > > <krish.sadhukhan@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > > > >>> From: Ashish Kalra <ashish.kalra@xxxxxxx>
> > > > > > > >>>
> > > > > > > >>> This ioctl can be used by the application to reset the
> > > > > > > >>> page encryption bitmap managed by the KVM driver. A
> > > > > > > >>> typical use of this ioctl is on VM reboot, when the
> > > > > > > >>> bitmap must be reinitialized.
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx>
> > > > > > > >>> ---
> > > > > > > >>> Documentation/virt/kvm/api.rst | 13 +++++++++++++
> > > > > > > >>> arch/x86/include/asm/kvm_host.h | 1 +
> > > > > > > >>> arch/x86/kvm/svm.c | 16 ++++++++++++++++
> > > > > > > >>> arch/x86/kvm/x86.c | 6 ++++++
> > > > > > > >>> include/uapi/linux/kvm.h | 1 +
> > > > > > > >>> 5 files changed, 37 insertions(+)
> > > > > > > >>>
> > > > > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > > > >>> bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > > > >>> bitmap for an incoming guest.
> > > > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > > > >>> +-----------------------------------------
> > > > > > > >>> +
> > > > > > > >>> +:Capability: basic
> > > > > > > >>> +:Architectures: x86
> > > > > > > >>> +:Type: vm ioctl
> > > > > > > >>> +:Parameters: none
> > > > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > > > >>> +
> > > > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the
> > > > > > > >>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > > > >>> +
> > > > > > > >>> +
> > > > > > > >>> 5. The kvm_run structure
> > > > > > > >>> ========================
> > > > > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > > > >>> int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > > > >>> struct kvm_page_enc_bitmap *bmap);
> > > > > > > >>> + int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > > > >>> };
> > > > > > > >>> struct kvm_arch_async_pf {
> > > > > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > > > >>> index 313343a43045..c99b0207a443 100644
> > > > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > > > >>> return ret;
> > > > > > > >>> }
> > > > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > > > > >>> +{
> > > > > > > >>> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > > >>> +
> > > > > > > >>> + if (!sev_guest(kvm))
> > > > > > > >>> + return -ENOTTY;
> > > > > > > >>> +
> > > > > > > >>> + mutex_lock(&kvm->lock);
> > > > > > > >>> + /* by default all pages should be marked encrypted */
> > > > > > > >>> + if (sev->page_enc_bmap_size)
> > > > > > > >>> + bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > > > >>> + mutex_unlock(&kvm->lock);
> > > > > > > >>> + return 0;
> > > > > > > >>> +}
> > > > > > > >>> +
> > > > > > > >>> static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > > > >>> {
> > > > > > > >>> struct kvm_sev_cmd sev_cmd;
> > > > > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > > > >>> .page_enc_status_hc = svm_page_enc_status_hc,
> > > > > > > >>> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > > > >>> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > > > >>> + .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > > > > >>
> > > > > > > >> We don't need to initialize the intel ops to NULL ?
> > > > > > > >> It's not initialized in the previous patch either.
> > > > > > > >>
> > > > > > > >>> };
> > > > > > > > This struct is declared as "static storage", so won't
> > > > > > > > the non-initialized members be 0 ?
> > > > > > >
> > > > > > >
> > > > > > > Correct. Although, I see that 'nested_enable_evmcs' is
> > > > > > > explicitly initialized. We should maintain the convention, perhaps.
> > > > > > >
> > > > > > > >
> > > > > > > >>> static int __init svm_init(void)
> > > > > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > > > >>> r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > > > >>> break;
> > > > > > > >>> }
> > > > > > > >>> + case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > > > >>> + r = -ENOTTY;
> > > > > > > >>> + if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > > > >>> + r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > > > >>> + break;
> > > > > > > >>> + }
> > > > > > > >>> default:
> > > > > > > >>> r = -ENOTTY;
> > > > > > > >>> }
> > > > > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > > > >>> #define KVM_GET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > > > >>> #define KVM_SET_PAGE_ENC_BITMAP _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)
> > > > > > > >>> /* Secure Encrypted Virtualization command */
> > > > > > > >>> enum sev_cmd_id {
> > > > > > > >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
> > > > > >
> > > > > >
> > > > > > Doesn't this overlap with the set ioctl? Yes, obviously, you
> > > > > > have to copy the new value down and do a bit more work, but
> > > > > > I don't think resetting the bitmap is going to be the
> > > > > > bottleneck on reboot. Seems excessive to add another ioctl for this.
> > > > >
> > > > > > The set ioctl is generally provided for the incoming VM to
> > > > > > set up the page encryption bitmap, whereas this reset ioctl
> > > > > > is meant for the source VM as a simple interface to reset
> > > > > > the whole page encryption bitmap.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > >
> > > >
> > > > Hey Ashish,
> > > >
> > > > These seem very overlapping. I think this API should be refactored a bit.
> > > >
> > > > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> > > > hypercall (and related feature bit) is offered to the VM, and
> > > > also the size of the buffer.
> > >
> > > > If you look at patch 13/14, I have added a new KVM para feature
> > > called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host
> > > support for SEV Live Migration and a new Custom MSR which the
> > > guest does a wrmsr to enable the Live Migration feature, so this
> > > is like the enable cap support.
> > >
> > > > There are further extensions to this support that I am adding, so
> > > > patch 13/14 of this patch-set is still being enhanced and will
> > > > have full support when I repost next.
> > >
> > > > 2) Use set for manipulating values in the bitmap, including
> > > > resetting the bitmap. Set the bitmap pointer to null if you want
> > > > to reset to all 0xFFs. When the bitmap pointer is set, it should
> > > > set the values to exactly what is pointed at, instead of only
> > > > clearing bits, as is done currently.
> > >
> > > > As I mentioned in my earlier email, the set API is supposed to be
> > > for the incoming VM, but if you really need to use it for the
> > > outgoing VM then it can be modified.
> > >
> > > > 3) Use get for fetching values from the kernel. Personally, I'd
> > > > require alignment of the base GFN to a multiple of 8 (but the
> > > > number of pages could be whatever), so you can just use a
> > > > memcpy. Optionally, you may want some way to tell userspace the
> > > > size of the existing buffer, so it can ensure that it can ask
> > > > for the entire buffer without having to track the size in
> > > > usermode (not strictly necessary, but nice to have since it
> > > > ensures that there is only one place that has to manage this value).
> > > >
> > > > If you want to expand or contract the bitmap, you can use enable
> > > > cap to adjust the size.
> > >
> > > > As discussed in the earlier mail thread, we are now doing this
> > > > dynamically by computing the guest RAM size when the
> > > set_user_memory_region ioctl is invoked. I believe that should
> > > handle the hot-plug and hot-unplug events too, as any hot memory
> > > updates will need KVM memslots to be updated.
> > Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
> > to be able to decide not to allocate, but this should be workable.
> > >
> > > > If you don't want to offer the hypercall to the guest, don't
> > > > call the enable cap.
> > > > This API avoids using up another ioctl. Ioctl space is somewhat
> > > > scarce. It also gives userspace fine grained control over the
> > > > buffer, so it can support both hot-plug and hot-unplug (or at
> > > > the very least it is not obviously incompatible with those). It
> > > > also gives userspace control over whether or not the feature is
> > > > offered. The hypercall isn't free, and being able to tell guests
> > > > to not call when the host wasn't going to migrate it anyway will be useful.
> > > >
> > >
> > > > As I mentioned above, now the host indicates if it supports the
> > > Live Migration feature and the feature and the hypercall are only
> > > enabled on the host when the guest checks for this support and
> > > does a wrmsr() to enable the feature. Also the guest will not make
> > > the hypercall if the host does not indicate support for it.
> > If my read of those patches was correct, the host will always
> > advertise support for the hypercall. And the only bit controlling
> > whether or not the hypercall is advertised is essentially the kernel
> > > version. You need to roll out a new kernel to disable the hypercall.
>
> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and
> forth for CPUID with usermode. My point about informing the guest
> kernel is clearly moot. The host still needs the ability to prevent
> allocations, but that is more minor. Maybe use a flag on the memslots
> directly?
> On second thought: burning a memslot flag for roughly 30 MB per TB of VM
> memory seems like a waste.
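
For reference, a rough sketch of where that per-TB figure comes from. It
assumes 4 KiB guest pages and one bit per guest page; the helper name
enc_bitmap_bytes() is made up for illustration and is not part of the patch.

```c
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12 /* assumption: 4 KiB guest pages */

/*
 * Bytes needed for a bitmap that holds one bit per guest page.
 * Hypothetical helper, for illustration only.
 */
static uint64_t enc_bitmap_bytes(uint64_t guest_ram_bytes)
{
	uint64_t pages = guest_ram_bytes >> GUEST_PAGE_SHIFT;

	/* Round up to a whole number of bytes. */
	return (pages + 7) / 8;
}
```

For 1 TiB of guest RAM this gives 2^28 pages, so 2^25 bytes, i.e. 32 MiB,
which is in line with the rough figure quoted above.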

Currently, I am still using a "unified" page encryption bitmap rather than one
bitmap per memslot. The main change is that the bitmap is now resized only when
the memslots are updated, via the kvm_arch_commit_memory_region() interface.
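
To illustrate the idea only, here is a minimal userspace sketch and not the
actual patch code. All struct and function names below are hypothetical
stand-ins, not KVM structures. It assumes that newly covered GFNs default to
"encrypted" (following the "by default all pages should be marked encrypted"
comment in the quoted patch) and that the bitmap only ever grows.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the KVM structures. */
struct slot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct enc_bitmap {
	uint8_t *bits;   /* one bit per GFN, 1 = encrypted */
	uint64_t nbits;  /* number of GFNs covered */
};

/* One past the highest GFN covered by any slot. */
static uint64_t slots_end_gfn(const struct slot *slots, size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t slot_end = slots[i].base_gfn + slots[i].npages;

		if (slot_end > end)
			end = slot_end;
	}
	return end;
}

/*
 * Grow the single bitmap so that it covers nbits GFNs. Old contents are
 * kept and newly covered GFNs default to 1 (encrypted). The bitmap is
 * never shrunk here (an assumption of this sketch).
 */
static int enc_bitmap_resize(struct enc_bitmap *bm, uint64_t nbits)
{
	size_t old_bytes = (bm->nbits + 7) / 8;
	size_t new_bytes = (nbits + 7) / 8;
	uint8_t *bits;

	if (nbits <= bm->nbits)
		return 0;

	bits = malloc(new_bytes);
	if (!bits)
		return -1;

	memset(bits, 0xff, new_bytes);
	if (old_bytes)
		memcpy(bits, bm->bits, old_bytes);

	free(bm->bits);
	bm->bits = bits;
	bm->nbits = nbits;
	return 0;
}

/* Called only when the slot layout changes, not on every bitmap access. */
static int enc_bitmap_on_slots_update(struct enc_bitmap *bm,
				      const struct slot *slots, size_t n)
{
	return enc_bitmap_resize(bm, slots_end_gfn(slots, n));
}

/* Mark one GFN as not encrypted (shared). Out-of-range GFNs are ignored. */
static void enc_bitmap_clear(struct enc_bitmap *bm, uint64_t gfn)
{
	if (gfn < bm->nbits)
		bm->bits[gfn / 8] &= (uint8_t)~(1u << (gfn % 8));
}

static int enc_bitmap_test(const struct enc_bitmap *bm, uint64_t gfn)
{
	/* GFNs outside the bitmap are treated as encrypted. */
	if (gfn >= bm->nbits)
		return 1;
	return (bm->bits[gfn / 8] >> (gfn % 8)) & 1;
}
```

In this sketch the only resize point is enc_bitmap_on_slots_update(), which
plays the role of the memslot commit path; bits cleared before a resize stay
cleared afterwards.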

Thanks,
Ashish