Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL

From: Ashish Kalra
Date: Fri Aug 20 2021 - 09:32:38 EST


On Thu, Aug 19, 2021 at 11:15:26PM +0000, Sean Christopherson wrote:
> On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> >
> > > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@xxxxxxx> wrote:
> > > I think it makes more sense to stick to the original approach/patch, i.e.,
> > > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > > let early paravirtualized kernel code invoke this private hypercall
> > > interface wherever required.
>
> I don't like the idea of duplicating code just because the problem is tricky to
> solve. Right now it's just one function, but it could balloon to multiple in
> the future. Plus there's always the possibility of a new, pre-alternatives
> kvm_hypercall() being added in generic code, at which point using an SEV-specific
> variant gets even uglier.
>

Also, to highlight why this interface is needed, here is where
apply_alternatives() falls in the boot flow:

setup_arch() calls init_hypervisor_platform(), which detects the
hypervisor platform the kernel is running under, and then the
hypervisor-specific initialization code can make early hypercalls. For
example, KVM-specific initialization in case of SEV will try to mark the
"__bss_decrypted" section's encryption state via early page encryption
status hypercalls.

Now, apply_alternatives() is called much later when setup_arch() calls
check_bugs(), so we do need some kind of an early, pre-alternatives
hypercall interface.
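
To make the ordering problem concrete, here is a rough user-space model
(not kernel code). It assumes the current KVM_HYPERCALL behaviour, i.e.
VMCALL by default and patched to VMMCALL on AMD once alternatives are
applied. All of the names below are made up for illustration; only the
ordering they model comes from the flow described above.

```c
#include <stdbool.h>

/* Which instruction a hypercall site would execute in this model. */
enum hc_insn { HC_VMCALL, HC_VMMCALL };

/* Hypothetical model state, not kernel data structures. */
static bool model_alternatives_applied;
static bool model_cpu_is_amd = true;

/*
 * Models KVM_HYPERCALL as it behaves today: VMCALL until
 * apply_alternatives() has patched the site, VMMCALL afterwards on AMD.
 */
enum hc_insn model_kvm_hypercall_insn(void)
{
	if (model_alternatives_applied && model_cpu_is_amd)
		return HC_VMMCALL;
	return HC_VMCALL;
}

/*
 * Models a private early interface in the spirit of
 * kvm_sev_hypercall3(): the instruction is hard-coded, so it does not
 * depend on alternatives patching at all.
 */
enum hc_insn model_kvm_sev_hypercall_insn(void)
{
	return HC_VMMCALL;
}

/* Stands in for the point where check_bugs() applies alternatives. */
void model_apply_alternatives(void)
{
	model_alternatives_applied = true;
}
```

In this model, an early SEV caller going through the generic path would
still issue VMCALL, while the private path issues VMMCALL both before
and after patching.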

Other cases of pre-alternatives hypercalls include marking per-cpu GHCB
pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and
kvm_apic_eoi as decrypted for SEV generally.

Actually, the use of this kvm_sev_hypercall3() function can be
abstracted quite nicely. All these early hypercalls are made through
early_set_memory_XX() interfaces, which in turn invoke pv_ops.

Now, pv_ops can have these SEV/TDX specific abstractions.

Currently, the pv_ops.mmu.notify_page_enc_status_changed() callback is
set up to use kvm_sev_hypercall3() in case of SEV.

Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can
be set up to use a TDX specific callback.

Therefore, the early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
path is a generic interface, and SEV, TDX and any other future
platform-specific abstractions can easily be added to it.
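
A simplified user-space sketch of that dispatch follows. It is only a
model: the struct, the platform enum, the recording variables and the
model_* helpers are hypothetical, and the real callbacks would issue
the platform's hypercall rather than record their arguments. A 4K page
size is assumed.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4K pages */

enum model_platform { PLAT_NONE, PLAT_SEV, PLAT_TDX };

/* Cut-down stand-in for the pv_ops.mmu callback discussed above. */
struct model_pv_mmu_ops {
	void (*notify_page_enc_status_changed)(unsigned long pfn,
					       int npages, bool enc);
};

/* Records the last notification so the dispatch can be observed. */
static enum model_platform model_last_notifier = PLAT_NONE;
static int model_last_npages;
static bool model_last_enc;

static void model_nop_notify(unsigned long pfn, int npages, bool enc)
{
	(void)pfn; (void)npages; (void)enc;	/* default: do nothing */
}

/* Where an SEV guest would end up in kvm_sev_hypercall3(). */
static void model_sev_notify(unsigned long pfn, int npages, bool enc)
{
	(void)pfn;
	model_last_notifier = PLAT_SEV;
	model_last_npages = npages;
	model_last_enc = enc;
}

/* Placeholder for a TDX specific callback. */
static void model_tdx_notify(unsigned long pfn, int npages, bool enc)
{
	(void)pfn;
	model_last_notifier = PLAT_TDX;
	model_last_npages = npages;
	model_last_enc = enc;
}

static struct model_pv_mmu_ops model_pv_mmu = {
	.notify_page_enc_status_changed = model_nop_notify,
};

/* Platform init code installs its own callback. */
void model_platform_init(enum model_platform plat)
{
	if (plat == PLAT_SEV)
		model_pv_mmu.notify_page_enc_status_changed = model_sev_notify;
	else if (plat == PLAT_TDX)
		model_pv_mmu.notify_page_enc_status_changed = model_tdx_notify;
}

/* Generic early caller: knows nothing about SEV or TDX. */
void model_early_set_memory_decrypted(unsigned long vaddr, unsigned long size)
{
	unsigned long pfn = vaddr >> MODEL_PAGE_SHIFT;
	int npages = (int)((size + (1UL << MODEL_PAGE_SHIFT) - 1)
			   >> MODEL_PAGE_SHIFT);

	model_pv_mmu.notify_page_enc_status_changed(pfn, npages, false);
}
```

The generic caller stays the same for every platform; only the callback
installed at init time differs.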

Thanks,
Ashish

> > > This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> > > hacks as below.
> > >
> > > TDX code can introduce similar private hypercall interface for their early
> > > para virtualized kernel code if required.
> >
> > Actually, if we are using this kvm_sev_hypercall3() and not modifying
> > KVM_HYPERCALL() then Intel CPUs avoid unnecessary #UDs and TDX code does not
> > need any new interface. Only early AMD/SEV specific code will use this
> > kvm_sev_hypercall3() interface. TDX code will always work with
> > KVM_HYPERCALL().
>
> Even if VMCALL is the default, i.e. not patched in, VMCALL it will #VE on TDX.
> In other words, VMCALL isn't really any better than VMMCALL, TDX will need to do
> something clever either way.