Re: [PATCH v2 07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY

From: Sean Christopherson
Date: Tue Apr 16 2024 - 19:00:58 EST


On Tue, Apr 16, 2024, Paolo Bonzini wrote:
> On Mon, Apr 15, 2024 at 11:17 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > The first question to answer is, do we want to return an error or "silently"
> > install mappings for !SMM, !guest_mode. And so this option becomes relevant only
> > _if_ we want to unconditionally install mappings for the 'base" mode.
> >
> > > > - Return error on guest mode or SMM mode: Without this patch.
> > > > Pros: No additional patch.
> > > > Cons: Difficult to use.
> > >
> > > Hmm... For the non-TDX use cases this is just an optimization, right? For TDX
> > > there shouldn't be an issue. If so, maybe this last one is not so horrible.
>
> It doesn't even have to be ABI that it gives an error. As you say,
> this ioctl can just be advisory only for !confidential machines. Even
> if it were implemented, the shadow MMU can drop roots at any moment

Sure, but there's a difference between KVM _potentially_ dropping roots and
guaranteed failure because userspace is trying to do something that's unsupported.
But I think this is a non-issue, because it should really just be as simple as:

if (!mmu->pre_map_memory)
return -EOPNOTSUPP;

Hmm, or probably this to avoid adding an MMU hook for a single MMU flavor:

if (!tdp_mmu_enabled || !mmu->root_role.direct)
return -EOPNOTSUPP;

> and/or kill the mapping via the shrinker.

Ugh, we really need to kill that code.

> That said, I can't fully shake the feeling that this ioctl should be
> an error for !TDX and that TDX_INIT_MEM_REGION wasn't that bad. The
> implementation was ugly but the API was fine.

Hmm, but IMO the implementation was ugly in no small part because of the contraints
put on KVM by the API. Mapping S-EPT *and* doing TDH.MEM.PAGE.ADD in the same
ioctl() forced KVM to operate on vcpu0, and necessitated shoving temporary data
into a per-VM structure in order to get the source contents into TDH.MEM.PAGE.ADD.

We could eliminate the vcpu0 grossness, but it would require a massive refactor,
which is also not a problem per se, but it's obviously not free. Eliminating
kvm_tdx.source_page is also doable, but it's not clear to me that end result would
be a net positive.

If userspace pre-maps the S-EPT entries ahead of time, then KVM should have a
straight shot to PAGE.ADD, i.e. doesn't need to "pass" the source page via a
scratch field in kvm_tdx, and I think/hope would avoid the need to grab vcpu0
in order to get at an MMU to build the S-EPT.

And stating the obvious, TDX_INIT_MEM_REGION also doesn't allow pre-mapping memory,
which is generally useful, and can be especially beneficial for confidential VMs
(and TDX in particular) due to the added cost of a page fault VM-Exit.

I'm not dead set on this generic ioctl(), but unless it ends up being a train wreck
for userspace, I think it will allow for cleaner and more reusable code in KVM.