Re: [PATCH 09/12] KVM: guest_memfd: move check for already-populated page to common code

From: Paolo Bonzini
Date: Mon Jul 15 2024 - 12:11:25 EST


On Sun, Jul 14, 2024 at 7:33 AM Michael Roth <michael.roth@xxxxxxx> wrote:
> > I guess this series is trying to help userspace not mess up the order of things
> > for SEV, where as TDX's design was to let userspace hold the pieces from the
> > beginning. As in, needing to match up the KVM_PRE_FAULT_MEMORY and
> > KVM_TDX_INIT_MEM_REGION calls, mysteriously return errors in later IOCTLs if
> > something was missed, etc.
>
> If SNP were to try to call KVM_PRE_FAULT_MEMORY before SNP_LAUNCH_UPDATE
> (rough equivalent to KVM_TDX_INIT_MEM_REGION), I think the same issue
> would arise, and in that case the uptodate flag you prototyped would
> wouldn't be enough to address it because SNP_LAUNCH_UPDATE would end up
> failing because the gmem_prepare hook previously triggered by
> KVM_PRE_FAULT_MEMORY would have put the corresponding RMP entries into
> an unexpected state (guest-owned/private).

Indeed, and I'd love for that to be the case for both TDX and SNP.

> So for SNP, KVM_PRE_FAULT_MEMORY/SNP_LAUNCH_UPDATE are mutually
> exclusive on what GPA ranges they can prep before finalizing launch state.

Not a problem; is KVM_PRE_FAULT_MEMORY before finalization the same as
zeroing memory?

> I realize that is awkward for TDX, where the KVM_PRE_FAULT_MEMORY is
> required to create the sEPT mapping before encrypting, but maybe it
> would be possible for TDX to just do that implicitly within
> KVM_TDX_INIT_MEM_REGION?

Yes, and it's what the TDX API used to be like a while ago.
Locking-wise, Rick confirmed offlist that there's no problem in
calling kvm_arch_vcpu_pre_fault_memory() from tdx_gmem_post_populate()
(my fault that it went offlist - email from the phone is hard...).

To be clear, I have no problem at all reusing the prefaulting code,
that's better than TDX having to do its own thing. But forcing
userspace to do two passes is not great (it's already not great that
it has to be TDX_INIT_MEM_REGION has to be a VCPU operation, but
that's unfortunately unavoidable ).

Paolo