Re: [PATCH v19 087/130] KVM: TDX: handle vcpu migration over logical processor

From: Sean Christopherson
Date: Fri Apr 12 2024 - 18:46:17 EST


On Fri, Apr 12, 2024, Isaku Yamahata wrote:
> On Fri, Apr 12, 2024 at 09:15:29AM -0700, Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
> > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > +{
> > > + while (__tdx_mmu_release_hkid(kvm) == -EBUSY)
> > > + ;
> > > }
> >
> > As I understand, __tdx_mmu_release_hkid() returns -EBUSY
> > after TDH.VP.FLUSH has been sent for every vCPU followed by
> > TDH.MNG.VPFLUSHDONE, which returns TDX_FLUSHVP_NOT_DONE.
> >
> > Considering earlier comment that a retry of TDH.VP.FLUSH is not
> > needed, why is this while() loop here that sends the
> > TDH.VP.FLUSH again to all vCPUs instead of just a loop within
> > __tdx_mmu_release_hkid() to _just_ resend TDH.MNG.VPFLUSHDONE?
> >
> > Could it be possible for a vCPU to appear during this time, thus
> > be missed in one TDH.VP.FLUSH cycle, to require a new cycle of
> > TDH.VP.FLUSH?
>
> Yes. There is a race between closing KVM vCPU fd and MMU notifier release hook.
> When KVM vCPU fd is closed, vCPU context can be loaded again.

But why is _loading_ a vCPU context problematic? If I'm reading the TDX module
code correctly, TDX_FLUSHVP_NOT_DONE is returned when a vCPU is "associated" with
a pCPU, and association only happens during TDH.VP_ENTER, TDH.MNG.RD, and TDH.MNG.WR,
none of which I see in tdx_vcpu_load().

Assuming there is something problematic lurking under vcpu_load(), I would love,
love, LOVE an excuse to not do vcpu_{load,put}() in kvm_unload_vcpu_mmu(), i.e.
get rid of that thing entirely.

I have definitely looked into kvm_unload_vcpu_mmu() on more than one occassion,
but I can't remember off the top of my head why I have never yanked out the
vcpu_{load,put}(). Maybe I was just scared of breaking something and didn't have
a good reason to risk breakage?

> The MMU notifier release hook eventually calls tdx_mmu_release_hkid(). Other
> kernel thread (concretely, vhost krenel thread) can get reference count to
> mmu and put it by timer, the MMU notifier release hook can be triggered
> during closing vCPU fd.
>
> The possible alternative is to make the vCPU closing path complicated not to
> load vCPU context instead f sending IPI on every retry.