Re: [PATCH v19 038/130] KVM: TDX: create/destroy VM structure

From: Isaku Yamahata
Date: Tue Apr 02 2024 - 02:16:50 EST


On Fri, Mar 29, 2024 at 03:25:47PM +0800,
Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> wrote:

>
>
> On 3/29/2024 4:39 AM, Isaku Yamahata wrote:
>
> [...]
> > > > > > How about this?
> > > > > >
> > > > > > /*
> > > > > > * We need three SEAMCALLs, TDH.MNG.VPFLUSHDONE(), TDH.PHYMEM.CACHE.WB(), and
> > > > > > * TDH.MNG.KEY.FREEID() to free the HKID.
> > > > > > * Other threads can remove pages from TD. When the HKID is assigned, we need
> > > > > > * to use TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE().
> > > > > > * TDH.PHYMEM.PAGE.RECLAIM() is needed when the HKID is free. Take the lock
> > > > > > * so that other threads never observe a transient HKID state.
> > > > > > */
> > > > > > Could you elaborate on why another thread can still be removing pages
> > > > > > from the TD?
> > > > >
> > > > > I am probably missing something, but the thing I don't understand is why
> > > > > this function is triggered by MMU release? All the things done in this
> > > > > function don't seem to be related to MMU at all.
> > > > > KVM releases EPT pages on MMU notifier release, via kvm_mmu_zap_all(). If
> > > > > we follow that way, kvm_mmu_zap_all() zaps all the Secure-EPTs with
> > > > > TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE(). Because
> > > > > TDH.MEM.{SEPT, PAGE}.REMOVE() is slow, we can free the HKID before
> > > > > kvm_mmu_zap_all() so that TDH.PHYMEM.PAGE.RECLAIM() can be used instead.
> > > Can you elaborate why TDH.MEM.{SEPT,PAGE}.REMOVE is slower than
> > > TDH.PHYMEM.PAGE.RECLAIM()?
> > >
> > > > And does the difference matter in practice, i.e. did you see a noticeable
> > > > performance degradation when using the former?
> > > Yes. With the HKID alive, we have to assume that vcpus can still run, which
> > > means TLB shootdown. The difference is 2 extra SEAMCALLs + IPI synchronization
> > > for each guest private page. If the guest has hundreds of GB, the difference
> > > can be tens of minutes.
> >
> > With HKID alive, we need to assume vcpu is alive.
> > - TDH.MEM.PAGE.REMOVE()
> > > - TDH.PHYMEM.PAGE.WBINVD()
> > - TLB shoot down
> > - TDH.MEM.TRACK()
> > - IPI to other vcpus
> > - wait for other vcpu to exit
>
> Do we have a way to batch the TLB shoot down?
> IIUC, in the current implementation, TLB shoot down needs to be done for each
> page removal, right?

That's right, because the TDP MMU allows multiple vcpus to operate on the EPT
concurrently. Batching would make the logic more complex. Using the mmu
notifier is a straightforward way to know that we have started to destroy the
guest.
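
To give a feel for the scale mentioned earlier in the thread (2 extra SEAMCALLs
plus IPI synchronization per guest private page), here is a back-of-envelope
sketch. The helper names, the 4 KiB page size, and the per-page cost passed in
are assumptions for illustration only; they are not measurements and not KVM
code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: guest private memory is mapped with 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Number of private pages backing a guest of the given size. */
uint64_t guest_private_pages(uint64_t guest_bytes)
{
	return guest_bytes / SKETCH_PAGE_SIZE;
}

/* The thread states 2 extra SEAMCALLs per page while the HKID is alive. */
uint64_t extra_seamcalls(uint64_t guest_bytes)
{
	return 2 * guest_private_pages(guest_bytes);
}

/*
 * Extra teardown time in whole seconds if each page pays per_page_ns of
 * overhead. per_page_ns is a hypothetical input, not a measured value.
 */
uint64_t extra_teardown_seconds(uint64_t guest_bytes, uint64_t per_page_ns)
{
	uint64_t pages = guest_private_pages(guest_bytes);

	/* Refuse inputs whose product would overflow 64 bits. */
	assert(per_page_ns == 0 || pages <= UINT64_MAX / per_page_ns);

	return pages * per_page_ns / 1000000000ULL;
}
```

For example, extra_teardown_seconds(512ULL << 30, 10000) returns 1342, i.e.
about 22 minutes for a 512 GiB guest if the per-page overhead were 10
microseconds (a made-up figure), which is in the "tens of minutes" range.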
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>