Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion

From: Yan Zhao
Date: Wed May 22 2024 - 02:49:43 EST


On Tue, May 21, 2024 at 07:31:31PM -0700, Sean Christopherson wrote:
> On Wed, May 22, 2024, Yan Zhao wrote:
> > On Fri, May 17, 2024 at 05:30:50PM +0200, Paolo Bonzini wrote:
> > > On 5/16/24 01:20, Sean Christopherson wrote:
> > > > Hmm, a quirk isn't a bad idea. It suffers the same problems as a memslot flag,
> > > > i.e. who knows when it's safe to disable the quirk, but I would hope userspace
> > > > would be much, much more cautious about disabling a quirk that comes with a massive
> > > > disclaimer.
> > > >
> > > > Though I suspect Paolo will shoot this down too 😉
> > >
> > > Not really, it's probably the least bad option. Not as safe as keying it
> > > off the new machine types, but less ugly.
> > A concern about the quirk is that, before identifying the root cause of the
> > issue, we don't know which behavior is the quirk: fast zapping all TDPs, or slow
> > zapping within the memslot range.
>
> The quirk is specifically that KVM zaps SPTEs that aren't related to the memslot
> being deleted/moved. E.g. the issue went away if KVM zapped a rather arbitrary
> set of SPTEs. IIRC, there was a specific gfn range that was "problematic", but
> we never figured out the correlation between the problematic range and the memslot
> being deleted.
>
So, add a quirk like KVM_X86_QUIRK_ZAP_ALL_ON_MEMSLOT_DELETION, and have it
enabled by default?

> Disabling the quirk would allow KVM to choose between a slow/precise/partial zap,
> and full/fast zap.
TDX needs to disable the quirk to get the slow/precise/partial zap, right?
Then, when unsafe and passthrough devices are involved in TDX, we need to either
keep the quirk disabled if no bug is reported, or identify the root cause at that
point. Is that correct?
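
To make the question concrete, here is a minimal, self-contained sketch of the
semantics as I read them. It is only a userspace model, not KVM code: the quirk
name is the one floated above, while the bit value, struct vm_model,
prefer_precise_zap and memslot_delete_policy() are all hypothetical names made
up for illustration. How KVM would actually choose once the quirk is disabled
is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Name taken from this thread; the bit value is made up for illustration. */
#define KVM_X86_QUIRK_ZAP_ALL_ON_MEMSLOT_DELETION (1ULL << 7)

enum zap_policy {
	ZAP_ALL_FAST,		/* zap all SPTEs, related to the slot or not */
	ZAP_SLOT_PRECISE,	/* zap only leaf SPTEs within the memslot range */
};

/* Hypothetical per-VM state, just enough to model the decision. */
struct vm_model {
	uint64_t disabled_quirks;	/* quirks userspace has turned off */
	bool prefer_precise_zap;	/* assumption: e.g. set for a TDX VM */
};

/* A quirk is enabled by default and stays on until userspace disables it. */
static bool quirk_enabled(const struct vm_model *vm, uint64_t quirk)
{
	return !(vm->disabled_quirks & quirk);
}

static enum zap_policy memslot_delete_policy(const struct vm_model *vm)
{
	/* Quirk enabled: keep the existing behavior, zap everything. */
	if (quirk_enabled(vm, KVM_X86_QUIRK_ZAP_ALL_ON_MEMSLOT_DELETION))
		return ZAP_ALL_FAST;

	/* Quirk disabled: KVM is free to pick either strategy. */
	return vm->prefer_precise_zap ? ZAP_SLOT_PRECISE : ZAP_ALL_FAST;
}
```

In this model a VM that never touches the quirk always gets the full/fast zap,
and only a VM that both disables the quirk and wants the precise zap (the TDX
case) gets the slot-only zap.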