Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT

From: Edgecombe, Rick P
Date: Mon Sep 09 2024 - 20:50:19 EST


On Mon, 2024-09-09 at 16:58 -0700, Sean Christopherson wrote:
> On Mon, Sep 09, 2024, Rick P Edgecombe wrote:
> > On Mon, 2024-09-09 at 14:23 -0700, Sean Christopherson wrote:
> > > > In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL,
> > > > ever.  For its operations, I'm pretty sure the only sane approach is for
> > > > KVM to ensure there will be no contention.  And if the TDX module's
> > > > single-step protection spuriously kicks in, KVM exits to userspace.  If
> > > > the TDX module can't/doesn't/won't communicate that it's mitigating
> > > > single-step, e.g. so that KVM can forward the information to userspace,
> > > > then that's a TDX module problem to solve.
> > > >
> > > > > Per the docs, in general the VMM is supposed to retry SEAMCALLs that
> > > > > return TDX_OPERAND_BUSY.
> > > >
> > > > IMO, that's terrible advice.  SGX has similar behavior, where the xucode
> > > > "module" signals #GP if there's a conflict.  #GP is obviously far, far
> > > > worse as it lacks the precision that would help software understand
> > > > exactly what went wrong, but I think one of the better decisions we made
> > > > with the SGX driver was to have a "zero tolerance" policy where the
> > > > driver would _never_ retry due to a potential resource conflict, i.e.
> > > > that any conflict in the module would be treated as a kernel bug.
> >
> > Thanks for the analysis. The direction seems reasonable to me for this lock
> > in
> > particular. We need to do some analysis on how much the existing mmu_lock
> > can
> > protects us.
>
> I would operate under the assumption that it provides SEPT no meaningful
> protection.
> I think I would even go so far as to say that it is a _requirement_ that
> mmu_lock
> does NOT provide the ordering required by SEPT, because I do not want to take
> on
> any risk (due to SEPT constraints) that would limit KVM's ability to do things
> while holding mmu_lock for read.

Ok. Not sure, but I think you are saying not to add any extra acquisitions of
mmu_lock.

> >
> > Maybe sprinkle some asserts for documentation purposes.
>
> Not sure I understand, assert on what?

Please ignore. For the asserts, I was imagining mmu_lock acquisitions in core
MMU code might already protect the non-zero-step TDX_OPERAND_BUSY cases, and we
could somehow explain this in code. But it seems less likely.

[snip]
>
> I don't know.  It would depend on what operations can hit BUSY, and what the
> alternatives are.  E.g. if we can narrow down the retry paths to a few select
> cases where it's (a) expected, (b) unavoidable, and (c) has minimal risk of
> deadlock, then maybe that's the least awful option.
>
> What I don't think KVM should do is blindly retry N number of times, because
> then there are effectively no rules whatsoever.

Complete agreement.

>   E.g. if KVM is tearing down a
> VM then KVM should assert on immediate success.  And if KVM is acquit ions a
> fault
> on behalf of a vCPU, then KVM can and should resume the guest and let it
> retry.
> Ugh, but that would likely trigger the annoying "zero-step mitigation" crap.
>
> What does this actually mean in practice?  What's the threshold, is the VM-
> Enter
> error uniquely identifiable, and can KVM rely on HOST_PRIORITY to be set if
> KVM
> runs afoul of the zero-step mitigation?
>
>   After a pre-determined number of such EPT violations occur on the same
> instruction,
>   the TDX module starts tracking the GPAs that caused Secure EPT faults and
> fails
>   further host VMM attempts to enter the TD VCPU unless previously faulting
> private
>   GPAs are properly mapped in the Secure EPT.
>
> If HOST_PRIORITY is set, then one idea would be to resume the guest if there's
> SEPT contention on a fault, and then _if_ the zero-step mitigation is
> triggered,
> kick all vCPUs (via IPI) to ensure that the contended SEPT entry is unlocked
> and
> can't be re-locked by the guest.  That would allow KVM to guarantee forward
> progress without an arbitrary retry loop in the TDP MMU.
>
> Similarly, if KVM needs to zap a SPTE and hits BUSY, kick all vCPUs to ensure
> the
> one and only retry is guaranteed to succeed.

Ok so not against retry loops, just against magic number retry loops with no
explanation that can be found. Makes sense.

Until we answer some of the questions (i.e. HOST_PRIORITY exposure), it's hard
to say. We need to check some stuff on our end.