Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT

From: Sean Christopherson
Date: Mon Sep 09 2024 - 17:12:06 EST


On Mon, Sep 09, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-09-09 at 17:25 +0200, Paolo Bonzini wrote:
> > On 9/4/24 05:07, Rick Edgecombe wrote:
> > > +static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in)
> > > +{
> > > +#define SEAMCALL_RETRY_MAX     16
> >
> > How is the 16 determined?  Also, is the lock per-VM or global?
>
> The lock being considered here is per-TD, but TDX_OPERAND_BUSY in general can be
> for other locks. I'm not sure where the 16 came from, maybe Yuan or Isaku can
> share the history. In any case, there seems to be some problems with this patch
> or justification.
>
> Regarding the zero-step mitigation, the TDX Module has a mitigation for an
> attack where a malicious VMM causes repeated private EPT violations for the same
> GPA. When this happens TDH.VP.ENTER will fail to enter the guest. Regardless of
> zero-step detection, these SEPT related SEAMCALLs will exit with the checked
> error code if they contend the mentioned lock. If there was some other (non-
> zero-step related) contention for this lock and KVM tries to re-enter the TD too
> many times without resolving an EPT violation, it might inadvertently trigger
> the zero-step mitigation. I *think* this patch is trying to say not to worry
> about this case, and do a simple retry loop instead to handle the contention.
>
> But why 16 retries would be sufficient, I can't find a reason for. Getting this
> required retry logic right is important because some failures
> (TDH.MEM.RANGE.BLOCK) can lead to KVM_BUG_ON()s.

I (somewhat indirectly) raised this as an issue in v11, and at a (very quick)
glance, nothing has changed to alleviate my concerns.

In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL, ever. For
its operations, I'm pretty sure the only sane approach is for KVM to ensure there
will be no contention. And if the TDX module's single-step protection spuriously
kicks in, KVM exits to userspace. If the TDX module can't/doesn't/won't communicate
that it's mitigating single-step, e.g. so that KVM can forward the information
to userspace, then that's a TDX module problem to solve.

> Per the docs, in general the VMM is supposed to retry SEAMCALLs that return
> TDX_OPERAND_BUSY.

IMO, that's terrible advice. SGX has similar behavior, where the xucode "module"
signals #GP if there's a conflict. #GP is obviously far, far worse as it lacks
the precision that would help software understand exactly what went wrong, but I
think one of the better decisions we made with the SGX driver was to have a
"zero tolerance" policy where the driver would _never_ retry due to a potential
resource conflict, i.e. that any conflict in the module would be treated as a
kernel bug.

> I think we need to revisit the general question of which
> SEAMCALLs we should be retrying and how many times/how long. The other
> consideration is that KVM already has per-VM locking, that would prevent
> contention for some of the locks. So depending on internal details KVM may not
> need to do any retries in some cases.

Yes, and if KVM can't avoid conflict/retry, then before we go any further, I want
to know exactly why that is the case.