Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT

From: Edgecombe, Rick P
Date: Mon Sep 09 2024 - 16:22:38 EST


On Mon, 2024-09-09 at 17:25 +0200, Paolo Bonzini wrote:
> On 9/4/24 05:07, Rick Edgecombe wrote:
> > +static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in)
> > +{
> > +#define SEAMCALL_RETRY_MAX     16
>
> How is the 16 determined?  Also, is the lock per-VM or global?

The lock being considered here is per-TD, but TDX_OPERAND_BUSY in general can be
for other locks. I'm not sure where the 16 came from, maybe Yuan or Isaku can
share the history. In any case, there seems to be some problems with this patch
or justification.

Regarding the zero-step mitigation, the TDX Module has a mitigation for an
attack where a malicious VMM causes repeated private EPT violations for the same
GPA. When this happens TDH.VP.ENTER will fail to enter the guest. Regardless of
zero-step detection, these SEPT related SEAMCALLs will exit with the checked
error code if they contend the mentioned lock. If there was some other (non-
zero-step related) contention for this lock and KVM tries to re-enter the TD too
many times without resolving an EPT violation, it might inadvertently trigger
the zero-step mitigation. I *think* this patch is trying to say not to worry
about this case, and do a simple retry loop instead to handle the contention.

But why 16 retries would be sufficient, I can't find a reason for. Getting this
required retry logic right is important because some failures
(TDH.MEM.RANGE.BLOCK) can lead to KVM_BUG_ON()s.

Per the docs, in general the VMM is supposed to retry SEAMCALLs that return
TDX_OPERAND_BUSY. I think we need to revisit the general question of which
SEAMCALLs we should be retrying and how many times/how long. The other
consideration is that KVM already has per-VM locking, that would prevent
contention for some of the locks. So depending on internal details KVM may not
need to do any retries in some cases.