Re: [PATCH v2 5/8] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>

From: Binbin Wu
Date: Tue Feb 18 2025 - 20:00:48 EST




On 2/19/2025 8:29 AM, Sean Christopherson wrote:
On Mon, Feb 17, 2025, Binbin Wu wrote:
On 2/13/2025 11:17 PM, Sean Christopherson wrote:
On Thu, Feb 13, 2025, Binbin Wu wrote:
On 2/13/2025 11:23 AM, Binbin Wu wrote:
On 2/13/2025 2:56 AM, Sean Christopherson wrote:
On Wed, Feb 12, 2025, Binbin Wu wrote:
On 2/12/2025 8:46 AM, Sean Christopherson wrote:
I am completely comfortable saying that KVM doesn't care about STI/SS shadows
outside of the HALTED case, and so unless I'm missing something, I think it makes
sense for tdx_protected_apic_has_interrupt() to not check RVI outside of the HALTED
case, because it's impossible to know if the interrupt is actually unmasked, and
statistically it's far, far more likely that it _is_ masked.
OK. Will update tdx_protected_apic_has_interrupt() in "TDX interrupts" part.
And use kvm_vcpu_has_events() to replace the open code in this patch.
Something to keep an eye on: kvm_vcpu_has_events() returns true if pv_unhalted
is set, and pv_unhalted is only cleared on transitions KVM_MP_STATE_RUNNABLE.
If the guest initiates a spurious wakeup, pv_unhalted could be left set in
perpetuity.
Oh, yes.
KVM_HC_KICK_CPU is allowed in TDX guests.
And a clever guest can send a REMRD IPI.

The change below looks good to me.

One minor issue is when guest initiates a spurious wakeup, pv_unhalted is
left set, then later when the guest want to halt the vcpu, in
__kvm_emulate_halt(), since pv_unhalted is still set and the state will not
transit to KVM_MP_STATE_HALTED.
But I guess it's guests' responsibility to not initiate spurious wakeup,
guests need to bear the fact that HLT could fail due to a previous
spurious wakeup?
Just found a patch set for fixing the issue.
FWIW, Jim's series doesn't address spurious wakeups per se, it just ensures
pv_unhalted is cleared when transitioning to RUNNING. If the vCPU is already
RUNNING, __apic_accept_irq() will set pv_unhalted and nothing will clear it
until the next transition to RUNNING (which implies at least an attempted
transition away from RUNNING).

Indeed.

I am wondering why KVM doesn't clear pv_unhalted before the vcpu entering guest?
Is the additional memory access a concern or is there some other reason?
Not clearing pv_unhalted when entering the guest is necessary to avoid races.
Stating the obvious, the guest must set up all of its lock tracking before executing
HLT, which means that the soon-to-be-blocking vCPU is eligible for wakeup *before*
it executes HLT. If an asynchronous exit happens on the vCPU at just the right
time, KVM could clear pv_unhalted before the vCPU executes HLT.

Got it.
Thanks for the explanation.