Re: [PATCH v19 087/130] KVM: TDX: handle vcpu migration over logical processor

From: Huang, Kai
Date: Wed Apr 17 2024 - 21:10:10 EST




On 17/04/2024 4:44 am, Isaku Yamahata wrote:
On Tue, Apr 16, 2024 at 12:05:31PM +1200,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:



On 16/04/2024 10:48 am, Yamahata, Isaku wrote:
On Mon, Apr 15, 2024 at 06:49:35AM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

On Fri, Apr 12, 2024, Isaku Yamahata wrote:
On Fri, Apr 12, 2024 at 03:46:05PM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

On Fri, Apr 12, 2024, Isaku Yamahata wrote:
On Fri, Apr 12, 2024 at 09:15:29AM -0700, Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
+void tdx_mmu_release_hkid(struct kvm *kvm)
+{
+ while (__tdx_mmu_release_hkid(kvm) == -EBUSY)
+ ;
}

As I understand, __tdx_mmu_release_hkid() returns -EBUSY
after TDH.VP.FLUSH has been sent for every vCPU followed by
TDH.MNG.VPFLUSHDONE, which returns TDX_FLUSHVP_NOT_DONE.

Considering earlier comment that a retry of TDH.VP.FLUSH is not
needed, why is this while() loop here that sends the
TDH.VP.FLUSH again to all vCPUs instead of just a loop within
__tdx_mmu_release_hkid() to _just_ resend TDH.MNG.VPFLUSHDONE?

Could it be possible for a vCPU to appear during this time, thus
be missed in one TDH.VP.FLUSH cycle, to require a new cycle of
TDH.VP.FLUSH?

Yes. There is a race between closing KVM vCPU fd and MMU notifier release hook.
When KVM vCPU fd is closed, vCPU context can be loaded again.

But why is _loading_ a vCPU context problematic?

It's nothing problematic. It becomes a bit harder to understand why
tdx_mmu_release_hkid() issues IPI on each loop. I think it's reasonable
to make the normal path easy and to complicate/penalize the destruction path.
Probably I should've added comment on the function.

By "problematic", I meant, why can that result in a "missed in one TDH.VP.FLUSH
cycle"? AFAICT, loading a vCPU shouldn't cause that vCPU to be associated from
the TDX module's perspective, and thus shouldn't trigger TDX_FLUSHVP_NOT_DONE.

I.e. looping should be unnecessary, no?

The loop is unnecessary with the current code.

The possible future optimization is to reduce destruction time of Secure-EPT
somehow. One possible option is to release HKID while vCPUs are still alive and
destruct Secure-EPT with multiple vCPU context. Because that's future
optimization, we can ignore it at this phase.

I kinda lost here.

I thought in the current v19 code, you have already implemented this
optimization?

Or is this optimization totally different from what we discussed in an
earlier patch?

https://lore.kernel.org/lkml/8feaba8f8ef249950b629f3a8300ddfb4fbcf11c.camel@xxxxxxxxx/

That's only the first step. We can optimize it further with multiple vCPUs
context.

OK. Let's put aside how important these optimizations are and whether they should be done in the initial TDX support, I think the right way to organize the patches is to bring functionality first and then put performance optimization later.

That can make both writing code and code review easier.

And more importantly, the "performance optimization" can be discussed _separately_.

For example, as mentioned in the link above, I think the optimization of "releasing HKID in the MMU notifier release to improve TD teardown latency" complicates things a lot, e.g., not only to the TD creation/teardown sequence, but also here -- w/o it we don't even need to consider the race between vCPU load and MMU notifier release:

https://lore.kernel.org/kvm/20240412214201.GO3039520@xxxxxxxxxxxxxxxxxxxxx/

So I think we should start with implementing these sequences in "normal way" first, and then do the optimization(s) later.

And to me the "normal way" for TD creation/destruction we can just do:

1) Use normal SEPT sequence to teardown private EPT page table;
2) Do VP.FLUSH when vCPU is destroyed
3) Do VPFLUSHDONE after all vCPUs are destroyed
4) release HKID at last stage of destroying VM.

For vCPU migration, you do VP.FLUSH on the old pCPU before you load the vCPU to the new pCPU, as shown in this patch.

Then you don't need to cover the silly code change around tdx_mmu_release_hkid() in this patch.

Am I missing anything?