Re: [PATCH 2/2] KVM: x86: Don't allow tsc_offset, tsc_scaling_ratio to change
From: Edgecombe, Rick P
Date: Mon Oct 14 2024 - 11:48:57 EST
On Sat, 2024-10-12 at 00:55 -0700, Isaku Yamahata wrote:
> Problem
> The current x86 KVM implementation conflicts with protected TSC because the
> VMM can't change the TSC offset/multiplier. The KVM logic that changes or
> adjusts the TSC offset/multiplier needs to be disabled or ignored.
>
> Because KVM emulates the TSC timer or the TSC deadline timer with the TSC
> offset/multiplier, the TSC timer interrupt is injected into the guest at the
> wrong time if the KVM TSC offset differs from what the TDX module
> determined.
>
> Originally this issue was found with cyclictest from rt-tests [1]: the
> latency in the TDX case was worse than the VMX value + TDX SEAMCALL
> overhead. It turned out that the KVM TSC offset differed from what the TDX
> module determines.
>
> Solution
> The solution is to keep the KVM TSC offset/multiplier the same as the value
> of the TDX module somehow. Possible solutions are as follows.
> - Skip the logic
>   Ignore (or don't call related functions) the request to change the TSC
>   offset/multiplier.
>   Pros
>   - Logically clean. This is similar to the guest_protected case.
>   Cons
>   - Needs to identify the call sites.
>
> - Revert the change at the hooks after TSC adjustment
>   x86 KVM defines vendor hooks that are invoked when the TSC
>   offset/multiplier is changed. The callback can revert the change.
>   Pros
>   - We don't need to care about the logic to change the TSC
>     offset/multiplier.
>   Cons
>   - Hacky to revert the KVM x86 common code logic.
>
> Choose the first one. With this patch series, SEV-SNP secure TSC can be
> supported.
>
> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
>
> Reported-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
IIUC this problem was reported by Marcelo, and he tested these patches and found
that they did *not* resolve his issue? But offline you mentioned that you
reproduced a similar-seeming bug on your end that *was* resolved by these
patches. If I got that right, I think we should figure out Marcelo's problem
before fixing this upstream. If it only affects out-of-tree TDX code, we can
take more time and not thrash the code as it gets untangled further.
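
For reference, a minimal sketch of the skew described in the quoted commit
message. It is not code from the patch. It assumes the usual relation
guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset, with 48 fractional
bits as in VMX TSC scaling, and the helper names (guest_tsc, deadline_skew)
are hypothetical. unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* Assumed fixed-point format of the TSC multiplier (48 fractional bits). */
#define FRAC_BITS 48

/* Guest TSC as seen by the guest for a given host TSC value. */
uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, uint64_t offset)
{
	unsigned __int128 scaled = (unsigned __int128)host_tsc * ratio;

	/* The offset is added modulo 2^64, so "negative" offsets wrap. */
	return (uint64_t)(scaled >> FRAC_BITS) + offset;
}

/*
 * Host-tick error of an emulated deadline timer when the emulator uses
 * kvm_offset but the guest actually runs with hw_offset. For simplicity
 * this assumes a ratio of 1.0. A positive result means the interrupt is
 * injected late.
 */
int64_t deadline_skew(uint64_t guest_deadline, uint64_t kvm_offset,
		      uint64_t hw_offset)
{
	uint64_t kvm_host = guest_deadline - kvm_offset; /* when the emulator fires */
	uint64_t hw_host = guest_deadline - hw_offset;   /* when the guest expects it */

	return (int64_t)(kvm_host - hw_host);
}
```

With a ratio of 1.0, the skew is simply hw_offset - kvm_offset, so any
disagreement between the two offsets shows up directly as timer latency.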