Re: [PATCH 2/2] KVM: x86: Don't allow tsc_offset, tsc_scaling_ratio to change

From: Edgecombe, Rick P
Date: Mon Oct 14 2024 - 11:48:57 EST


On Sat, 2024-10-12 at 00:55 -0700, Isaku Yamahata wrote:
> Problem
> The current x86 KVM implementation conflicts with protected TSC because the
> VMM can't change the TSC offset/multiplier.  KVM's logic that changes or
> adjusts the TSC offset/multiplier needs to be disabled or ignored somehow.
>
> Because KVM emulates the TSC timer or the TSC deadline timer using the TSC
> offset/multiplier, timer interrupts are injected into the guest at the
> wrong time if the KVM TSC offset differs from the one the TDX module
> determined.
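To make the timing error concrete, here is a minimal C sketch. It assumes a
scaling ratio with 48 fractional bits (the VMX convention) and a ratio of 1.0
for the deadline math; the helper names are hypothetical, and the 128-bit
multiply stands in for the kernel's mul_u64_u64_shr(). Real KVM arms an
hrtimer or the VMX preemption timer rather than computing a host TSC directly.
The point is that if KVM arms the timer with its own offset while the guest
TSC actually runs with the offset the TDX module chose, the interrupt is off
by exactly the difference between the two offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: VMX-style fixed-point ratio, so 1.0 == 1ULL << 48. */
#define TSC_RATIO_FRAC_BITS 48
#define TSC_RATIO_ONE (1ULL << TSC_RATIO_FRAC_BITS)

static_assert(TSC_RATIO_FRAC_BITS < 64, "ratio must fit in a u64");

/* guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, uint64_t offset)
{
	unsigned __int128 prod = (unsigned __int128)host_tsc * ratio;

	return (uint64_t)(prod >> TSC_RATIO_FRAC_BITS) + offset;
}

/* Hypothetical: host TSC at which the guest deadline expires (ratio 1.0). */
static uint64_t deadline_to_host_tsc(uint64_t guest_deadline, uint64_t offset)
{
	return guest_deadline - offset;
}

/*
 * Hypothetical: signed error, in host TSC cycles, between when KVM arms
 * the timer (using its own offset) and when the guest TSC really reaches
 * the deadline (using the offset the TDX module programmed).
 * Positive means the interrupt is injected late.
 */
static int64_t timer_error(uint64_t guest_deadline, uint64_t kvm_offset,
			   uint64_t tdx_offset)
{
	uint64_t kvm_fire = deadline_to_host_tsc(guest_deadline, kvm_offset);
	uint64_t real_fire = deadline_to_host_tsc(guest_deadline, tdx_offset);

	return (int64_t)(kvm_fire - real_fire);
}
```

With kvm_offset = 500 and tdx_offset = 700, timer_error() returns 200: the
interrupt arrives 200 cycles late, which would show up as extra latency in a
test like cyclictest.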
>
> This issue was originally found with cyclictest from rt-tests [1]: the
> latency in the TDX case was worse than the VMX latency plus the TDX
> SEAMCALL overhead.  It turned out that the KVM TSC offset differed from
> the one the TDX module determined.
>
> Solution
> The solution is to keep the KVM TSC offset/multiplier the same as the value
> of the TDX module somehow.  Possible solutions are as follows.
> - Skip the logic
>   Ignore the request to change the TSC offset/multiplier (or don't call
>   the related functions).
>   Pros
>   - Logically clean.  This is similar to the guest_protected case.
>   Cons
>   - Needs to identify the call sites.
>
> - Revert the change at the hooks after TSC adjustment
>   x86 KVM defines the vendor hooks when TSC offset/multiplier are
>   changed.  The callback can revert the change.
>   Pros
>   - We don't need to care about the logic to change the TSC
>     offset/multiplier.
>   Cons
>   - Hacky to revert the KVM x86 common code logic.
>
> Choose the first one.  With this patch series, SEV-SNP secure TSC can be
> supported.
>
> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
>
> Reported-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

IIUC, this problem was reported by Marcelo, and when he tested these patches he
found that they did *not* resolve his issue? But offline you mentioned that you
reproduced a similar-seeming bug on your end that *was* resolved by these
patches. If I got that right, I think we should figure out Marcelo's problem
before fixing this upstream. If it only affects out-of-tree TDX code, we can
take more time and not thrash the code as it gets untangled further.