Re: [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available

From: Sean Christopherson
Date: Fri Sep 20 2024 - 03:21:57 EST


On Fri, Sep 20, 2024, Nikunj A. Dadhania wrote:
> On 9/18/2024 5:37 PM, Sean Christopherson wrote:
> > On Mon, Sep 16, 2024, Nikunj A. Dadhania wrote:
> >> On 9/13/2024 11:00 PM, Sean Christopherson wrote:
> >>>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
> >>>> Tested-by: Peter Gonda <pgonda@xxxxxxxxxx>
> >>>> ---
> >>>> arch/x86/kernel/kvmclock.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> >>>> index 5b2c15214a6b..3d03b4c937b9 100644
> >>>> --- a/arch/x86/kernel/kvmclock.c
> >>>> +++ b/arch/x86/kernel/kvmclock.c
> >>>> @@ -289,7 +289,7 @@ void __init kvmclock_init(void)
> >>>>  {
> >>>>  	u8 flags;
> >>>>
> >>>> -	if (!kvm_para_available() || !kvmclock)
> >>>> +	if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SECURE_TSC))
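The gating this hunk adds to kvmclock_init() can be modeled in userspace as below. This is a sketch only: the three booleans stand in for kvm_para_available(), the "kvmclock" variable (cleared by the no-kvmclock boot parameter) and cc_platform_has(CC_ATTR_GUEST_SECURE_TSC), and the struct and function names are hypothetical, not kernel APIs. Note that the check keys on the Secure TSC attribute rather than on TSC stability, which is the point Sean pushes back on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the inputs kvmclock_init() consults. */
struct kvmclock_inputs {
	bool para_available;	/* models kvm_para_available() */
	bool kvmclock_param;	/* models the "kvmclock" variable */
	bool secure_tsc;	/* models cc_platform_has(CC_ATTR_GUEST_SECURE_TSC) */
};

/*
 * Returns true if kvmclock_init(), with this patch applied, would go on to
 * set up kvmclock; false if it would take the early return.
 */
static bool kvmclock_would_init(const struct kvmclock_inputs *in)
{
	if (!in->para_available || !in->kvmclock_param || in->secure_tsc)
		return false;
	return true;
}
```

For example, a guest with { true, true, true } (Secure TSC present) skips kvmclock entirely, while { true, true, false } still initializes it.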
> >>>
> >>> I would much prefer we solve the kvmclock vs. TSC fight in a generic way. Unless
> >>> I've missed something, the fact that the TSC is more trusted in the SNP/TDX world
> >>> is simply what's forcing the issue, but it's not actually the reason why Linux
> >>> should prefer the TSC over kvmclock. The underlying reason is that platforms that
> >>> support SNP/TDX are guaranteed to have a stable, always running TSC, i.e. that the
> >>> TSC is a superior timesource purely from a functionality perspective. That it's
> >>> more secure is icing on the cake.
> >>
> >> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock
> >> should be disabled assuming that timesource is stable and always running?
> >
> > No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC
> > is stable, irrespective of SNP or TDX. This is effectively already done for the
> > timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> > invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the
> > kvm_sched_clock_init() code.
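The "prefer the TSC when it is stable" rule for the timekeeping base can be sketched as follows. This is a userspace model under stated assumptions: the feature bits are plain booleans named after X86_FEATURE_CONSTANT_TSC and X86_FEATURE_NONSTOP_TSC, the "marked unstable" flag stands in for the kernel's runtime TSC-unstable state, and the 400/299 ratings are assumed from the kvmclock source rather than quoted from this thread. It only affects the clocksource rating; nothing in it touches the scheduler clock that kvm_sched_clock_init() installs.

```c
#include <stdbool.h>

/* Hypothetical model of the TSC capabilities a guest can observe. */
struct tsc_caps {
	bool constant_tsc;	/* TSC rate does not change with P/T-states */
	bool nonstop_tsc;	/* TSC keeps running in deep C-states */
	bool marked_unstable;	/* TSC was flagged unstable at runtime */
};

/* Invariant and not marked unstable: what "stable TSC" means here. */
static bool tsc_is_stable(const struct tsc_caps *c)
{
	return c->constant_tsc && c->nonstop_tsc && !c->marked_unstable;
}

/*
 * Assumed kvm-clock clocksource rating: 400 by default, lowered to 299
 * when the TSC is stable so that the TSC clocksource (rating 300) wins.
 */
static int kvmclock_rating(const struct tsc_caps *c)
{
	return tsc_is_stable(c) ? 299 : 400;
}
```

Note that the check is independent of SNP or TDX: any guest with an invariant, stable TSC gets the lower kvm-clock rating.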
>
> kvm-clock and tsc-early both have a rating of 299. Because the ratings are
> equal, kvm-clock is picked first.
>
> Is it fine to drop the clock rating of kvmclock to 298? That way, tsc-early
> would be picked instead.
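Why equal ratings favor kvm-clock, and why 298 flips the choice, can be sketched with a simplified model of clocksource selection. Assumptions: sources are kept in a list where each newly registered source goes after every existing source with a rating greater than or equal to its own, the head of that list is selected, and kvm-clock registers before tsc-early (as the observation that kvm-clock is picked first implies). The struct and function names are hypothetical; this is not the kernel's clocksource code.

```c
#include <stddef.h>

/* Hypothetical record of one registered clocksource. */
struct cs_model {
	const char *name;
	int rating;
};

/*
 * Returns the index (in registration order) of the source that would be
 * selected, or -1 if none are registered. Using a strict ">" means an equal
 * rating never displaces a source that registered earlier.
 */
static int select_clocksource(const struct cs_model *regs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best < 0 || regs[i].rating > regs[best].rating)
			best = (int)i;
	}
	return best;
}
```

With { "kvm-clock", 299 } registered before { "tsc-early", 299 }, index 0 (kvm-clock) wins; lowering kvm-clock to 298 makes index 1 (tsc-early) win.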

IMO, it's ugly, but that's as much a problem with the rating system as anything.

But even then, the kernel will still use kvmclock for the scheduler clock (via
kvm_sched_clock_init()), which is undesirable.
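The gap described above can be sketched as two independent decisions: the clocksource rating and the scheduler clock. In this hypothetical model, "current" mirrors kvmclock taking over the scheduler clock whenever it initializes, regardless of its clocksource rating, and "preferred" is one possible policy that uses the raw TSC whenever it is stable. All names are illustrative, not kernel APIs, and the policy is an assumption rather than a proposed patch.

```c
#include <stdbool.h>

enum sched_clock_src { SCHED_CLOCK_TSC, SCHED_CLOCK_KVMCLOCK };

/* Rating changes have no effect here: only whether kvmclock initialized. */
static enum sched_clock_src current_sched_clock(bool kvmclock_inited)
{
	return kvmclock_inited ? SCHED_CLOCK_KVMCLOCK : SCHED_CLOCK_TSC;
}

/* Hypothetical policy: prefer the raw TSC when stable, irrespective of SNP/TDX. */
static enum sched_clock_src preferred_sched_clock(bool kvmclock_inited,
						  bool tsc_stable)
{
	if (tsc_stable || !kvmclock_inited)
		return SCHED_CLOCK_TSC;
	return SCHED_CLOCK_KVMCLOCK;
}
```

The design point is that the stability check lives in the scheduler-clock decision itself, so lowering kvmclock's clocksource rating to 298 would not be needed to get the TSC used for sched_clock.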