Re: [PATCH v12 16/19] x86/kvmclock: Use clock source callback to update kvm sched clock

From: Sean Christopherson
Date: Wed Oct 09 2024 - 12:18:28 EST


On Wed, Oct 09, 2024, Nikunj A Dadhania wrote:
> Although the kernel switches over to stable TSC clocksource instead of
> kvmclock, the scheduler still keeps on using kvmclock as the sched clock.
> This is due to kvm_sched_clock_init() updating the pv_sched_clock()
> unconditionally.

All PV clocks are affected by this, no? This seems like something that should
be handled in common code, which is the point I was trying to make in v11.

> Use the clock source enable/disable callbacks to initialize
> kvm_sched_clock_init() and update the pv_sched_clock().
>
> As the clock selection happens in the stop machine context, schedule
> delayed work to update the static_call().
>
> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
> ---
> arch/x86/kernel/kvmclock.c | 34 +++++++++++++++++++++++++++++-----
> 1 file changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 5b2c15214a6b..5cd3717e103b 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -21,6 +21,7 @@
> #include <asm/hypervisor.h>
> #include <asm/x86_init.h>
> #include <asm/kvmclock.h>
> +#include <asm/timer.h>
>
> static int kvmclock __initdata = 1;
> static int kvmclock_vsyscall __initdata = 1;
> @@ -148,12 +149,39 @@ bool kvm_check_and_clear_guest_paused(void)
> return ret;
> }
>
> +static u64 (*old_pv_sched_clock)(void);
> +
> +static void enable_kvm_sc_work(struct work_struct *work)
> +{
> + u8 flags;
> +
> + old_pv_sched_clock = static_call_query(pv_sched_clock);
> + flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
> + kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
> +}
> +
> +static DECLARE_DELAYED_WORK(enable_kvm_sc, enable_kvm_sc_work);
> +
> +static void disable_kvm_sc_work(struct work_struct *work)
> +{
> + if (old_pv_sched_clock)

This feels like it should be a WARN condition as, IIUC, pv_sched_clock() should
never be NULL. And it _looks_ wrong too, as it means kvm_clock will remain the
sched clock if there was no old clock, which should be impossible.
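
As a rough illustration of the WARN-style check, the following is a userspace
model only, not kernel code. Every name in it (cur_sched_clock, old_sched_clock,
model_warn_on(), enable_kvm_sc(), disable_kvm_sc(), native_clock(), kvm_clock())
is a hypothetical stand-in: a plain function pointer replaces the static_call(),
and a counter plus fprintf() replaces WARN_ON_ONCE(). It only shows the control
flow of "complain loudly and bail" versus "silently skip the restore".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t (*sched_clock_fn)(void);

/* Hypothetical clocks; the return values only identify which one is active. */
static uint64_t native_clock(void) { return 1; }
static uint64_t kvm_clock(void)    { return 2; }

/* Stand-in for the pv_sched_clock static_call target (assumption). */
static sched_clock_fn cur_sched_clock = native_clock;
/* Saved target, analogous to old_pv_sched_clock in the patch. */
static sched_clock_fn old_sched_clock;
/* Counts how often the model "WARN" fired. */
static int warn_count;

/* Crude stand-in for a kernel WARN: report the condition and return it. */
static int model_warn_on(int cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARN: %s\n", what);
	}
	return cond;
}

/* Save the current sched clock, then switch to the KVM one. */
static void enable_kvm_sc(void)
{
	old_sched_clock = cur_sched_clock;
	cur_sched_clock = kvm_clock;
}

/* Restore the saved clock; a missing saved clock is treated as a bug. */
static void disable_kvm_sc(void)
{
	if (model_warn_on(!old_sched_clock, "no saved sched clock to restore"))
		return;

	cur_sched_clock = old_sched_clock;
	old_sched_clock = NULL;
}
```

Calling enable_kvm_sc() and then disable_kvm_sc() returns the model to
native_clock() without any warning. A second disable_kvm_sc() with nothing saved
trips the warning instead of quietly leaving the state as is, so the
"impossible" case becomes visible.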

> + paravirt_set_sched_clock(old_pv_sched_clock);