Re: [v2 PATCH 4/4] x86/kvm: Add guest side support for virtual suspend time injection

From: Thomas Gleixner
Date: Tue Aug 10 2021 - 11:48:13 EST


On Fri, Aug 06 2021 at 19:07, Hikaru Nishida wrote:
> arch/x86/Kconfig | 13 ++++++++++
> arch/x86/include/asm/idtentry.h | 4 +++
> arch/x86/include/asm/kvm_para.h | 9 +++++++
> arch/x86/kernel/kvmclock.c | 40 +++++++++++++++++++++++++++++
> include/linux/timekeeper_internal.h | 4 +++
> kernel/time/timekeeping.c | 33 ++++++++++++++++++++++++

Again, this wants to be split into infrastructure and usage.

> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -124,6 +124,10 @@ struct timekeeper {
> u32 ntp_err_mult;
> /* Flag used to avoid updating NTP twice with same second */
> u32 skip_second_overflow;
> +#ifdef CONFIG_KVM_VIRT_SUSPEND_TIMING_GUEST
> + /* suspend_time_injected keeps the duration injected through kvm */
> + u64 suspend_time_injected;

This is KVM only, so please can we have a name for that struct member
which reflects this?

> +#endif
> #ifdef CONFIG_DEBUG_TIMEKEEPING
> long last_warning;
> /*

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 3ac3fb479981..424c61d38646 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -2125,6 +2125,39 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
> return offset;
> }
>
> +#ifdef CONFIG_KVM_VIRT_SUSPEND_TIMING_GUEST
> +/*
> + * timekeeping_inject_virtual_suspend_time - Inject virtual suspend time
> + * when requested by the kvm host.

If this is an attempt to provide a kernel-doc comment for this function,
then it's clearly a failed attempt and aside of that malformatted.

> + * This function should be called under irq context.

Why? There is no reason for being called from interrupt context and
nothing inforces it.

> + */
> +void timekeeping_inject_virtual_suspend_time(void)
> +{
> + /*
> + * Only updates shadow_timekeeper so the change will be reflected
> + * on the next call of timekeeping_advance().

No. That's broken.

timekeeping_inject_virtual_suspend_time();

do_settimeofday() or do_adjtimex()

timekeeping_update(tk, TK_MIRROR...);

and your change to the shadow timekeeper is gone.

Of course there is also no justification for this approach. What's wrong
with updating it right away?

> + */
> + struct timekeeper *tk = &shadow_timekeeper;
> + unsigned long flags;
> + struct timespec64 delta;
> + u64 suspend_time;

Please sort variables in reverse fir tree order and not randomly as you
see fit.

> +
> + raw_spin_lock_irqsave(&timekeeper_lock, flags);
> + suspend_time = kvm_get_suspend_time();
> + if (suspend_time > tk->suspend_time_injected) {
> + /*
> + * Do injection only if the time is not injected yet.
> + * suspend_time and tk->suspend_time_injected values are
> + * cummrative, so take a diff and inject the duration.

cummrative?

> + */
> + delta = ns_to_timespec64(suspend_time - tk->suspend_time_injected);
> + __timekeeping_inject_sleeptime(tk, &delta);
> + tk->suspend_time_injected = suspend_time;

It's absolutely unclear how this storage and diff magic works and the
comment is not helping someone not familiar with the implementation of
kvm_get_suspend_time() and the related code at all. Please explain
non-obvious logic properly.

Thanks,

tglx