Re: [PATCH 3/3] KVM: x86: conditionally update masterclock data in pvclock_update_vm_gtod_copy()

From: Dongli Zhang

Date: Mon May 11 2026 - 20:17:34 EST




On 5/9/26 5:22 AM, David Woodhouse wrote:
> On Thu, 2026-01-15 at 12:22 -0800, Dongli Zhang wrote:
>> The pvclock_update_vm_gtod_copy() function always unconditionally updates
>> ka->master_kernel_ns and ka->master_cycle_now whenever a
>> KVM_REQ_MASTERCLOCK_UPDATE occurs. Unfortunately, each masterclock update
>> increases the risk of kvm-clock drift.
>>
>> If pvclock_update_vm_gtod_copy() is not called from
>> vcpu_enter_guest()-->kvm_update_masterclock(), we keep the existing
>> workflow. The argument 'forced' is introduced to tell where it is from.
>>
>> Otherwise, we avoid updating the masterclock if it is already
>> active and will remain active. In such cases, updating the masterclock
>> data is not beneficial and can instead lead to kvm-clock drift.
>>
>> As a result, this patch minimizes the chance of unnecessary masterclock
>> data updates to avoid kvm-clock drift.
>>
>> Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
>> Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>
> Hmm... so the only caller of pvclock_update_vm_gtod_copy() that doesn't
> set the 'force' argument is the one in kvm_update_masterclock(), so we
> are asserting that kvm_update_masterclock() never needs to *change* the
> masterclock origin point, if it was already set?
>
> The gtod notifier callback in pvclock_gtod_update_fn() also ends up
> setting KVM_REQ_MASTERCLOCK_UPDATE, and is triggered by an actual host
> timekeeping update (which could potentially be from a clocksource
> change). It also hypothetically possible that the clocksource changes
> from TSC → HPET → TSC, switching back to TSC again before the
> masterclock update ever gets to run. Or maybe a suspend/resume?
>
> Are you *sure* that the optimisation is always valid...?

Thank you very much!

I didn't validate the scenario you mentioned. I missed that scenario because I
assumed that most production systems nowadays use STABLE/CONSTANT/NONSTOP TSC as
the host clocksource, although I sometimes forgot to add "clocksource=tsc
tsc=reliable" to my AMD L1 KVM guest (acting as the hypervisor for L2 guest).

I didn't follow up on this patch because I noticed another issue. I found that
the tsc_timestamp in the PVTI can become a very large number if we simply reboot
the guest VM. This happens because the patch stops updating the masterclock data
when the masterclock is already active and remains active.

For example:

current guest TSC: 122763682
PVTI->tsc_timestamp = 18446744073656540006
PVTI->system_time=196515164269

Although I could not reproduce any bug, that still made me feel uncomfortable.

I think the patch you posted for PATCH 2/3 can fix the issue mentioned in this
patch.

https://lore.kernel.org/kvm/f37f0ae0ae1dfce3ad3c6fce653f5df34adecc0a.camel@xxxxxxxxxxxxx

I would also test with your patchset, if the unnecessary masterclock update is
avoided or minimized to one time.

https://lore.kernel.org/kvm/20260509224824.3264567-27-dwmw2@xxxxxxxxxxxxx/

Thank you very much!

Dongli Zhang