Re: [PATCH 2/3] KVM: x86: conditionally clear KVM_REQ_MASTERCLOCK_UPDATE at the end of KVM_SET_CLOCK

From: Dongli Zhang

Date: Mon May 11 2026 - 20:22:01 EST




On 5/9/26 1:04 PM, David Woodhouse wrote:
> On Thu, 2026-01-15 at 12:22 -0800, Dongli Zhang wrote:
>> The KVM_SET_CLOCK command calls pvclock_update_vm_gtod_copy() to update the
>> masterclock data.
>>
>> Many vCPUs may already have KVM_REQ_MASTERCLOCK_UPDATE pending before
>> KVM_SET_CLOCK is invoked. If pvclock_update_vm_gtod_copy() decides to use
>> the masterclock, there is no need to update the masterclock multiple times
>> afterward. As noted in commit c52ffadc65e2 ("KVM: x86: Don't unnecessarily
>> force masterclock update on vCPU hotplug"), each unnecessary
>> KVM_REQ_MASTERCLOCK_UPDATE can cause the kvm-clock time to jump.
>>
>> Therefore, clear KVM_REQ_MASTERCLOCK_UPDATE for each vCPU at the end of
>> KVM_SET_CLOCK when the master clock is active. The 'tsc_write_lock' ensures
>> that only requests issued before KVM_SET_CLOCK are cleared.
>
> Hm, I think we should do this in kvm_make_mclock_inprogress_request()
> instead.
>
> Currently, things like pvclock_gtod_update_fn() and one or two others
> set KVM_REQ_MASTERCLOCK_UPDATE for all vCPUs. Not just one, because we
> don't want *any* vCPU going back into guest mode before doing the work.
>
> So they could *all* end up calling into kvm_masterclock_update() at the
> same time. I have a vague recollection there was once a mutex there,
> but now there isn't.
>
> So all vCPUs can each, in parallel, set KVM_REQ_MCLOCK_INPROGRESS on
> all other vCPUs. That's an unhandled request which just makes a vCPU
> spin (preventing it from entering the guest until the clock update
> completes).
>
> Then each vCPU tries to take the kvm->arch.tsc_write_lock in
> kvm_start_pvclock_update(), and from this point they'll be
> serialized... but every vCPU will go through the whole of
> pvclock_update_vm_gtod_copy() for itself, updating the masterclock one
> more time for each vCPU in the system?
>
> I came here to say that kvm_make_mclock_inprogress_request() should
> probably clear KVM_REQ_MASTERCLOCK_UPDATE on all other vCPUs, since
> it's about to do the work, and the other vCPUs will no longer need to.
>
> But... it's more broken than that now.
>
> I think maybe vcpu_enter_guest() should call kvm_update_masterclock()
> if *kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE)* (i.e. not *clear* the
> bit by using kvm_check_request()).
>
> And then after getting ->arch.tsc_write_lock,
> kvm_start_pvclock_update() should check if KVM_REQ_MASTERCLOCK_UPDATE
> is *still* set, and only proceed with the update if it is?
>
> I'll play with it some more... something like this perhaps?
>
> From 91e37d74ee267c722bc2c8f26754fcdfbc040e29 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Date: Sat, 9 May 2026 16:54:01 +0100
> Subject: [PATCH] KVM: x86: Avoid redundant masterclock updates from multiple
> vCPUs
>
> When a masterclock update is triggered (e.g. by the clocksource change
> notifier), KVM_REQ_MASTERCLOCK_UPDATE is set on all vCPUs. Without this
> fix, each vCPU independently processes the request and redundantly
> re-executes the entire pvclock_update_vm_gtod_copy() sequence, serialized
> only by tsc_write_lock. Each redundant re-snapshot of the master clock
> reference point introduces potential clock drift.
>
> Fix this by having __kvm_start_pvclock_update() check, after acquiring
> the lock, whether the requesting vCPU's KVM_REQ_MASTERCLOCK_UPDATE is
> still set. If another vCPU already did the update and cleared it, bail
> out. Otherwise, clear the request on all other vCPUs before proceeding.
>
> The caller in vcpu_enter_guest() now uses kvm_test_request() (non-clearing)
> since the clearing is done inside __kvm_start_pvclock_update() under the
> lock.
>
> Suggested-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>


Thank you very much! Feel free to add me as Suggested-by or Reported-by.

I will also help test your large patch series.

Regarding the issue, I would like to confirm whether there can still be any
KVM_REQ_MASTERCLOCK_UPDATE after KVM_SET_CLOCK_GUEST.

Per my understanding, I would expect KVM_SET_CLOCK_GUEST to be the last point
where the masterclock can be updated (during live migration or live update).

After KVM_SET_CLOCK_GUEST, any further masterclock change may cause kvm-clock drift.

Dongli Zhang