Re: [PATCH v2] KVM: x86: Use gfn_to_pfn_cache for record_steal_time
From: Sean Christopherson
Date: Thu Mar 12 2026 - 20:17:59 EST
With some assistance from an AI review bot (well, more than "some").
On Wed, Mar 11, 2026, David Woodhouse wrote:
> @@ -3806,39 +3788,32 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> st_preempted & KVM_VCPU_FLUSH_TLB);
> if (st_preempted & KVM_VCPU_FLUSH_TLB)
> kvm_vcpu_flush_tlb_guest(vcpu);
> -
> - if (!user_access_begin(st, sizeof(*st)))
> - goto dirty;
> } else {
> - if (!user_access_begin(st, sizeof(*st)))
> - return;
> -
> - unsafe_put_user(0, &st->preempted, out);
> + st->preempted = 0;
These should all be WRITE_ONCE(), correct?
> vcpu->arch.st.preempted = 0;
> }
>
> - unsafe_get_user(version, &st->version, out);
> + version = st->version;
And then READ_ONCE()?
> if (version & 1)
> version += 1; /* first time write, random junk */
>
> version += 1;
> - unsafe_put_user(version, &st->version, out);
> + st->version = version;
>
> smp_wmb();
>
> - unsafe_get_user(steal, &st->steal, out);
> + steal = st->steal;
> steal += current->sched_info.run_delay -
> vcpu->arch.st.last_steal;
> vcpu->arch.st.last_steal = current->sched_info.run_delay;
> - unsafe_put_user(steal, &st->steal, out);
> + st->steal = steal;
>
> version += 1;
> - unsafe_put_user(version, &st->version, out);
> + st->version = version;
And then here, doesn't there need to be an smp_wmb() before incrementing the
version again? I believe converting these to vanilla C stores leaves the
compiler free to reorder the two writes. Per the friendly bot:

  The previous code used unsafe_put_user(), which inherently acts as a
  compiler barrier due to its internal inline assembly.
> +
> + kvm_gpc_mark_dirty_in_slot(gpc);
>
> - out:
> - user_access_end();
> - dirty:
> - mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
> + read_unlock(&gpc->lock);
> }
>
> /*
> @@ -4173,8 +4148,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> vcpu->arch.st.msr_val = data;
>
> - if (!(data & KVM_MSR_ENABLED))
> - break;
> + if (data & KVM_MSR_ENABLED) {
Curly braces aren't required.
> + kvm_gpc_activate(&vcpu->arch.st.cache, data & ~KVM_MSR_ENABLED,
> + sizeof(struct kvm_steal_time));
> + } else {
> + kvm_gpc_deactivate(&vcpu->arch.st.cache);
> + }
...
> @@ -5266,20 +5244,28 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
> if (unlikely(current->mm != vcpu->kvm->mm))
> return;
>
> - slots = kvm_memslots(vcpu->kvm);
> -
> - if (unlikely(slots->generation != ghc->generation ||
> - gpa != ghc->gpa ||
> - kvm_is_error_hva(ghc->hva) || !ghc->memslot))
> - return;
> + read_lock_irqsave(&gpc->lock, flags);
I'm pretty sure this is going to make PROVE_LOCKING unhappy, as PREEMPT_RT
makes rwlock_t sleepable and this can be called from kvm_sched_out(). I've
been content to ignore the kvm_xen_set_evtchn_fast() warning[*] because I
can't imagine anyone is crazy enough to emulate Xen with an RT kernel, but I
do know there are RT users that run VMs, and so for them this path would be
a real bug, not just a PROVE_LOCKING splat.
If we want to push the gpc stuff broadly, we need a solution to that (though I'm
still not 100% convinced using a gpc here is a net positive).
[*] https://lore.kernel.org/all/673f4bbc.050a0220.3c9d61.0174.GAE@xxxxxxxxxx