Re: [PATCH v2 20/20] KVM: x86: Use gfn_to_pfn_cache for record_steal_time

From: David Woodhouse

Date: Tue Jun 02 2026 - 08:29:44 EST


On Sat, 30 May 2026 06:19:32 +0000, sashiko-bot@xxxxxxxxxx wrote:
> [Severity: High]
> Does this introduce a scheduling while atomic bug on non-PREEMPT_RT kernels?
>
> The CLASS(gpc_map_local, st_map) macro acquires a read_lock on gpc->lock,
> which disables preemption. While this lock is held, if the guest supports
> PV TLB flush, the code calls kvm_vcpu_flush_tlb_guest().
>
> If TDP is disabled (shadow paging), kvm_vcpu_flush_tlb_guest() calls
> kvm_mmu_sync_roots() and eventually mmu_sync_children(). This path can yield
> via cond_resched_rwlock_write(). Yielding while preemption is disabled by
> the gpc read lock will trigger a BUG.

Ah, that issue exists in the previous versions too, but it's simple
enough to fix. There's no particular timing constraint for flushing the
TLB; it just has to be done before this vCPU ever runs again. It can
just be moved to the end of the function after the lock is dropped.

That does mean record_steal_time() should use the explicit
gpc_map_local_lock()/gpc_map_local_unlock() instead of the CLASS()
macro, but that's easy enough.

> [Severity: High]
> Can this cause an ABBA deadlock in the scheduler on PREEMPT_RT?
>
> kvm_steal_time_set_preempted() is called during a vCPU context switch
> via kvm_sched_out(), which is a preempt notifier hook. This runs with
> the local CPUs runqueue lock (rq->lock) held.
>
> While the code correctly uses a trylock to avoid sleeping when acquiring
> the lock, the CLASS macro destructor will implicitly call read_unlock()
> at the end of the function. On PREEMPT_RT kernels, unlocking an rt_rwlock
> can wake up a waiting writer, such as a concurrent kvm_gpc_refresh().

This one is more fun, and the problem already exists on RT for the Xen
runstate update so it's not even really new with this patch.

I think the gpc lock needs to be a raw_rwlock_t, so that it remains a
true spinning lock even on PREEMPT_RT. The writer side
(kvm_gpc_refresh) only holds the lock briefly to swap in the new
pfn/khva after doing the heavyweight work, so it should be fine as a
raw lock.
