Re: [PATCH v2 01/20] locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()
From: David Woodhouse
Date: Mon Jun 01 2026 - 06:25:44 EST
On Mon, 2026-06-01 at 11:40 +0200, Peter Zijlstra wrote:
> On Sat, May 30, 2026 at 01:47:06PM +0100, David Woodhouse wrote:
> > On Sat, 2026-05-30 at 12:26 +0200, Paolo Bonzini wrote:
> > >
> > > Yeah, I think so.
> > >
> > > The write side needs kvm->srcu so it would have to be yet another SRCU.
> > > I initially thought that sucks for the code that calls kvm_gpc_check(),
> > > but maybe not because it simply replaces read_lock/read_unlock.
> > >
> > > By using a seqcount for the data, SRCU only needs to be synchronized in
> > > gpc_unmap(). So, something like this:
> >
> > It isn't just gpc_unmap() which does the invalidation. We also
> > invalidate from the MMU notifier in gfn_to_pfn_cache_invalidate_start()
> > which would also have to synchronize, wouldn't it?
>
> Ok, so I had a look at what this code actually does, and it appears to
> be a guest frame number to page frame number cache, managed by
> mmu_notifiers.
>
> IOW, its some software TLB thing (pre HVM Xen support?)
No, it's for any fast path access to KVM guest memory.
KVM guest physical addresses are translated through two stages.
First through the memslots which convert the GPA to a userspace virtual
address (in the VMM process).
Then through the normal process mm into a host physical address.
The GPC is indeed a software TLB thing used to cache the result of that
two-stage translation, hence being invalidated when *either* the
memslots change (easy), *or* the page is invalidated in the process mm
(more fun).
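As a rough illustration of those two invalidation triggers, here is a toy userspace model. It is not the actual KVM code: every toy_* name and field is made up for this sketch, and it ignores locking entirely. It only shows that a cached translation stops being usable when either the memslot generation moves on or the mm invalidates a range covering the cached userspace address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; these are not the real KVM structures. */
struct toy_gpc {
	uint64_t gpa;        /* guest physical address being cached */
	uint64_t uhva;       /* stage 1 result: userspace virtual address */
	uint64_t hpa;        /* stage 2 result: host physical address */
	uint64_t generation; /* memslot generation seen when filled */
	bool valid;          /* cleared when the mm invalidates the page */
};

/* Record the result of both translation stages. */
void toy_gpc_fill(struct toy_gpc *c, uint64_t gpa, uint64_t uhva,
		  uint64_t hpa, uint64_t generation)
{
	assert(c != NULL);
	c->gpa = gpa;
	c->uhva = uhva;
	c->hpa = hpa;
	c->generation = generation;
	c->valid = true;
}

/* Usable only if the mm has not invalidated it AND memslots are unchanged. */
bool toy_gpc_check(const struct toy_gpc *c, uint64_t cur_generation)
{
	return c->valid && c->generation == cur_generation;
}

/* mm side: drop the cache if its uhva lies in [start, end). */
void toy_gpc_invalidate_range(struct toy_gpc *c, uint64_t start, uint64_t end)
{
	if (c->valid && c->uhva >= start && c->uhva < end)
		c->valid = false;
}
```

The memslot case is "easy" in this model because it is a plain comparison at check time; the mm case needs something to actively reach into the cache, which is where the locking questions in this thread come from.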
KVM used to *pin* the latter, which was sad for memory hotunplug and
other reasons, and then *still* managed to screw up the caching. (I
think it would just keep using the old page even if userspace mmap'd
something else over it).
I ripped all that out and replaced it with the gfn-to-pfn-cache we have
now.
It's used to allow a fast path where the *common* case is that the GPC
is already valid. Like interrupt delivery to Xen (HVM) guests which
needs to go into the shared-info page. There's always a slow path
fallback which will revalidate the cache (which might involve bringing
the page back in from swap, etc.). For things like steal time /
runstate information, an invalid cache might lead to it not being
updated when the process is scheduled out, but it'll be refreshed when
scheduled back in again before re-entering the guest.
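The fast path / slow path split can be sketched like this. Again this is a toy userspace model with made-up toy_* names, not the KVM implementation: the fast path only touches the page if the cache is already valid, and the caller falls back to a refresh (which in real life may sleep) when it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; not the real gfn_to_pfn_cache. */
struct toy_cache {
	uint8_t *page; /* mapped target page, NULL while invalid */
	bool valid;
};

/* Fast path: never sleeps, fails if the cache is not valid. */
bool toy_set_bit_fast(struct toy_cache *c, unsigned int bit)
{
	if (!c->valid || c->page == NULL)
		return false;
	c->page[bit / 8] |= (uint8_t)(1u << (bit % 8));
	return true;
}

/* Slow path: stands in for revalidation, which may sleep in real life. */
void toy_refresh(struct toy_cache *c, uint8_t *backing)
{
	assert(backing != NULL);
	c->page = backing;
	c->valid = true;
}

/* Returns true if the fast path alone was enough. */
bool toy_deliver(struct toy_cache *c, uint8_t *backing, unsigned int bit)
{
	if (toy_set_bit_fast(c, bit))
		return true;
	toy_refresh(c, backing);
	(void)toy_set_bit_fast(c, bit);
	return false;
}
```

The first delivery after an invalidation pays for the refresh; subsequent ones go straight through the fast path until the next invalidation.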
The GPC also has (well, *had*, but we're looking at reintroducing it¹)
a mode where the underlying physical address is actually present in the
VMCS and used directly by the CPU, for guest mode. In that mode,
invalidation requires kicking the vCPU *out* of guest mode. That one
requires sleeping and needs handling even for !PREEMPT_RT.
¹ https://lore.kernel.org/all/20251121111113.456628-3-griffoul@xxxxxxxxx/
> Now, mmu_notifier_invalidate_range_start() has a rather explicit
> might_sleep() in, and while there is an
> mmu_notifier_invalidate_range_start_noblock(), that has an error return,
> and it is clearly specified that if that thing returns non-zero, PTEs
> must not be changed.
>
> With all that, I don't see why we can't block for srcu_synchronize() in
> gfn_to_pfn_cache_invalidate_start().
>
> Now, I've never much looked at mmu_notifiers, but for native, TLBI might
> require sending IPIs to all CPUs, and as such cannot happen in atomic
> sections. I would expect this same to extend to mmu_notifiers. It must
> be possible to sleep in them.
>
> What am I missing?
I think it's the OOM path which invokes it in a non-sleepable context?
Hence the may_block argument, and the 'all vCPUs should have been
stopped already, so perform the request without KVM_REQUEST_WAIT and be
sad' comment in the patch I linked above.
The code in virt/kvm/kvm_main.c already tracks 'may_block' for each
call path that gets here; Fred's patch is just passing it in to the GPC
invalidation.
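In sketch form, passing may_block through amounts to something like the following. This is a toy model and an assumption about the shape of the change, not a copy of Fred's patch: the TOY_* constants and the function name are hypothetical, with TOY_REQUEST_WAIT standing in for KVM_REQUEST_WAIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for illustration; not the real KVM request bits. */
#define TOY_REQ_GPC_INVALIDATE 0x01u
#define TOY_REQUEST_WAIT       0x100u /* stands in for KVM_REQUEST_WAIT */

static_assert((TOY_REQ_GPC_INVALIDATE & TOY_REQUEST_WAIT) == 0,
	      "request number and wait flag must not overlap");

/*
 * Build the request used to kick vCPUs out of guest mode.
 * Only ask to wait for them when the caller is allowed to sleep.
 */
unsigned int toy_gpc_invalidate_req(bool may_block)
{
	unsigned int req = TOY_REQ_GPC_INVALIDATE;

	if (may_block)
		req |= TOY_REQUEST_WAIT;
	return req;
}
```

In the non-blocking case the request is still made, just without waiting for the vCPUs to acknowledge it.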
These caches are *mostly* per-vCPU so we don't suffer too much from the
cache contention on the readlocks.