Re: [PATCH v2 01/20] locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()

From: Paolo Bonzini

Date: Sat May 30 2026 - 06:27:24 EST

On 5/30/26 02:54, Sean Christopherson wrote:

On Fri, May 29, 2026, Peter Zijlstra wrote:

On Fri, May 29, 2026 at 10:13:35PM +0200, Peter Zijlstra wrote:

It is somewhat possible to do an RT aware read-write spinlock thing, but
it is definitely non-trivial and this would be the only user.

Furthermore, it is fundamentally one of the worst possible lock types.

It really isn't something you *want* to have -- arguably even for !RT.

They scale like ass; per them being a spinlock type, the critical
sections must be short, but this means there is nothing to amortize the
cost of bouncing the shared lock around -- which is the 'saving' grace
of the rwsem.

For short sections the cost of the shared access will dominate. I'm sure
Paul has a bunch of graphs to illustrate this point :-)

Hrm. We might be able to do an SRCU-based implementation. Full RCU would be
too unpredictable on the update side.

Yeah, I think so.

The write side needs kvm->srcu so it would have to be yet another SRCU. I initially thought that sucks for the code that calls kvm_gpc_check(), but maybe not because it simply replaces read_lock/read_unlock.

By using a seqcount for the data, SRCU only needs to be synchronized in gpc_unmap(). So, something like this:

struct kvm_cached_gva_map {
struct kvm_memory_slot *memslot;
void *khva;
unsigned long uhva;
unsigned long idx;
};

/* snapshot is the new check */
bool kvm_gpc_snapshot(struct gfn_to_pfn_cache *gpc, unsigned long len,
struct kvm_cached_gva_map *ret)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
unsigned seq;
u64 generation;
gpa_t gpa;

/* The data is protected by the seqcount, the khva by SRCU. */
ret->idx = srcu_read_lock(&gpc->kvm->gpc_srcu);
/* A seqcount_spinlock_t, but see below - there's a bug! */
if (!raw_seqcount_try_begin(&gpc->sc, seq))
goto out_fail;

if (!gpc->active || !gpc->valid)
goto out_fail;

ret->uhva = gpc->uhva;
ret->khva = gpc->khva;
ret->memslot = gpc->memslot;
gpa = gpc->gpa;
generation = gpc->generation;
if (read_seqcount_retry(&gpc->sc, seq))
goto out_fail;

/*
* Now do all the validity checks on a stable copy.
* If the page was cached from a memslot, make sure the
* memslots have not been re-configured.
*/
if (!kvm_is_error_gpa(gpa) && generation != slots->generation)
goto out_fail;

if (kvm_is_error_hva(ret->uhva))
goto out_fail;

if (!kvm_gpc_is_valid_len(gpa, ret->uhva, len))
goto out_fail;

return true;

out_fail:
srcu_read_unlock(&gpc->kvm->gpc_srcu, ret->idx);
return false;
}

static inline
void kvm_gpc_snapshot_release(struct gfn_to_pfn_cache *gpc,
struct kvm_cached_gva_map *ret)
{
srcu_read_unlock(&gpc->kvm->gpc_srcu, ret->idx);
}

Introducing snapshot as a replacement for check (while still leaving it to the caller to take the rwlock) can even be a separate patch.

There is a bug in raw_seqcount_try_begin() though. It must not do the lock/unlock dance or it won't be usable in atomic context. See attached patch.

The big hiccup is when KVM needs to reclaim memory in response to mmu_notifier
events from the OOM killer, which don't allow sleeping. I'll stare at this a
bit next week. I'm not exactly thrilled about the idea of ripping apart this
code, but there's a known performance bottleneck with mmu_notifier events,
maybe we can kill two birds with one stone.

There was a pending patch to allow the MMU notifiers to disable themselves while still in sleepable context.

https://lore.kernel.org/kvm/20260429222548.25475-1-shaikhkamal2012@xxxxxxxxx/

It really needs to be split in multiple parts, and the allocation needs to be removed. Looking at it.

Otherwise, calling synchronize_srcu() from gpc_unmap(), which is within the kvm->srcu read side, is also a bit unpredictable. But the read sides are super short so you can make it expedited.

Or can you just skip the synchronize_srcu() from OOM via check_stable_address_space()? Maybe pretend I didn't suggest that.

PaoloFrom f2c604a94cdea9bb96bb21c8a29288eae6cbac67 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Sat, 30 May 2026 10:54:36 +0200
Subject: [PATCH] seqcount: allow raw_seqcount_try_begin in atomic context

When raw_seqcount_try_begin() is used, the typical action when
the seqcount is busy is to take the same lock that the writer uses.
Another possibility, found in kernel/events/uprobes.c, is to delay
the work to a safe point using RCU. Either way there is no need
to guarantee progress of the writer under CONFIG_PREEMPT_RT,
because the caller is not going to call raw_seqcount_try_begin() in a loop.

However, raw_seqcount_try_begin() currently uses seqprop_sequence()
via raw_read_seqcount(), and therefore does a lock/unlock of the
write-side lock on PREEMPT_RT kernel. This prevents it from being
used in atomic context. Use __seqprop_sequence() instead of
raw_read_seqcount() to allow it.

Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
include/linux/seqlock.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5a40252b8334..9adbf5d9667c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -338,7 +338,10 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
*/
#define raw_seqcount_try_begin(s, start) \
({ \
- start = raw_read_seqcount(s); \
+ start = __seqprop_sequence(s); \
+ \
+ if (!(start & 1)) \
+ kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \
!(start & 1); \
})

--
2.54.0