Re: [PATCH 01/62] kvm: Make pi_enable_wakeup_handler() easier to analyze
From: Sean Christopherson
Date: Thu Feb 26 2026 - 13:18:15 EST
On Tue, Feb 24, 2026, Bart Van Assche wrote:
> On 2/24/26 10:20 AM, Sean Christopherson wrote:
> > For the scope, please use:
> >
> > KVM: VMX:
> >
> > On Mon, Feb 23, 2026, Bart Van Assche wrote:
> > > The Clang thread-safety analyzer does not support comparing expressions
> > > that use per_cpu(). Hence introduce a new local variable to capture the
> > > address of a per-cpu spinlock. This patch prepares for enabling the
> > > Clang thread-safety analyzer.
> > >
> > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Cc: kvm@xxxxxxxxxxxxxxx
> > > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> > > ---
> > > arch/x86/kvm/vmx/posted_intr.c | 7 ++++---
> > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > > index 4a6d9a17da23..f8711b7b85a8 100644
> > > --- a/arch/x86/kvm/vmx/posted_intr.c
> > > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > > @@ -164,6 +164,7 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
> > > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > struct vcpu_vt *vt = to_vt(vcpu);
> > > struct pi_desc old, new;
> > > + raw_spinlock_t *wakeup_lock;
> > > lockdep_assert_irqs_disabled();
> > > @@ -179,11 +180,11 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
> > > * entirety of the sched_out critical section, i.e. the wakeup handler
> > > * can't run while the scheduler locks are held.
> > > */
> > > - raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu),
> > > - PI_LOCK_SCHED_OUT);
> > > + wakeup_lock = &per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu);
> >
> > Addressing this piecemeal doesn't seem maintainable in the long term. The odds
> > of unintentionally regressing the coverage with a cleanup are rather high. Or
> > we'll end up with confused and/or grumpy developers because they're required to
> > write code in a very specific way because of what are effectively shortcomings
> > in the compiler.
>
> I think it's worth mentioning that the number of patches similar to the
> above is small. If I remember correctly, I only encountered two similar
> cases in the entire kernel tree.
Yeah, it's definitely not a deal-breaker to work around this in KVM, especially
if this is one of the few things blocking -Wthread-safety.
> Regarding why the above patch is necessary, I don't think that it is
> fair to blame the compiler in this case. The macros that implement
> per_cpu() make it impossible for the compiler to conclude that the
> pointers passed to the raw_spin_lock_nested() and raw_spin_unlock()
> calls are identical:
Well rats, that pretty much makes it infeasible to solve the underlying problem.
> /*
> * Add an offset to a pointer. Use RELOC_HIDE() to prevent the compiler
> * from making incorrect assumptions about the pointer value.
> */
> #define SHIFT_PERCPU_PTR(__p, __offset) \
> RELOC_HIDE(PERCPU_PTR(__p), (__offset))
>
> #define RELOC_HIDE(ptr, off) \
> ({ \
> unsigned long __ptr; \
> __asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
> (typeof(ptr)) (__ptr + (off)); \
> })
>
> By the way, the above patch is not the only possible solution for
> addressing the thread-safety warning Clang reports for this function.
> Another possibility is adding __no_context_analysis to the function
> definition. Is the latter perhaps what you prefer?
Hmm, I'd prefer to keep the analysis, even though it's a bit of a pain. We already
went through quite some effort to preserve lockdep for this lock; compared to that,
forcing use of local variables is hardly anything.
My only concern is lack of enforcement and documentation. I fiddled with a bunch
of ideas, but most of them flamed out because of the aforementioned lockdep
shenanigans. E.g. forcing use of guard() or scoped_guard() doesn't Just Work.
The best idea I came up with is to rename the global variable to something scary,
and then define a CLASS() so that it's syntactically all but impossible to feed
the result of per_cpu() directly into lock() or unlock().
What's your timeline for enabling -Wthread-safety? E.g. are you trying to land
it in 7.1? 7.2+? I'd be happy to formally post the below and get it landed in
the N-1 kernel (assuming Paolo is also comfortable landing the patch in 7.0 if
you're targeting 7.1).
---
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Thu, 26 Feb 2026 07:21:52 -0800
Subject: [PATCH] KVM: VMX: Force wakeup_vcpus_on_cpu_lock to be captured in
local variable
Wrap wakeup_vcpus_on_cpu_lock in a CLASS() and append "do_not_touch" to the
per-CPU symbol to effectively force lock()+unlock() paths to capture the
per-CPU lock in a local variable. Clang's thread-safety analyzer doesn't
support comparing lock() vs. unlock() expressions that use separate
per_cpu() invocations (-Wthread-safety generates false positives), as the
kernel's per_cpu() implementation deliberately hides the resolved address
from the compiler, specifically to prevent the compiler from reasoning
about the symbol. I.e. per_cpu() is a victim of its own success.
Link: https://lore.kernel.org/all/a2ebde260608230500o3407b108hc03debb9da6e62c@xxxxxxxxxxxxxx
Link: https://news.ycombinator.com/item?id=18050983
Suggested-by: Bart Van Assche <bvanassche@xxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/vmx/posted_intr.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 4a6d9a17da23..e08faaeab12f 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -31,7 +31,21 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_cpu);
* CPU. IRQs must be disabled when taking this lock, otherwise deadlock will
* occur if a wakeup IRQ arrives and attempts to acquire the lock.
*/
-static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock__do_not_touch);
+
+/*
+ * Route accesses to the lock through a CLASS() to effectively force users to
+ * capture the lock in a local variable. The kernel's per_cpu() implementation
+ * deliberately obfuscates the address of the data to prevent the compiler from
+ * making incorrect assumptions about the symbol. However, hiding the address
+ * triggers false-positive thread-safety warnings if lock() vs. unlock() are
+ * called with different per_cpu() invocations, because the compiler can't tell
+ * it's the same lock under the hood.
+ */
+DEFINE_CLASS(pi_wakeup_vcpus_lock, raw_spinlock_t *,
+ lockdep_assert_not_held(_T),
+ &per_cpu(wakeup_vcpus_on_cpu_lock__do_not_touch, cpu),
+ int cpu);
#define PI_LOCK_SCHED_OUT SINGLE_DEPTH_NESTING
@@ -90,7 +104,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
* current pCPU if the task was migrated.
*/
if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
- raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu);
+ CLASS(pi_wakeup_vcpus_lock, spinlock)(cpu);
/*
* In addition to taking the wakeup lock for the regular/IRQ
@@ -165,6 +179,8 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
struct vcpu_vt *vt = to_vt(vcpu);
struct pi_desc old, new;
+ CLASS(pi_wakeup_vcpus_lock, spinlock)(vcpu->cpu);
+
lockdep_assert_irqs_disabled();
/*
@@ -179,11 +195,10 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
* entirety of the sched_out critical section, i.e. the wakeup handler
* can't run while the scheduler locks are held.
*/
- raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu),
- PI_LOCK_SCHED_OUT);
+ raw_spin_lock_nested(spinlock, PI_LOCK_SCHED_OUT);
list_add_tail(&vt->pi_wakeup_list,
&per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
- raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
+ raw_spin_unlock(spinlock);
WARN(pi_test_sn(pi_desc), "PI descriptor SN field set before blocking");
@@ -254,9 +269,10 @@ void pi_wakeup_handler(void)
{
int cpu = smp_processor_id();
struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
- raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
struct vcpu_vt *vt;
+ CLASS(pi_wakeup_vcpus_lock, spinlock)(cpu);
+
raw_spin_lock(spinlock);
list_for_each_entry(vt, wakeup_list, pi_wakeup_list) {
@@ -269,7 +285,7 @@ void pi_wakeup_handler(void)
void __init pi_init_cpu(int cpu)
{
INIT_LIST_HEAD(&per_cpu(wakeup_vcpus_on_cpu, cpu));
- raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+ raw_spin_lock_init(&per_cpu(wakeup_vcpus_on_cpu_lock__do_not_touch, cpu));
}
void pi_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
base-commit: 183bb0ce8c77b0fd1fb25874112bc8751a461e49
--