Re: [PATCH v3] KVM: VMX: fix lockdep warning on posted intr wakeup

From: Yan Zhao
Date: Thu Mar 30 2023 - 20:31:54 EST


On Thu, Mar 30, 2023 at 11:14:27AM -0700, Sean Christopherson wrote:
> On Thu, Mar 30, 2023, Yan Zhao wrote:
> > On Wed, Mar 29, 2023 at 01:51:23PM +0200, Paolo Bonzini wrote:
> > > On 3/29/23 03:53, Yan Zhao wrote:
> > > > Yes, there's no actual deadlock currently.
> > > >
> > > > But without fixing this issue, debug_locks will be set to false along
> > > > with below messages printed. Then lockdep will be turned off and any
> > > > other lock detections like lockdep_assert_held()... will not print
> > > > warning even when it's obviously violated.
> > >
> > > Can you use lockdep subclasses, giving 0 to the sched_in path and 1 to the
> > > sched_out path?
> >
> > Yes, thanks for the suggestion!
> > This can avoid this warning of "possible circular locking dependency".
> >
> > I tried it like this:
> > - in sched_out path:
> > raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
> >
> > - in irq and sched_in paths:
> > raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> >
> > But I have a concern:
> > If sched_in path removes vcpu A from wakeup list of its previous pcpu A,
> > and in the meantime, sched_out path adds vcpu B to the wakeup list of
> > pcpu A, the sched_in and sched_out paths should race for the same
> > subclass of lock.
> > But if sched_in path only holds subclass 0, and sched_out path holds
> > subclass 1, then lockdep would not warn of "possible circular locking
> > dependency" if someone made a change as below in sched_in path.
> >
> > if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> > raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > list_del(&vmx->pi_wakeup_list);
> > + raw_spin_lock(&current->pi_lock);
> > + raw_spin_unlock(&current->pi_lock);
> > raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > }
> >
> > While with v3 of this patch (sched_in path holds both out_lock and in_lock),
> > lockdep is still able to warn about this issue.
>
> Couldn't we just add a manual assertion? That'd also be a good location for a
> comment to document all of this, and to clarify that current->pi_lock is a
> completely different lock that has nothing to do with posted interrupts.
>
> It's not foolproof, but any patches that substantially touch this code need a
> ton of scrutiny as the scheduling interactions are gnarly, i.e. IMO a deadlock
> bug sneaking in is highly unlikely.
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 94c38bea60e7..19325a10e42f 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -90,6 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> */
> if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> + lockdep_assert_not_held(&current->pi_lock);
> list_del(&vmx->pi_wakeup_list);
> raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> }

Hmm... no. The point is not that "current->pi_lock" must never be held
there; it's about the lock ordering.
In the sched_out path, the lock ordering is
"current->pi_lock" --> "rq->__lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)".
If the sched_in path then takes
"per_cpu(wakeup_vcpus_on_cpu_lock, cpu)" --> "current->pi_lock",
the two paths form a circular locking dependency.
If instead the sched_in path takes
"current->pi_lock" --> "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)",
the ordering is consistent with sched_out, and that's fine!

If the sched_out and sched_in paths really should take the same subclass of
the lock, we shouldn't fool lockdep just to make it shut up.
Also, we probably can't list or document every lock that must not be taken
while holding "per_cpu(wakeup_vcpus_on_cpu_lock, cpu)", right?

BTW, could you tell me why you think v3 complicates KVM's functionality?
It just splits the single lock into two sub-locks: the IRQ path takes only
in_lock, the sched_out path takes only out_lock, and the sched_in path takes
both in_lock and out_lock.
IMHO, it makes no functional change to KVM.
Maybe the commit message was not well written and gave the wrong impression
that the logic changes a lot?


Thanks
Yan