Re: [PATCH] KVM: VMX: fix lockdep warning on posted intr wakeup
From: Yan Zhao
Date: Mon Mar 13 2023 - 04:39:17 EST
On Fri, Mar 10, 2023 at 09:00:00AM -0800, Sean Christopherson wrote:
> On Fri, Mar 10, 2023, Yan Zhao wrote:
> > Use an RCU list to break the possible circular locking dependency reported
> > by lockdep.
> >
> > path 1, ``sysvec_kvm_posted_intr_wakeup_ipi()`` --> ``pi_wakeup_handler()``
> > --> ``kvm_vcpu_wake_up()`` --> ``try_to_wake_up()``,
> > the lock sequence is
> > &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock.
>
> Heh, that's an unfortunate naming collision. It took me a bit of staring to
> realize pi_lock is a scheduler lock, not a posted interrupt lock.
Me too :)
>
> > path 2, ``schedule()`` --> ``kvm_sched_out()`` --> ``vmx_vcpu_put()`` -->
> > ``vmx_vcpu_pi_put()`` --> ``pi_enable_wakeup_handler()``,
> > the lock sequence is
> > &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu).
> >
> > path 3, ``task_rq_lock()``,
> > the lock sequence is &p->pi_lock --> &rq->__lock
> >
> > lockdep report:
> > Chain exists of:
> > &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu)
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> >                                lock(&rq->__lock);
> >                                lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> >   lock(&p->pi_lock);
> >
> > *** DEADLOCK ***
>
> I don't think there's a deadlock here. pi_wakeup_handler() is called from IRQ
> context, pi_enable_wakeup_handler() disables IRQs before acquiring
> wakeup_vcpus_on_cpu_lock, and "cpu" in pi_enable_wakeup_handler() is guaranteed
> to be the current CPU, i.e. the same CPU. So CPU0 and CPU1 can't be contending
> for the same wakeup_vcpus_on_cpu_lock in this scenario.
>
> vmx_vcpu_pi_load() does do cross-CPU locking, but finish_task_switch() drops
> rq->__lock before invoking the sched_in notifiers.
Right. Thanks for this analysis!
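But lockdep tracks lock classes rather than individual per-CPU lock
instances. The ordering &p->pi_lock --> &rq->__lock -->
&per_cpu(wakeup_vcpus_on_cpu_lock, cpu) is already recorded, and the
pi_wakeup_handler() path then takes &p->pi_lock while holding
wakeup_vcpus_on_cpu_lock, which closes the cycle. So lockdep complains
even though the scenario cannot actually deadlock.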
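
To illustrate why the report fires regardless of which CPU owns each lock,
here is a minimal userspace sketch of class-based lock-order tracking. It is
not lockdep's implementation: the enum names, record_order() and
replay_report() are hypothetical, and real lockdep also tracks IRQ state and
much more. The toy graph has no notion of "cpu" at all, which is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes; the names mirror the locks in this thread. */
enum lock_class { PI_LOCK, RQ_LOCK, WAKEUP_LOCK, NR_CLASSES };

/* edge[a][b] is true once some path took class b while holding class a. */
static bool edge[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record the ordering "held --> acquired". Returns false, leaving the graph
 * unchanged, if 'held' is already reachable from 'acquired', i.e. if the new
 * edge would close a cycle.
 */
static bool record_order(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	if (reachable(acquired, held, seen))
		return false;
	edge[held][acquired] = true;
	return true;
}

/* Replay the three paths from the report; true if only path 1 is flagged. */
static bool replay_report(void)
{
	memset(edge, 0, sizeof(edge));
	bool ok3 = record_order(PI_LOCK, RQ_LOCK);      /* path 3: task_rq_lock() */
	bool ok2 = record_order(RQ_LOCK, WAKEUP_LOCK);  /* path 2: sched out */
	bool ok1 = record_order(WAKEUP_LOCK, PI_LOCK);  /* path 1: wakeup IPI */

	return ok3 && ok2 && !ok1;
}
```

Calling replay_report() returns true: paths 3 and 2 are accepted, and path 1
is rejected because PI_LOCK already leads to WAKEUP_LOCK through RQ_LOCK.
Nothing in the graph can express "same CPU only", so the IRQ-context argument
above is invisible to a checker of this kind.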
>
> > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> > ---
> > arch/x86/kvm/vmx/posted_intr.c | 12 +++++-------
> > 1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > index 94c38bea60e7..e3ffc45c0a7b 100644
> > --- a/arch/x86/kvm/vmx/posted_intr.c
> > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > @@ -90,7 +90,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> > */
> > if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> > raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> > - list_del(&vmx->pi_wakeup_list);
> > + list_del_rcu(&vmx->pi_wakeup_list);
> > raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>
> _If_ there is indeed a possible deadlock, there technically needs to be an explicit
> synchronize_rcu() before freeing the vCPU. In practice, there are probably multiple
> synchronize_rcu() calls in the destruction path, not to mention that it would take a
> minor miracle for pi_wakeup_handler() to get stalled long enough to achieve a
> use-after-free.
>
Yes, I neglected it.
Thanks for the quick and detailed review!
I will post v2 to fix it.
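
For reference, the ordering constraint pointed out above (unlink, wait for
readers, then free) can be sketched in a single-threaded userspace toy. This
is not kernel code and not a real RCU implementation: struct toy_vcpu, the
toy_* helpers and the reader counter are all hypothetical, and "no reader
active at all" is a stricter condition than a real grace period. In the
kernel, the wait would be synchronize_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU on a per-CPU wakeup list. */
struct toy_vcpu {
	struct toy_vcpu *next;
	bool freed;	/* set instead of calling free(), so callers can inspect it */
};

static struct toy_vcpu *wakeup_list;	/* singly linked list head */
static int active_readers;		/* readers inside a read-side section */

static void toy_read_lock(void)   { active_readers++; }
static void toy_read_unlock(void) { active_readers--; }

/* Unlink only: a reader that already holds a pointer to 'v' may still use it. */
static void toy_list_del(struct toy_vcpu *v)
{
	struct toy_vcpu **pp = &wakeup_list;

	while (*pp && *pp != v)
		pp = &(*pp)->next;
	if (*pp)
		*pp = v->next;	/* v->next stays intact for in-flight readers */
}

/* Toy grace-period check: true when no reader is inside a read-side section. */
static bool toy_grace_period_elapsed(void)
{
	return active_readers == 0;
}

/* "Free" only after the grace period; returns false if the caller must wait. */
static bool toy_free_vcpu(struct toy_vcpu *v)
{
	if (!toy_grace_period_elapsed())
		return false;
	v->freed = true;
	return true;
}
```

If a reader enters with toy_read_lock() and picks up a pointer before
toy_list_del() runs, toy_free_vcpu() refuses until that reader calls
toy_read_unlock(). Skipping that wait is what would open the (very unlikely)
use-after-free window described above.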
Yan