Re: kvm+nouveau induced lockdep gripe

From: Mike Galbraith
Date: Mon Oct 26 2020 - 15:16:07 EST


On Mon, 2020-10-26 at 18:31 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-10-24 13:00:00 [+0800], Hillf Danton wrote:
> >
> > Hmm... curious how that word got into your mind. And when?
> > > [ 30.457363]
> > > other info that might help us debug this:
> > > [ 30.457369] Possible unsafe locking scenario:
> > >
> > > [ 30.457375]        CPU0
> > > [ 30.457378]        ----
> > > [ 30.457381]   lock(&mgr->vm_lock);
> > > [ 30.457386]   <Interrupt>
> > > [ 30.457389]     lock(&mgr->vm_lock);
> > > [ 30.457394]
> > > *** DEADLOCK ***
> > >
> > > <snips 999 lockdep lines and zillion ATOMIC_SLEEP gripes>
>
> The backtrace contained the "normal" vm_lock. What should follow is the
> backtrace of the in-softirq usage.
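
For the archives, the unsafe scenario lockdep is pointing at is the lock
being taken in process context with softirqs still enabled while the same
lock is also taken from softirq context on the same CPU.  A minimal sketch
of that pattern (hypothetical functions, not the actual drm code):

#include <linux/spinlock.h>

static DEFINE_RWLOCK(vm_lock);

void process_ctx_user(void)		/* think drm_vma_offset_remove() */
{
	write_lock(&vm_lock);		/* softirqs still enabled here... */
	/*
	 * ...so a softirq firing now on this CPU that also wants
	 * vm_lock spins against us forever -> deadlock.
	 */
	write_unlock(&vm_lock);
}

void softirq_ctx_user(void)		/* the in-softirq usage */
{
	write_lock(&vm_lock);
	write_unlock(&vm_lock);
}
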
>
> >
> > Dunno if blocking softints is the right cure.
> >
> > --- a/drivers/gpu/drm/drm_vma_manager.c
> > +++ b/drivers/gpu/drm/drm_vma_manager.c
> > @@ -229,6 +229,7 @@ EXPORT_SYMBOL(drm_vma_offset_add);
> >  void drm_vma_offset_remove(struct drm_vma_offset_manager *mgr,
> >  			   struct drm_vma_offset_node *node)
> >  {
> > +	local_bh_disable();
>
> There is write_lock_bh(). However, changing only this one user will
> produce the same backtrace somewhere else unless all the other users
> already run in a BH-disabled region.
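
FWIW, the full conversion you describe would look something like the
below for the removal path (untested sketch, with every other vm_lock
user then needing the same _bh treatment):

/* Untested sketch: BH-safe variant of drm_vma_offset_remove(). */
void drm_vma_offset_remove(struct drm_vma_offset_manager *mgr,
			   struct drm_vma_offset_node *node)
{
	/* write_lock_bh() == local_bh_disable() + write_lock() */
	write_lock_bh(&mgr->vm_lock);

	if (drm_mm_node_allocated(&node->vm_node)) {
		drm_mm_remove_node(&node->vm_node);
		memset(&node->vm_node, 0, sizeof(node->vm_node));
	}

	write_unlock_bh(&mgr->vm_lock);
}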

Since there doesn't _seem_ to be a genuine deadlock lurking, though, I
just asked lockdep to please not log the annoying initialization-time
chain.

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
@@ -116,7 +116,17 @@ nvkm_pci_oneinit(struct nvkm_subdev *sub
 			return ret;
 	}
 
+	/*
+	 * Scheduler code taking cpuset_rwsem during irq thread initialization sets
+	 * up a cpuset_rwsem vs mm->mmap_lock circular dependency gripe upon later
+	 * cpuset usage.  It's harmless, tell lockdep there's nothing to see here.
+	 */
+	if (force_irqthreads)
+		lockdep_off();
 	ret = request_irq(pdev->irq, nvkm_pci_intr, IRQF_SHARED, "nvkm", pci);
+	if (force_irqthreads)
+		lockdep_on();
+
 	if (ret)
 		return ret;
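
The lockdep_off()/lockdep_on() pair simply stops lock tracking for the
current task across request_irq(), so the chain recorded during irq
thread setup never enters lockdep's graph.  It's guarded by
force_irqthreads since the offending cpuset_rwsem acquisition only
happens when the handler gets spawned as a thread (threadirqs).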