Re: [BUG] nouveau lockdep splat

From: Jason Gunthorpe
Date: Thu Jan 09 2020 - 09:53:15 EST


On Wed, Jan 08, 2020 at 05:16:40PM -0800, Ralph Campbell wrote:
> I hit this while testing HMM with nouveau on linux-5.5-rc5.
> I'm not a lockdep expert but my understanding of this is that an
> invalidation callback could potentially call kzalloc(GFP_KERNEL)
> which could cause another invalidation and recursively deadlock.
> Looking at the drivers/gpu/drm/nouveau/nvkm/ layer, I do see a
> number of places where GFP_KERNEL is used for allocations and I
> don't see an easy way to avoid that.

Not quite.

Any lock held by the invalidation callback becomes a lock where
GFP_KERNEL cannot be used within its critical region.

I.e., we can't have a notifier callback block on a lock that is held by
another thread which is itself blocked in a GFP_KERNEL allocation, as we
now risk deadlocking on other mm locks if that allocation triggers reclaim.

AFAIK there is no fix from the core side. The driver must respect this
and be organized to deal with it. Daniel already fixed the Intel driver
and I fixed RDMA recently; the other drivers must also be fixed.

Some choices:
- Split up the lock held by the notifier callback so it doesn't need
to cover allocations
- Use GFP_ATOMIC for allocations
- Speculatively do allocations before obtaining the lock and free them
if they turn out not to be needed
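A minimal userspace sketch of the third option, assuming a pthread mutex
stands in for the lock taken by the notifier callback and malloc() stands
in for a GFP_KERNEL allocation. insert_key() and count_keys() are
hypothetical names for illustration, not nouveau or mmu_notifier APIs:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical driver object tracked under the notifier lock. */
struct node {
	unsigned long key;
	struct node *next;
};

/* Stand-in for the lock the invalidation callback takes. */
static pthread_mutex_t notifier_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *head;

/*
 * Insert key if absent. Returns 1 if inserted, 0 if already present,
 * -1 if the allocation failed.
 */
int insert_key(unsigned long key)
{
	/*
	 * Allocate *before* taking the lock: this may block (and, in the
	 * kernel analogy, enter reclaim), so it stays outside the critical
	 * region the notifier depends on.
	 */
	struct node *prealloc = malloc(sizeof(*prealloc));
	struct node *n;
	int ret = 1;

	if (!prealloc)
		return -1;

	pthread_mutex_lock(&notifier_lock);
	for (n = head; n; n = n->next) {
		if (n->key == key) {
			ret = 0;	/* already present: allocation not needed */
			break;
		}
	}
	if (ret == 1) {
		prealloc->key = key;
		prealloc->next = head;
		head = prealloc;
		prealloc = NULL;	/* ownership moved into the list */
	}
	pthread_mutex_unlock(&notifier_lock);

	/* A leftover speculative allocation only exists if key was present. */
	assert(prealloc == NULL || ret == 0);
	free(prealloc);		/* free(NULL) is a no-op */
	return ret;
}

/* Stand-in for the invalidation callback: takes the lock, never allocates. */
int count_keys(void)
{
	struct node *n;
	int count = 0;

	pthread_mutex_lock(&notifier_lock);
	for (n = head; n; n = n->next)
		count++;
	pthread_mutex_unlock(&notifier_lock);
	return count;
}
```

The cost is an occasional wasted allocation when the object already
exists; in exchange, nothing that can block on memory ever runs while the
lock shared with the notifier is held.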

I suppose it will be some trouble for nouveau, but it must be done
there.

Jason