Re: [PATCH 1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

From: Jason Gunthorpe
Date: Wed Jun 24 2020 - 08:39:15 EST


On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > has to opportunistically unmap them [try_to_unmap_one]. For direct
> > > reclaim, the calling context is unknown and may include attempts to
> > > unmap one page of a dma object while attempting to allocate more pages
> > > for that object. Pass the information along that we are inside an
> > > opportunistic unmap that can allow that page to remain referenced and
> > > mapped, and let the callback opt in to avoiding a recursive wait.
> >
> > i915 should already not be holding locks shared with the notifiers
> > across allocations that can trigger reclaim. This is already required
> > to use notifiers correctly anyhow - why do we need something in the
> > notifiers?
>
> for (n = 0; n < num_pages; n++)
>         pin_user_page();
>
> may call try_to_unmap_page from the lru shrinker for [0, n-1].

Yes, of course you can't hold any locks that are shared with the notifiers
across pin_user_pages()/get_user_pages().

It has always been that way.

I consolidated all this tricky locking into interval notifiers; maybe
updating i915 to use them would give it a solution. I looked at it
once, and it was straightforward enough until it got to all the #ifdefery.

> We're in the middle of allocating the object, how are we best to untangle
> that?

I don't know anything about i915, but this is clearly i915 not using
notifiers properly; it needs a proper fix, not a hack in the notifiers.

Jason