Re: [PATCH v5 1/2] mm/mmu_notifier: make interval notifier updates safe

From: Jason Gunthorpe
Date: Thu Jan 09 2020 - 18:26:39 EST


On Thu, Jan 09, 2020 at 02:01:21PM -0800, Ralph Campbell wrote:

> > I'd write it more like
> >
> > 	if (mni->updated_start == mni->updated_end)
> > 		insert
> > 	else
> > 		remove
>
> OK, but I'm using updated_end == 0, not updated_start, and the end can't be zero.

Tricky..

> > ie an empty interval can't get a notification so it should be removed
> > from the tree.
> >
> > I also like the name 'updated' better than deferred, it is a bit
> > clearer..
>
> OK.
>
> > Adding release should be its own patch.
>
> The release callback is associated with mmu_interval_notifier_put()
> (i.e., async remove). Otherwise, there is no way to know when the
> interval can be freed.

Okay, but this patch is just trying to add update?

> > So why do we need this? You can't call hmm_range_fault from a
> > notifier. You just can't.
> >
> > So there should be no reason to create an interval from the notifier,
> > do it from where you call hmm_range_fault, and it must be safe to
> > obtain the mmap_sem from that thread.
>
> I was thinking of the case where munmap() creates a hole in the interval.
> The invalidate callback would need to update the interval to cover the
> left side of the remaining interval and an insert to cover the right
> side. Otherwise, the HW invalidation has to be extended to cover the
> right side and rely on a fault to re-establish the right side interval.

This is very tricky because this algorithm can only work correctly if
done atomically as a batch entirely under the spinlock. Forcing it
into the deferred list while holding the lock is the only way to do
something like that sensibly..

So 'update' is not some generic API you can call, it can only be done
while the interval tree is locked for reading. Thus 'safe' is probably
the wrong name; it is actually 'interval tree locked for read'.

At a minimum this needs to be comprehensively documented and we need
a lockdep-style assertion that we are locked when doing it..

And if we are defining things like that then it might as well be
expressed as a remove/insert/insert batch operation rather than
a somewhat confusing update.
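
To make the batch idea concrete, here is a rough userspace sketch of a
remove/insert/insert split around a munmap() hole. Everything in it is a
hypothetical toy (struct toy_tree, toy_split_locked(), the bool standing in
for the spinlock, the plain assert standing in for a lockdep check); none of
it is the mmu_notifier API, it only illustrates the shape of the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP 8

/* Hypothetical toy model, NOT the kernel's mmu_interval_notifier. */
struct toy_interval {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive; start == end means empty */
};

struct toy_tree {
	bool locked;				/* stands in for the itree lock */
	struct toy_interval slots[TOY_CAP];	/* stands in for the interval tree */
	size_t count;
};

static void toy_lock(struct toy_tree *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void toy_unlock(struct toy_tree *t)
{
	assert(t->locked);
	t->locked = false;
}

/* Clamp v into [lo, hi]. */
static unsigned long toy_clamp(unsigned long v, unsigned long lo,
			       unsigned long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Replace slots[idx] with whatever remains of it outside the hole
 * [hole_start, hole_end), as one batch. Returns the number of intervals
 * that replaced the original (0, 1 or 2), or -1 if there is no room.
 */
static int toy_split_locked(struct toy_tree *t, size_t idx,
			    unsigned long hole_start, unsigned long hole_end)
{
	assert(t->locked);		/* lockdep-style check: caller holds the lock */
	assert(idx < t->count);
	assert(hole_start <= hole_end);

	struct toy_interval old = t->slots[idx];
	struct toy_interval left = {
		old.start, toy_clamp(hole_start, old.start, old.end)
	};
	struct toy_interval right = {
		toy_clamp(hole_end, old.start, old.end), old.end
	};
	int n = (left.start != left.end) + (right.start != right.end);

	if (t->count - 1 + (size_t)n > TOY_CAP)
		return -1;

	/* remove: an empty piece never goes back into the tree */
	t->slots[idx] = t->slots[t->count - 1];
	t->count--;

	/* insert / insert */
	if (left.start != left.end)
		t->slots[t->count++] = left;
	if (right.start != right.end)
		t->slots[t->count++] = right;
	return n;
}
```

The caller takes the lock, calls toy_split_locked() once, and drops the
lock, so no observer can see the tree between the remove and the inserts.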

> Now the plan for v6 is to leave mmu_interval_notifier_remove() unchanged,
> add mmu_interval_notifier_put() for async/safe removal and make 'update'
> be asynchronous only and, as you say, rely on mmu_interval_read_begin()
> to be sure all delayed add/remove/updates are complete.

Hm, we can see what injecting reference counts would look like.

> I'm also planning to add a mmu_interval_notifier_find() so that nouveau
> and the self tests don't need to create a duplicate interval range tree
> to track the intervals that they have registered. There isn't an existing
> structure that the struct mmu_interval_notifier can just be added to so
> it ends up being a separately allocated structure and would need to be
> stored in some sort of table so I thought why not just use the itree.

Okay, but for locking reasons find is also a little tricky. I suppose
find can obtain the read side lock on the interval tree and then the
caller would have to find_unlock once it has refcounted or finished
accessing the object. Much like how the invalidate callback is locked.
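
Roughly what I have in mind for find, as a userspace toy. The names
(toy_find(), toy_find_unlock(), struct toy_itree) and the reader counter
standing in for the read-side lock are all made up for illustration; this
is a sketch of the calling convention only, not a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the kernel's mmu_interval_notifier. */
struct toy_notifier {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	int refcount;
};

struct toy_itree {
	int readers;			/* stands in for the read-side lock */
	struct toy_notifier *items[8];	/* stands in for the interval tree */
	size_t count;
};

/*
 * Return the first notifier overlapping [start, end). On a hit the read
 * side is left held and the caller must call toy_find_unlock(). On a miss
 * the read side is dropped before returning NULL.
 */
static struct toy_notifier *toy_find(struct toy_itree *t,
				     unsigned long start, unsigned long end)
{
	t->readers++;
	for (size_t i = 0; i < t->count; i++) {
		struct toy_notifier *n = t->items[i];

		if (n->start < end && start < n->end)
			return n;	/* read side stays held */
	}
	t->readers--;
	return NULL;
}

/* Drop the read side once the caller has refcounted or finished with it. */
static void toy_find_unlock(struct toy_itree *t)
{
	assert(t->readers > 0);
	t->readers--;
}
```

So the caller does find, takes its reference (or just looks at the object),
then find_unlock. Anything that wants to modify the tree has to wait until
the reader count drops back to zero.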

> This is all useful feedback. I am working on v6 which addresses your concerns
> and updates nouveau to use the new API. I'm somewhat sidetracked by the lockdep
> issue I posted about nouveau calling kmalloc(GFP_KERNEL) from the invalidation
> callback so it may take me a while to sort that out.
> Since we are at -rc5, I'm guessing this won't have enough soak time to make 5.6.

Yes, sorry for the delay, lots of travel and a mountain of emails. I
am almost caught up now. But you can post it at least.

Jason