Re: [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4

From: Jerome Glisse
Date: Tue Sep 01 2015 - 10:58:25 EST

Next message: Steve Wise: "RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_destroy_cq"
Previous message: Yury Norov: "[PATCH 2/2] arm64: don't load 32-bit binaries if platform has no aarch32_el0"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Aug 31, 2015 at 08:27:17PM -0700, Mark Hairgrove wrote:
> On Thu, 13 Aug 2015, Jérôme Glisse wrote:

[...]

Will fix syntax.

[...]

> > +/* mmu_notifier_range_wait_valid() - wait for a range to have no conflict with
> > + * active invalidation.
> > + *
> > + * @mm: The mm struct.
> > + * @start: Start address of the range (inclusive).
> > + * @end: End address of the range (exclusive).
> > + *
> > + * This function wait for any active range invalidation that conflict with the
> > + * given range, to end.
> > + *
> > + * Note by the time this function return a new range invalidation that conflict
> > + * might have started. So you need to atomically block new range and query
> > + * again if range is still valid with mmu_notifier_range_is_valid(). So call
> > + * sequence should be :
> > + *
> > + * again:
> > + * mmu_notifier_range_wait_valid()
> > + * // block new invalidation using that lock inside your range_start callback
> > + * lock_block_new_invalidation()
> > + * if (!mmu_notifier_range_is_valid())
> > + * goto again;
> > + * unlock()
>
> I think this example sequence can deadlock so I wouldn't want to encourage
> its use. New invalidation regions are added to the list before the
> range_start callback is invoked.
>
> Thread A Thread B
> ----------------- -----------------
> mmu_notifier_range_wait_valid
> // returns
> __mmu_notifier_invalidate_range_start
> list_add_tail
> lock_block_new_invalidation
> ->invalidate_range_start
> // invalidation blocked in callback
> mmu_notifier_range_is_valid // fails
> goto again
> mmu_notifier_range_wait_valid // deadlock
>
> mmu_notifier_range_wait_valid can't finish until thread B's callback
> returns, but thread B's callback can't return because it's blocked.
>
> I see that HMM in later patches takes the approach of not holding the lock
> when mmu_notifier_range_is_valid returns false. Instead of stalling new
> invalidations it returns -EAGAIN to the caller. While that resolves the
> deadlock, it won't prevent the faulting thread from being starved in the
> pathological case.

The comment here is not clear, what HMM does is what is intended. If
mmu_notifier_range_is_valid() return false then you drop lock and try
again. I am not sure we should care about the starve case as it can
not happen, it would mean that something keeps invalidating over and
over the same range of address space of a process. I do not see how
such thing would happen.

>
> Is it out of the question to build a lock into the mmu notifier API
> directly? It's a little worrisome to me that the complexity for this
> locking is pushed into the callbacks rather than handled in the core.
> Something like this:
>
> mmu_notifier_range_lock(start, end)
> mmu_notifier_range_unlock(start, end)

If a range is about to be invalidated it is better to avoid faulting
in memory in the device as that same memory is about to be invalidated.
This is why i always have invalidation take precedence over device fault.

> If that's not feasible and we have to stick with the current approach,
> then I suggest changing the "valid" name. "valid" doesn't have a clear
> meaning at first glance because the reader doesn't know what would make a
> range "valid." How about "active" instead? Then the names would look
> something like this, assuming the polarity matches their current versions:
>
> mmu_notifier_range_inactive_locked
> mmu_notifier_range_inactive
> mmu_notifier_range_wait_active

Those names are better i will update the patch accordingly.

Thanks for the review,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Steve Wise: "RE: [PATCH] infiniband:cxgb4:Fix incorrect return statement in the function c4iw_destroy_cq"
Previous message: Yury Norov: "[PATCH 2/2] arm64: don't load 32-bit binaries if platform has no aarch32_el0"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]