Re: [PATCH 02/36] mmu_notifier: keep track of active invalidation ranges v3

From: John Hubbard
Date: Tue Jun 02 2015 - 05:32:54 EST


On Thu, 21 May 2015, j.glisse@xxxxxxxxx wrote:

> From: Jérôme Glisse <jglisse@xxxxxxxxxx>
>
> The mmu_notifier_invalidate_range_start() and mmu_notifier_invalidate_range_end()
> functions can be considered as forming an "atomic" section from the CPU page
> table update point of view. Between these two functions the CPU page table
> content is unreliable for the address range being invalidated.
>
> Current users such as kvm need to know when they can trust the content of the
> CPU page table. This becomes even more important for new users of the
> mmu_notifier API (such as HMM or ODP).
>
> This patch uses a structure, defined at every call site of invalidate_range_start(),
> that is added to a list for the duration of the invalidation. It adds two new
> helpers to allow querying whether a range is being invalidated or waiting for a
> range to become valid.
>
> For proper synchronization, users must block new range invalidations from inside
> their invalidate_range_start() callback before calling the helper functions.
> Otherwise there is no guarantee that a new range invalidation will not be added
> after the call to the helper function that queries for existing ranges.

Hi Jerome,

Most of this information will make nice block comments for the new helper
routines. I can help tighten up the writing slightly, but first:

Question: in hmm.c's hmm_notifier_invalidate function (looking at the
entire patchset, for a moment), I don't see any blocking of new range
invalidations, even though you point out, above, that this is required. Am
I missing it, and if so, where should I be looking instead?
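
For reference, here is (roughly) the usage pattern I would expect a driver
to follow. Everything below except the two new helpers is made up for
illustration, so the names and locking details are only a sketch:

#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

struct my_mirror {
        struct mmu_notifier mn;
        struct mm_struct    *mm;
        struct mutex        update_lock;
};

/* Made-up device-side helpers, defined elsewhere in the driver: */
void my_mirror_drop_mappings(struct my_mirror *mirror,
                             unsigned long start, unsigned long end);
void my_mirror_update_mappings(struct my_mirror *mirror,
                               unsigned long start, unsigned long end);

static void my_invalidate_range_start(struct mmu_notifier *mn,
                                      struct mm_struct *mm,
                                      const struct mmu_notifier_range *range)
{
        struct my_mirror *mirror = container_of(mn, struct my_mirror, mn);

        /*
         * Take the driver lock so that the fault path below cannot race
         * with an invalidation that starts after it has checked the range.
         */
        mutex_lock(&mirror->update_lock);
        my_mirror_drop_mappings(mirror, range->start, range->end);
        mutex_unlock(&mirror->update_lock);
}

static int my_fault_path(struct my_mirror *mirror,
                         unsigned long start, unsigned long end)
{
        /* Sleep until no invalidation overlaps [start, end)... */
        mmu_notifier_range_wait_valid(mirror->mm, start, end);

        /*
         * ...then take the same lock that invalidate_range_start() takes
         * and re-check (without sleeping) before trusting the CPU page
         * table content, retrying if a new invalidation snuck in.
         */
        mutex_lock(&mirror->update_lock);
        if (!mmu_notifier_range_is_valid(mirror->mm, start, end)) {
                mutex_unlock(&mirror->update_lock);
                return -EAGAIN;
        }
        my_mirror_update_mappings(mirror, start, end);
        mutex_unlock(&mirror->update_lock);
        return 0;
}

Is that roughly the pattern hmm.c is supposed to follow?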

>
> Changed since v1:
> - Fix a possible deadlock in mmu_notifier_range_wait_valid()
>
> Changed since v2:
> - Add the range to invalid range list before calling ->range_start().
> - Del the range from invalid range list after calling ->range_end().
> - Remove useless list initialization.
>
> Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
> Reviewed-by: Haggai Eran <haggaie@xxxxxxxxxxxx>
> ---
> drivers/gpu/drm/i915/i915_gem_userptr.c | 9 ++--
> drivers/gpu/drm/radeon/radeon_mn.c | 14 +++---
> drivers/infiniband/core/umem_odp.c | 16 +++----
> drivers/misc/sgi-gru/grutlbpurge.c | 15 +++----
> drivers/xen/gntdev.c | 15 ++++---
> fs/proc/task_mmu.c | 11 +++--
> include/linux/mmu_notifier.h | 55 ++++++++++++-----------
> kernel/events/uprobes.c | 13 +++---
> mm/huge_memory.c | 78 ++++++++++++++------------------
> mm/hugetlb.c | 55 ++++++++++++-----------
> mm/ksm.c | 28 +++++-------
> mm/madvise.c | 20 ++++-----
> mm/memory.c | 72 +++++++++++++++++-------------
> mm/migrate.c | 36 +++++++--------
> mm/mmu_notifier.c | 79 ++++++++++++++++++++++++++++-----
> mm/mprotect.c | 18 ++++----
> mm/mremap.c | 14 +++---
> virt/kvm/kvm_main.c | 10 ++---
> 18 files changed, 302 insertions(+), 256 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 452e9b1..80fe72a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -131,16 +131,15 @@ restart:
>
> static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> struct mm_struct *mm,
> - unsigned long start,
> - unsigned long end,
> - enum mmu_event event)
> + const struct mmu_notifier_range *range)
> {
> struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
> struct interval_tree_node *it = NULL;
> - unsigned long next = start;
> + unsigned long next = range->start;
> unsigned long serial = 0;
> + /* interval ranges are inclusive, but invalidate range is exclusive */
> + unsigned long end = range->end - 1, start = range->start;


A *very* minor point, but doing it that way messes up the scope of the
comment. Something more like this might be cleaner:

unsigned long start = range->start;
unsigned long next = start;
unsigned long serial = 0;
/* interval ranges are inclusive, but invalidate range is exclusive */
unsigned long end = range->end - 1;


[...]

> - enum mmu_event event)
> + struct mmu_notifier_range *range)
>
> {
> struct mmu_notifier *mn;
> int id;
>
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + list_add_tail(&range->list, &mm->mmu_notifier_mm->ranges);
> + mm->mmu_notifier_mm->nranges++;


Is this missing a call to wake_up(&mm->mmu_notifier_mm->wait_queue)? If
not, then it would be helpful to explain why that's only required for
nranges--, and not for the nranges++ case. The helper routine is merely
waiting for nranges to *change*, not looking for greater than or less
than.


> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> if (mn->ops->invalidate_range_start)
> - mn->ops->invalidate_range_start(mn, mm, start,
> - end, event);
> + mn->ops->invalidate_range_start(mn, mm, range);
> }
> srcu_read_unlock(&srcu, id);
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
>
> void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
> - unsigned long start,
> - unsigned long end,
> - enum mmu_event event)
> + struct mmu_notifier_range *range)
> {
> struct mmu_notifier *mn;
> int id;
> @@ -211,12 +211,23 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
> * (besides the pointer check).
> */
> if (mn->ops->invalidate_range)
> - mn->ops->invalidate_range(mn, mm, start, end);
> + mn->ops->invalidate_range(mn, mm,
> + range->start, range->end);
> if (mn->ops->invalidate_range_end)
> - mn->ops->invalidate_range_end(mn, mm, start,
> - end, event);
> + mn->ops->invalidate_range_end(mn, mm, range);
> }
> srcu_read_unlock(&srcu, id);
> +
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + list_del_init(&range->list);
> + mm->mmu_notifier_mm->nranges--;
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> + /*
> + * Wakeup after callback so they can do their job before any of the
> + * waiters resume.
> + */
> + wake_up(&mm->mmu_notifier_mm->wait_queue);
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
>
> @@ -235,6 +246,49 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> }
> EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
>


We definitely want to put a little documentation here.
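
Something along these lines, mostly lifted from your commit message, could
work as a starting point (the wording is only a suggestion):

/**
 * mmu_notifier_range_wait_valid - wait until a range has no active invalidation
 * @mm:    the mm whose page tables are being invalidated
 * @start: start address of the range (inclusive)
 * @end:   end address of the range (exclusive)
 *
 * Sleep until no invalidate_range_start()/invalidate_range_end() "section"
 * overlaps [start, end). Note that a new invalidation may begin as soon as
 * this returns: callers must block new invalidations from inside their own
 * invalidate_range_start() callback and re-check with
 * mmu_notifier_range_is_valid() under that protection before trusting the
 * CPU page table content.
 */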


> +static bool mmu_notifier_range_is_valid_locked(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)


This routine is named "_range_is_valid_", but it takes in an implicit
range (start, end), and also a list of ranges (buried in mm), and so it's
a little confusing. I'd like to consider *maybe* changing either the name,
or the args (range* instead of start, end?), or something.

Could you please say a few words about the intent of this routine, to get
us started there?


> +{
> + struct mmu_notifier_range *range;
> +
> + list_for_each_entry(range, &mm->mmu_notifier_mm->ranges, list) {
> + if (!(range->end <= start || range->start >= end))
> + return false;


This has a lot of negatives in it, if you count the innermost "not in
range" expression. It can be simplified to this:

	if (range->end > start && range->start < end)
		return false;


> + }
> + return true;
> +}
> +
> +bool mmu_notifier_range_is_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + bool valid;
> +
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + valid = mmu_notifier_range_is_valid_locked(mm, start, end);
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + return valid;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_is_valid);
> +
> +void mmu_notifier_range_wait_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
> + int nranges = mm->mmu_notifier_mm->nranges;
> +
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + wait_event(mm->mmu_notifier_mm->wait_queue,
> + nranges != mm->mmu_notifier_mm->nranges);
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + }
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_wait_valid);
> +
> static int do_mmu_notifier_register(struct mmu_notifier *mn,
> struct mm_struct *mm,
> int take_mmap_sem)
> @@ -264,6 +318,9 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
> if (!mm_has_notifiers(mm)) {

[...]

That's all I could see to mention for this one, thanks,

john h