Re: [PATCH 10/10] mm/hmm: add helpers for driver to safely take the mmap_sem

From: John Hubbard
Date: Wed Feb 20 2019 - 16:59:28 EST


On 1/29/19 8:54 AM, jglisse@xxxxxxxxxx wrote:
From: Jérôme Glisse <jglisse@xxxxxxxxxx>

The device driver context that holds a reference to the mirror, and thus
to the core hmm struct, might outlive the mm against which it was created.
To spare every driver from checking for that case, provide a helper that
checks whether the mm is still alive and takes the mmap_sem in read mode
if so. If the mm has been destroyed (the mmu_notifier release callback has
run), then we return -EINVAL so that the calling code knows it is trying
to do something against an mm that is no longer valid.

Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Ralph Campbell <rcampbell@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
---
include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index b3850297352f..4a1454e3efba 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -438,6 +438,50 @@ struct hmm_mirror {
int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
void hmm_mirror_unregister(struct hmm_mirror *mirror);
+/*
+ * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
+ *
+ * The device driver context that holds a reference to the mirror, and thus to
+ * the core hmm struct, might outlive the mm against which it was created. To
+ * spare every driver from checking for that case, provide a helper that checks
+ * whether the mm is still alive and takes the mmap_sem in read mode if so. If
+ * the mm has been destroyed (the mmu_notifier release callback has run), then
+ * we return -EINVAL so that the calling code knows it is trying to do
+ * something against an mm that is no longer valid.
+ */

Hi Jerome,

Are you thinking that, throughout the HMM API, there is a problem that
the mm may have gone away, and so driver code needs to be littered with
checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
reference on mm->count?

This solution here cannot work. I think you'd need refcounting in order
to avoid this kind of problem. Just doing a check will always be open to
races (see below).
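
For illustration, the refcounting idea can be modeled in userspace C11:
take a reference only while the count is still non-zero, and touch the
object only after that succeeds. This is a minimal sketch under stated
assumptions, not kernel code and not a proposed patch. The names
struct obj, obj_get_not_zero() and obj_put() are hypothetical. In the
kernel, mmget_not_zero() plays a comparable role for mm->mm_users.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as an mm. */
struct obj {
	atomic_int users;	/* 0 means teardown has already begun */
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success; the caller must later call obj_put().
 */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->users);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference. Returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->users, 1) == 1;
}
```

With this shape, a caller pins the object first, then takes the lock,
and drops the reference after unlocking. A failed obj_get_not_zero()
is the only point where "the object is gone" has to be handled.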


+static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
+{
+ struct mm_struct *mm;
+
+ /* Sanity check ... */
+ if (!mirror || !mirror->hmm)
+ return -EINVAL;
+ /*
+ * Before trying to take the mmap_sem make sure the mm is still
+ * alive as device driver context might outlive the mm lifetime.
+ *
+ * FIXME: should we also check for an mm that outlives its owning
+ * task?
+ */
+ mm = READ_ONCE(mirror->hmm->mm);
+ if (mirror->hmm->dead || !mm)
+ return -EINVAL;
+

Nothing really prevents mirror->hmm->mm from changing to NULL right here.

+ down_read(&mm->mmap_sem);
+ return 0;
+}
+
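
The window noted above can be replayed deterministically in userspace
C. This is only a single-threaded sketch: struct fake_mm,
struct fake_hmm and the helpers are hypothetical, and the teardown step
(set dead, clear the mm pointer) is an assumption about what the
release path does. It shows that a check which passed can be stale by
the time the sampled pointer would be used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mm and the core hmm struct. */
struct fake_mm {
	int unused;
};

struct fake_hmm {
	struct fake_mm *mm;
	bool dead;
};

/* Mirrors the helper's check: sample the pointer, then test liveness. */
static struct fake_mm *check_alive(struct fake_hmm *hmm)
{
	struct fake_mm *mm = hmm->mm;

	if (hmm->dead || !mm)
		return NULL;
	return mm;
}

/* Assumed teardown: mark the hmm dead and clear its mm pointer. */
static void fake_release(struct fake_hmm *hmm)
{
	hmm->dead = true;
	hmm->mm = NULL;
}

/*
 * Replay the interleaving: the check passes, teardown runs, and the
 * reader is left holding a pointer that nothing keeps alive.
 */
static bool stale_after_check(void)
{
	struct fake_mm mm = { 0 };
	struct fake_hmm hmm = { .mm = &mm, .dead = false };
	struct fake_mm *sampled = check_alive(&hmm);	/* check passes */

	fake_release(&hmm);	/* teardown lands in the window */

	/* The reader would now lock through 'sampled'. */
	return sampled != NULL && hmm.dead && hmm.mm == NULL;
}
```

Nothing in the check itself pins the mm, so repeating or reordering
the test only narrows the window. Closing it takes a reference held
across the lock.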

...maybe better to just drop this patch from the series, until we see a
pattern of uses in the calling code.

thanks,
--
John Hubbard
NVIDIA