Re: [PATCH 10/10] mm/hmm: add helpers for driver to safely take the mmap_sem

From: John Hubbard
Date: Wed Feb 20 2019 - 17:40:35 EST


On 2/20/19 2:19 PM, Jerome Glisse wrote:
On Wed, Feb 20, 2019 at 01:59:13PM -0800, John Hubbard wrote:
On 1/29/19 8:54 AM, jglisse@xxxxxxxxxx wrote:
From: JÃrÃme Glisse <jglisse@xxxxxxxxxx>

The device driver context which holds reference to mirror and thus to
core hmm struct might outlive the mm against which it was created. To
avoid every driver to check for that case provide an helper that check
if mm is still alive and take the mmap_sem in read mode if so. If the
mm have been destroy (mmu_notifier release call back did happen) then
we return -EINVAL so that calling code knows that it is trying to do
something against a mm that is no longer valid.

Signed-off-by: JÃrÃme Glisse <jglisse@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Ralph Campbell <rcampbell@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
---
include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index b3850297352f..4a1454e3efba 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -438,6 +438,50 @@ struct hmm_mirror {
int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
void hmm_mirror_unregister(struct hmm_mirror *mirror);
+/*
+ * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode
+ * @mirror: the HMM mm mirror for which we want to lock the mmap_sem
+ * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken).
+ *
+ * The device driver context which holds reference to mirror and thus to core
+ * hmm struct might outlive the mm against which it was created. To avoid every
+ * driver to check for that case provide an helper that check if mm is still
+ * alive and take the mmap_sem in read mode if so. If the mm have been destroy
+ * (mmu_notifier release call back did happen) then we return -EINVAL so that
+ * calling code knows that it is trying to do something against a mm that is
+ * no longer valid.
+ */

Hi Jerome,

Are you thinking that, throughout the HMM API, there is a problem that
the mm may have gone away, and so driver code needs to be littered with
checks to ensure that mm is non-NULL? If so, why doesn't HMM take a
reference on mm->count?

This solution here cannot work. I think you'd need refcounting in order
to avoid this kind of problem. Just doing a check will always be open to
races (see below).


+static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror)
+{
+ struct mm_struct *mm;
+
+ /* Sanity check ... */
+ if (!mirror || !mirror->hmm)
+ return -EINVAL;
+ /*
+ * Before trying to take the mmap_sem make sure the mm is still
+ * alive as device driver context might outlive the mm lifetime.
+ *
+ * FIXME: should we also check for mm that outlive its owning
+ * task ?
+ */
+ mm = READ_ONCE(mirror->hmm->mm);
+ if (mirror->hmm->dead || !mm)
+ return -EINVAL;
+

Nothing really prevents mirror->hmm->mm from changing to NULL right here.

This is really just to catch driver mistake, if driver does not call
hmm_mirror_unregister() then the !mm will never be true ie the
mirror->hmm->mm can not go NULL until the last reference to hmm_mirror
is gone.

In that case, then this again seems unnecessary, and in fact undesirable.
If the driver code has a bug, then let's let the backtrace from a NULL
dereference just happen, loud and clear.

This patch, at best, hides bugs. And it adds code that should simply be
unnecessary, so I don't like it. :) Let's make it go away.



+ down_read(&mm->mmap_sem);
+ return 0;
+}
+

...maybe better to just drop this patch from the series, until we see a
pattern of uses in the calling code.

It use by nouveau now.

Maybe you'd have to remove that use case in a couple steps, depending on the
order that patches are going in.


thanks,
--
John Hubbard
NVIDIA