Re: [PATCH v5 7/7] mm: Remove the now-unnecessary mmget_still_valid() hack

From: Jann Horn
Date: Mon Aug 31 2020 - 05:59:22 EST


On Mon, Aug 31, 2020 at 8:07 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> On Thu, 27 Aug 2020, Jann Horn wrote:
>
> > The preceding patches have ensured that core dumping properly takes the
> > mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
> > its users.
>
> Hi Jann, while the only tears to be shed over losing mmget_still_valid()
> will be tears of joy, I think you need to explain why you believe it's
> safe to remove the instance in mm/khugepaged.c: which you'll have found
> I moved just recently, to cover an extra case (sorry for not Cc'ing you).
>
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
> >
> > static inline int khugepaged_test_exit(struct mm_struct *mm)
> > {
> > - return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
> > + return atomic_read(&mm->mm_users) == 0;
> > }
> >
> > static bool hugepage_vma_check(struct vm_area_struct *vma,
>
> The movement (which you have correctly followed) was in
> bbe98f9cadff ("khugepaged: khugepaged_test_exit() check mmget_still_valid()")
> but the "pmd .. physical page 0" issue is explained better in its parent
> 18e77600f7a1 ("khugepaged: retract_page_tables() remember to test exit")
>
> I think your core dumping is still reading the page tables without
> holding mmap_lock

Where? get_dump_page() takes mmap_lock now:
<https://lore.kernel.org/lkml/20200827114932.3572699-7-jannh@xxxxxxxxxx/>

I don't think there should be any paths into __get_user_pages() left
that don't hold the mmap_lock. Actually, we should probably try
sticking mmap_assert_locked() in there now as a follow-up?

> so still vulnerable to that extra issue. It won't
> be as satisfying as removing all traces of mmget_still_valid(), but
> right now I think you should add an mm->core_state check there instead.
>
> (I do have a better solution in use, but it's a much larger patch, that
> will take a lot more effort to get in: checks in pte_offset_map_lock(),
> perhaps returning NULL when pmd is transitioning, requiring retry.)

Just to clarify: This is an issue only between GUP's software page
table walks when running without mmap_lock and concurrent page table
modifications from hugepage code, correct? Hardware page table walks
and get_user_pages_fast() are fine because they properly load PTEs
atomically and are written to assume that the page tables can change
arbitrarily under them, and the only guarantee is that disabling
interrupts ensures that pages referenced by PTEs can't be freed,
right?

> Or maybe it's me who has missed what you're doing instead.
>
> Hugh