Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

From: Mike Rapoport
Date: Wed May 24 2017 - 11:13:58 EST


On Wed, May 24, 2017 at 04:54:38PM +0200, Vlastimil Babka wrote:
> On 05/24/2017 04:28 PM, Pavel Emelyanov wrote:
> > On 05/24/2017 02:31 PM, Vlastimil Babka wrote:
> >> On 05/24/2017 12:39 PM, Mike Rapoport wrote:
> >>>> Hm so the prctl does:
> >>>>
> >>>> if (arg2)
> >>>> me->mm->def_flags |= VM_NOHUGEPAGE;
> >>>> else
> >>>> me->mm->def_flags &= ~VM_NOHUGEPAGE;
> >>>>
> >>>> That's rather lazy implementation IMHO. Could we change it so the flag
> >>>> is stored elsewhere in the mm, and the code that decides to (not) use
> >>>> THP will check both the per-vma flag and the per-mm flag?
> >>>
> >>> I afraid I don't understand how that can help.
> >>> What we need is an ability to temporarily disable collapse of the pages in
> >>> VMAs that do not have VM_*HUGEPAGE flags set and that after we re-enable
> >>> THP, the vma->vm_flags for those VMAs will remain intact.
> >>
> >> That's what I'm saying - instead of implementing the prctl flag via
> >> mm->def_flags (which gets permanently propagated to newly created vma's
> >> but e.g. doesn't affect already existing ones), it would be setting a
> >> flag somewhere in mm, which khugepaged (and page faults) would check in
> >> addition to the per-vma flags.
> >
> > I do not insist, but this would make existing paths (checking for flags) be
> > 2 times slower -- from now on these would need to check two bits (vma flags
> > and mm flags) which are 100% in different cache lines.
>
> I'd expect you already have mm struct cached during a page fault. And
> THP-eligible page fault is just one per pmd, the overhead should be
> practically zero.
>
> > What Mike is proposing is the way to fine-tune the existing vma flags. This
> > would keep current paths as fast (or slow ;) ) as they are now. All the
> > complexity would go to rare cases when someone needs to turn thp off for a
> > while and then turn it back on.
>
> Yeah but it's extending user-space API for a corner case. We should do
> that only when there's no other option.

With madvise() I'm suggesting we rather add "completeness" to the existing
API, IMHO. We do have API that sets VM_HUGEPAGE and clears VM_NOHUGEPAGE or
vise versa, but we do not have an API that can clear both flags...

And if we would use prctl(), we either change user visible behaviour or we
still need to extend the API and use, say, arg2 to distinguish between the
current behaviour and the new one.

--
Sincerely yours,
Mike.