Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE
From: Michal Hocko
Date: Wed May 31 2017 - 06:24:50 EST
On Wed 31-05-17 12:27:00, Mike Rapoport wrote:
> On Wed, May 31, 2017 at 10:24:14AM +0200, Michal Hocko wrote:
> > On Wed 31-05-17 08:30:08, Vlastimil Babka wrote:
> > > On 05/30/2017 06:06 PM, Andrea Arcangeli wrote:
> > > >
> > > > I'm not sure it should be considered a bug; the prctl is intended
> > > > to be used normally by wrappers, so it looks optimal as implemented
> > > > this way: it affects future vmas only, which will all be created
> > > > after the execve executed by the wrapper.
> > > >
> > > > What's the point of changing the prctl so that it mangles the
> > > > wrapper process's own vmas before exec? Messing with those vmas is
> > > > purely wasted CPU for the wrapper use case, which is what the prctl
> > > > was created for.
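The wrapper use case described above can be sketched in C with the real prctl(2) interface: the wrapper sets the "THP disable" flag on itself and then execs the target. prctl(2) documents this flag as preserved across execve(2), so the target's mappings, which are all created after the exec, are covered. The helper name exec_without_thp() is hypothetical, and error handling is kept minimal.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41 /* from <linux/prctl.h>, Linux 3.15+ */
#endif

/* Hypothetical wrapper helper: set the "THP disable" flag on the calling
 * process, then replace it with argv[0]. prctl(2) documents the flag as
 * preserved across execve(2), so every vma the target creates inherits it.
 * On success this never returns; it returns -1 if prctl() or execvp() fails. */
int exec_without_thp(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL); /* need a program to run */

    if (prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL) != 0) {
        perror("prctl(PR_SET_THP_DISABLE)");
        return -1;
    }

    execvp(argv[0], argv); /* only returns on failure */
    perror("execvp");
    return -1;
}
```

A wrapper's main() could then be as small as `return exec_without_thp(argv + 1) == -1 ? 127 : 0;`, invoked as `./wrapper ./program args...`.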
> > > >
> > > > Furthermore, there would be a risk for a program that uses the prctl
> > > > not as a wrapper and then calls the prctl to clear VM_NOHUGEPAGE from
> > > > def_flags, assuming the current kABI. The program could assume that
> > > > vmas instantiated before the prctl was turned off still have
> > > > VM_NOHUGEPAGE set (they would not after the change you propose).
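To make the kABI concern concrete, the sketch below is a small userspace model (not kernel code) of the def_flags semantics described in this thread. In this model the prctl only edits def_flags, and each new vma starts from a copy of it. The model_* names and the bit value are hypothetical stand-ins for mm->def_flags and VM_NOHUGEPAGE. The model shows that a vma created while the prctl was set keeps the bit after the prctl is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: the bit value is illustrative, not the kernel's. */
#define MODEL_VM_NOHUGEPAGE 0x1u

struct model_mm  { unsigned int def_flags; }; /* models mm->def_flags */
struct model_vma { unsigned int flags; };     /* models vma->vm_flags */

/* Current PR_SET_THP_DISABLE semantics: only def_flags changes;
 * vmas that already exist are left untouched. */
static void model_set_thp_disable(struct model_mm *mm, bool disable)
{
    if (disable)
        mm->def_flags |= MODEL_VM_NOHUGEPAGE;
    else
        mm->def_flags &= ~MODEL_VM_NOHUGEPAGE;
}

/* A new mapping starts from a copy of def_flags at creation time. */
static struct model_vma model_mmap(const struct model_mm *mm)
{
    struct model_vma vma = { .flags = mm->def_flags };
    return vma;
}

static bool model_vma_nohugepage(const struct model_vma *vma)
{
    return (vma->flags & MODEL_VM_NOHUGEPAGE) != 0;
}

/* The scenario from the mail: vma a is created while the prctl is set,
 * vma b after it is cleared. Under the current kABI, a keeps the bit. */
static void model_kabi_scenario(void)
{
    struct model_mm mm = { .def_flags = 0 };

    model_set_thp_disable(&mm, true);
    struct model_vma a = model_mmap(&mm);
    model_set_thp_disable(&mm, false);
    struct model_vma b = model_mmap(&mm);

    assert(model_vma_nohugepage(&a));  /* still VM_NOHUGEPAGE */
    assert(!model_vma_nohugepage(&b)); /* created after the clear */
}
```

A change that rewrote existing vmas on the prctl call would make the first assertion in model_kabi_scenario() fail. That is the behaviour change a non-wrapper user could trip over.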
> > > >
> > > > Adding a scan of all vmas to PR_SET_THP_DISABLE to clear VM_NOHUGEPAGE
> > > > on existing vmas also looks more complex and less fine-grained, so it
> > > > is probably more complex for userland to manage.
> > >
> > > I would expect that the prctl would neither iterate over all vmas nor
> > > modify def_flags anymore. It would just set a flag somewhere in the mm
> > > struct that would be considered, in addition to the per-vma flags, when
> > > deciding whether to use THP.
> >
> > Exactly. Something like the below (not even compile tested).
>
> If we set aside the argument for keeping the kABI, this seems, hmm, a bit
> more complex than a new madvise() :)
Yes, code-wise it is more LOC, which is not all that great, but
semantically it makes much more sense than the current implementation of
PR_SET_THP_DISABLE.
> It seems that for the CRIU use case such prctl behaviour will work, and it
> will probably be even more convenient than madvise(). Nonetheless, I think
> madvise() is the more elegant and correct solution.
>
> > > We could consider whether MADV_HUGEPAGE should be
> > > able to override the prctl or not.
> >
> > This should be a master override of any per-vma setting.
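The alternative Vlastimil and Michal describe can be sketched as a small userspace model, assuming a per-mm flag that is consulted in addition to the per-vma flags and that acts as a master override. This is not the kernel's code. The model_* names, the bit values and the three global modes are simplifying assumptions. The modes stand in for the always/madvise/never settings of /sys/kernel/mm/transparent_hugepage/enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits; values are illustrative, not the kernel's. */
#define MODEL_VM_HUGEPAGE   0x1u
#define MODEL_VM_NOHUGEPAGE 0x2u

/* Stand-ins for the always/madvise/never global THP modes. */
enum model_thp_mode { MODEL_THP_NEVER, MODEL_THP_MADVISE, MODEL_THP_ALWAYS };

struct model_mm  { bool thp_disabled; }; /* per-mm flag set by the prctl */
struct model_vma { unsigned int flags; const struct model_mm *mm; };

/* Proposed semantics: the per-mm flag is checked first and wins over
 * any per-vma setting; per-vma flags then refine the global mode. */
static bool model_thp_allowed(const struct model_vma *vma,
                              enum model_thp_mode mode)
{
    /* madvise keeps the two per-vma bits mutually exclusive */
    assert((vma->flags & (MODEL_VM_HUGEPAGE | MODEL_VM_NOHUGEPAGE)) !=
           (MODEL_VM_HUGEPAGE | MODEL_VM_NOHUGEPAGE));

    if (vma->mm->thp_disabled)
        return false; /* master override: no vma can opt back in */
    if (vma->flags & MODEL_VM_NOHUGEPAGE)
        return false;

    switch (mode) {
    case MODEL_THP_ALWAYS:
        return true;
    case MODEL_THP_MADVISE:
        return (vma->flags & MODEL_VM_HUGEPAGE) != 0;
    case MODEL_THP_NEVER:
    default:
        return false;
    }
}
```

Because the prctl only flips mm->thp_disabled here, neither def_flags nor any existing vma is touched. Turning the prctl off again restores each vma's own setting.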
>
> Currently, MADV_HUGEPAGE overrides prctl(PR_SET_THP_DISABLE)...
> AFAIU, the prctl was intended to work with applications unaware of THP and
> for cases where adding MADV_*HUGEPAGE to the application was not an
> option.
which makes it an even weirder API IMHO.
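Mike's point about the current behaviour can be illustrated with a small model of how MADV_HUGEPAGE and MADV_NOHUGEPAGE swap the two per-vma bits. A vma created under PR_SET_THP_DISABLE starts with VM_NOHUGEPAGE copied from def_flags, and MADV_HUGEPAGE simply clears it. The model_* names and bit values are hypothetical. Only the bit transitions are meant to mirror the kernel's madvise handling.

```c
#include <assert.h>

/* Hypothetical bits; values are illustrative, not the kernel's. */
#define MODEL_VM_HUGEPAGE   0x1u
#define MODEL_VM_NOHUGEPAGE 0x2u

enum model_advice { MODEL_MADV_HUGEPAGE, MODEL_MADV_NOHUGEPAGE };

/* Each advice sets its own bit and clears the opposite one, so a
 * VM_NOHUGEPAGE inherited from def_flags does not survive MADV_HUGEPAGE. */
static unsigned int model_madvise(unsigned int vm_flags,
                                  enum model_advice advice)
{
    switch (advice) {
    case MODEL_MADV_HUGEPAGE:
        vm_flags &= ~MODEL_VM_NOHUGEPAGE;
        vm_flags |= MODEL_VM_HUGEPAGE;
        break;
    case MODEL_MADV_NOHUGEPAGE:
        vm_flags &= ~MODEL_VM_HUGEPAGE;
        vm_flags |= MODEL_VM_NOHUGEPAGE;
        break;
    }

    /* the two bits are never set together */
    assert((vm_flags & (MODEL_VM_HUGEPAGE | MODEL_VM_NOHUGEPAGE)) !=
           (MODEL_VM_HUGEPAGE | MODEL_VM_NOHUGEPAGE));
    return vm_flags;
}
```

In this model the prctl has no standing of its own once the vma exists. Its only trace is a bit that any later madvise() can overwrite, which is why it cannot act as the master override discussed above.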
--
Michal Hocko
SUSE Labs