Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

From: Andrea Arcangeli
Date: Tue May 30 2017 - 12:06:19 EST


On Tue, May 30, 2017 at 04:56:33PM +0200, Michal Hocko wrote:
> On Tue 30-05-17 16:39:41, Michal Hocko wrote:
> > On Tue 30-05-17 16:04:56, Andrea Arcangeli wrote:
> [...]
> > > About the proposed madvise, it just clear bits, but it doesn't change
> > > at all how those bits are computed in THP code. So I don't see it as
> > > convoluted.
> >
> > But we already have MADV_HUGEPAGE, MADV_NOHUGEPAGE and prctl to
> > enable/disable thp. Doesn't that sound little bit too much for a single
> > feature to you?
>
> And also I would argue that the prctl should be usable for this specific
> usecase. The man page says
> "
> Setting this flag provides a method for disabling transparent huge pages
> for jobs where the code cannot be modified
> "
>
> and that fits into the described case AFAIU. The thing that the current
> implementation doesn't work is a mere detail. I would even argue that
> it is non-intuitive if not buggy right away. Whoever calls this prctl
> later in the process life time will simply not stop THP from creating.
>
> So again, why cannot we fix that? There was some handwaving about
> potential overhead but has anybody actually measured that?

I'm not sure if it should be considered a bug, the prctl is intended
to use normally by wrappers so it looks optimal as implemented this
way: affecting future vmas only, which will all be created after
execve executed by the wrapper.

What's the point of messing with the prctl so it mangles over the
wrapper process own vmas before exec? Messing with those vmas is pure
wasted CPUs for the wrapper use case which is what the prctl was
created for.

Furthermore there would be the risk a program that uses the prctl not
as a wrapper and then calls the prctl to clear VM_NOHUGEPAGE from
def_flags assuming the current kABI. The program could assume those
vmas that were instantiated before disabling the prctl are still with
VM_NOHUGEPAGE set (they would not after the change you propose).

Adding a scan of all vmas to PR_SET_THP_DISABLE to clear VM_NOHUGEPAGE
on existing vmas looks more complex too and less finegrined so
probably more complex for userland to manage, but ignoring all above
considerations it would be a functional alternative for CRIU's
needs. However if you didn't like the complexity of the new madvise
which is functionally a one-liner equivalent to MADV_NORMAL, I
wouldn't expect you to prefer to make the prctl even more complex with
a loop over all vmas that despite being fairly simple it'll still be
more than a trivial one liner.

Thanks,
Andrea