Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

From: Michal Hocko
Date: Mon Sep 30 2019 - 07:28:20 EST


On Sat 28-09-19 13:59:26, Linus Torvalds wrote:
> On Fri, Sep 27, 2019 at 12:48 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > -	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> > +	if (!order)
> > +		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> > 	if (page)
> > 		goto got_pg;
> >
> > The whole point of handling this in the page allocator directly is to
> > have a unified solution rather than have each specific caller invent
> > its own way to achieve higher locality.
>
> The above just looks hacky.

It is, and it was meant to help move on with debugging rather than to
be a final solution.

> Why would order-0 be special?

Ideally it wouldn't be, but the current implementation makes it special.
Why? Because the whole concept of the low wmark fast path attempt is
based on kswapd balancing for a high watermark, which provides some
space. Kcompactd doesn't have any notion like that. And I believe that
a large part of the problem really is there. If I am wrong here then I
would appreciate being corrected.
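
To illustrate the asymmetry, here is a minimal userspace sketch. It is a
toy model and not kernel code: struct toy_zone, toy_fast_path_ok() and
TOY_MAX_ORDER are names made up for this example, and the check is far
simpler than the real watermark logic. It only shows that free pages
above the low watermark are enough for order-0, while a higher order
also needs a free block of that size, which reclaim alone does not
guarantee.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER 4

/* Hypothetical, simplified zone; not the kernel's struct zone. */
struct toy_zone {
	unsigned long free_pages;                 /* total free base pages */
	unsigned long low_wmark;                  /* low watermark, in pages */
	unsigned long free_blocks[TOY_MAX_ORDER]; /* free blocks per order */
};

/*
 * Toy fast-path check: the request must keep free pages at or above the
 * low watermark, and a free block of at least the requested order must
 * exist.
 */
static bool toy_fast_path_ok(const struct toy_zone *z, unsigned int order)
{
	unsigned long need = 1UL << order;

	assert(order < TOY_MAX_ORDER);

	/* Watermark part: this is what background reclaim keeps satisfied. */
	if (z->free_pages < z->low_wmark + need)
		return false;

	/* Fragmentation part: nothing in this model creates larger blocks. */
	for (unsigned int o = order; o < TOY_MAX_ORDER; o++)
		if (z->free_blocks[o])
			return true;

	return false;
}
```

With 1000 free pages, a low watermark of 100 and all free memory in
order-0 fragments, toy_fast_path_ok() succeeds for order 0 and fails for
order 3, even though the watermark itself is comfortably met.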

If __GFP_THISNODE allows for a better THP utilization on a local node
then the problem points at kcompactd not being pro-active enough. And
that is what the first diff was aiming at.

I also claim that this is not a THP specific problem. You are right
that lower orders are less likely to hit the problem because the memory
is usually not fragmented that heavily, but fundamentally the overeager
fallback in the fast path is still there. And that is the reason for me
to push back against __GFP_THISNODE && fallback allocation open-coded
outside of the allocator. The allocator knows the context can compact,
so why should we require the caller to be doing that?

Do not get me wrong, but we have quite a long history of fine tuning
for THP by adding kludges here and there, and they usually turn out to
break something else. I really want to understand the underlying
problem and base a solution on it rather than "__GFP_THISNODE can cause
overreclaim so pick a break out condition and hope for the best".
--
Michal Hocko
SUSE Labs