Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

From: David Rientjes
Date: Fri Jun 21 2019 - 17:16:47 EST


On Thu, 6 Jun 2019, David Rientjes wrote:

> The idea that I had was snipped from this, however, and it would be nice
> to get some feedback on it: I've suggested that direct reclaim for the
> purposes of hugepage allocation on the local node is never worthwhile
> unless and until memory compaction can both capture that page to use (not
> rely on the freeing scanner to find it) and that migration of a number of
> pages would eventually result in the ability to free a pageblock.
>
> I'm hoping that we can all agree to that because otherwise it leads us
> down a bad road if reclaim is doing pointless work (freeing scanner can't
> find it or it gets allocated again before it can find it) or compaction
> can't make progress as a result of it (even though we can migrate, it
> still won't free a pageblock).
>
> In the interim, I think we should suppress direct reclaim entirely for
> thp allocations, regardless of enabled=always or MADV_HUGEPAGE because it
> cannot be proven that the reclaim work is beneficial and I believe it
> results in the swap storms that are being reported.
>
> Any disagreements so far?
>
> Furthermore, if we can agree to that, memory compaction when allocating a
> transparent hugepage fails for different reasons, one of which is because
> we fail watermark checks because we lack migration targets. This is
> normally what leads to direct reclaim. Compaction is *supposed* to return
> COMPACT_SKIPPED for this but that's overloaded as well: it happens when we
> fail extfrag_threshold checks and wheng gfp flags doesn't allow it. The
> former matters for thp.
>
> So my proposed change would be:
> - give the page allocator a consistent indicator that compaction failed
> because we are low on memory (make COMPACT_SKIPPED really mean this),
> - if we get this in the page allocator and we are allocating thp, fail,
> reclaim is unlikely to help here and is much more likely to be
> disruptive
> - we could retry compaction if we haven't scanned all memory and
> were contended,
> - if the hugepage allocation fails, have thp check watermarks for order-0
> pages without any padding,
> - if watermarks succeed, fail the thp allocation: we can't allocate
> because of fragmentation and it's better to return node local memory,
> - if watermarks fail, a follow up allocation of the pte will likely also
> fail, so thp retries the allocation with a cleared __GFP_THISNODE.
>
> This doesn't sound very invasive and I'll code it up if it will be tested.
>

Following up on this since there has been no activity in a week, I am
happy to prototype this. Andrea, would you be able to test a patch once
it is ready for you to try?