Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

From: David Rientjes
Date: Wed Oct 10 2018 - 17:19:30 EST


On Tue, 9 Oct 2018, Andrea Arcangeli wrote:

> I think "madvise vs mbind" is more an issue of "no-permission vs
> permission" required. And if the process ends up swapping out all
> other processes with their memory already allocated in the node, I think
> some permission is correct to be required, in which case an mbind
> looks a better fit. MPOL_PREFERRED also looks a first candidate for
> investigation as it's already not black and white and allows spillover
> and may already do the right thing in fact if set on top of
> MADV_HUGEPAGE.
>

We would never want to thrash the local node for hugepages because there
is no guarantee that any swapping is useful. On COMPACT_SKIPPED due to
low memory, we have very clear evidence that pageblocks are already
sufficiently fragmented by unmovable pages such that compaction itself,
even with abundant free memory, fails to free an entire pageblock due to
the allocator's preference to fragment pageblocks of fallback migratetypes
over returning remote free memory.

As I've stated, we do not want to reclaim pointlessly when compaction is
unable to access the freed memory or there is no guarantee it can free an
entire pageblock. Doing so allows thrashing of the local node, or remote
nodes if __GFP_THISNODE is removed, and the hugepage still cannot be
allocated. If this proposed mbind() that requires permissions is geared
toward me as the user, I'm afraid the details of what leads to the
thrashing are not well understood, because I certainly would never use it.
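
To make the condition above concrete, here is a rough userspace model of
the decision I am describing. It is only a sketch and not the page
allocator's code: struct thp_alloc_state, enum thp_next_step and both
helper functions are hypothetical names made up for illustration, and the
two booleans stand in for information the kernel would have to derive
from the compaction result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known after a failed hugepage attempt. */
struct thp_alloc_state {
	/* Could compaction actually use memory that reclaim frees? */
	bool compaction_can_use_freed_memory;
	/* Is there reason to believe an entire pageblock can be freed? */
	bool pageblock_can_be_freed;
};

/* Hypothetical next steps for the allocation path. */
enum thp_next_step {
	THP_RECLAIM_AND_RETRY,	/* reclaim, then try compaction again */
	THP_GIVE_UP,		/* do not reclaim; fall back to small pages */
};

/*
 * Reclaim is only worthwhile when both conditions hold.  If either one
 * fails, reclaiming swaps out memory without making the hugepage
 * allocation any more likely to succeed.
 */
static bool reclaim_is_worthwhile(const struct thp_alloc_state *s)
{
	assert(s != NULL);
	return s->compaction_can_use_freed_memory &&
	       s->pageblock_can_be_freed;
}

static enum thp_next_step thp_next_step(const struct thp_alloc_state *s)
{
	return reclaim_is_worthwhile(s) ? THP_RECLAIM_AND_RETRY : THP_GIVE_UP;
}
```

With this model, a state where reclaim could free memory but no whole
pageblock can be freed yields THP_GIVE_UP, which is the case where
reclaiming would only thrash the node.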