Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

From: David Rientjes
Date: Wed Sep 12 2018 - 16:42:04 EST


On Wed, 12 Sep 2018, Michal Hocko wrote:

> > Saying that we really want THP isn't an all-or-nothing decision. We
> > certainly want to try hard to fault hugepages locally especially at task
> > startup when remapping our .text segment to thp, and MADV_HUGEPAGE works
> > very well for that. Remote hugepages would be a regression that we now
> > have no way to avoid because the kernel doesn't provide for it, if we were
> > to remove __GFP_THISNODE that this patch introduces.
>
> Why cannot you use mempolicy to bind to local nodes if you really care
> about the locality?
>

Because we do not want to oom kill; we want to fall back first to local
native pages and then to remote native pages. That's the order of least
to greatest latency: we do not want to work hard to allocate a remote
hugepage when a local native page is faster. This seems pretty
straightforward.
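
To make the preferred ordering concrete, here is a toy userspace model of
it. This is only a sketch and not kernel code: the struct, enum, and
function names are hypothetical, and the three availability flags stand in
for whatever the page allocator would actually find on each node.

```c
#include <stdbool.h>

/* Hypothetical outcome of a single fault, for illustration only. */
enum page_source {
    LOCAL_HUGEPAGE,   /* best case: hugepage on the faulting node */
    LOCAL_NATIVE,     /* first fallback: native page on the faulting node */
    REMOTE_NATIVE,    /* second fallback: native page on another node */
    ALLOC_FAILED      /* nothing available in this toy model */
};

/* Hypothetical snapshot of what is cheaply available at fault time. */
struct node_state {
    bool local_hugepage_free;
    bool local_native_free;
    bool remote_native_free;
};

/*
 * Walk the fallback order described above. A remote hugepage is
 * deliberately never tried: a local native page is preferred over it.
 */
enum page_source pick_page_source(struct node_state s)
{
    if (s.local_hugepage_free)
        return LOCAL_HUGEPAGE;
    if (s.local_native_free)
        return LOCAL_NATIVE;
    if (s.remote_native_free)
        return REMOTE_NATIVE;
    return ALLOC_FAILED;
}
```

Binding with a mempolicy would collapse the last two steps into a failure,
which is the oom kill case we want to avoid.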

> From what you have said so far it sounds like you would like to have
> something like the zone/node reclaim mode fine grained for a specific
> mapping. If we really want to support something like that then it should
> be a generic policy rather than THP specific thing IMHO.
>
> As I've said it is hard to come up with a solution that would satisfy
> everybody but considering that the existing reports are seeing this as a
> regression and considering their NUMA requirements are not so strict as
> yours I would tend to think that stronger NUMA requirements should be
> expressed explicitly rather than implicit effect of a madvise flag. We
> do have APIs for that.

Every process on every platform we have would need to define this explicit
mempolicy for users of libraries that remap text segments because changing
the allocation behavior of thp out from under them would cause very
noticeable performance regressions. I don't know of any platform where
remote hugepages are preferred over local native pages. If they exist, it
sounds reasonable to introduce a stronger variant of MADV_HUGEPAGE that
defines exactly what you want rather than turning it into a dumping
ground and causing userspace regressions.
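
For reference, the userspace side of this is just an madvise() call on the
mapping. The sketch below is an assumption-laden illustration: it uses a
fresh anonymous mapping rather than a remapped text segment, the helper
name is hypothetical, and it says nothing about which node the pages end
up on, since that is precisely the kernel policy under discussion.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create an anonymous private mapping of len bytes
 * and hint that it should be backed by transparent hugepages.
 *
 * Returns the mapping, or NULL if mmap() fails. The madvise() result is
 * stored in *advice_rc (0 on success, -1 on failure) when advice_rc is
 * non-NULL. The hint is advisory, so the mapping stays usable even if
 * the kernel rejects it.
 */
void *map_with_thp_hint(size_t len, int *advice_rc)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;

#ifdef MADV_HUGEPAGE
    int rc = madvise(addr, len, MADV_HUGEPAGE);
#else
    int rc = -1;  /* hint not available on this platform */
#endif
    if (advice_rc)
        *advice_rc = rc;

    return addr;
}
```

Callers that do this today get no say in the local/remote choice, which is
why a change in the default behavior reaches all of them at once.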