Re: [PATCH 2/2] Revert "mm, thp: restore node-local hugepage allocations"

From: Andrew Morton
Date: Thu May 23 2019 - 21:00:36 EST


On Mon, 20 May 2019 10:54:16 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> We are going in circles. *Yes*, there is a problem of potential swap
> storms today because of the poor interaction between memory compaction and
> directed reclaim, but this is the result of a poor API that does not allow
> userspace to specify that its workload really will span multiple sockets,
> so faulting remotely is the best course of action. The fix is not to
> cause regressions for others who have implemented a userspace stack that
> is based on the past 3+ years of long-standing behavior or for specialized
> workloads where it is known that it spans multiple sockets so we want some
> kind of different behavior. We need to provide a clear and stable API to
> define these terms for the page allocator that is independent of any
> global setting of thp enabled, defrag, zone_reclaim_mode, etc. It's
> workload dependent.

Um, who is going to do this work?

Implementing a new API doesn't help existing userspace that is hurting
from the problem this patch addresses.

It does appear to me that this patch does more good than harm for the
totality of kernel users, so I'm inclined to push it through and to try
to talk Linus out of reverting it again.