Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

From: Michal Hocko
Date: Mon Dec 03 2018 - 13:15:10 EST


On Mon 03-12-18 10:01:18, Linus Torvalds wrote:
> On Wed, Nov 28, 2018 at 8:48 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Nov 27, 2018 at 7:20 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> > >
> > > In general, memory allocation fairness among processes should be a good
> > > thing. So I think the report should have been a "performance
> > > improvement" instead of "performance regression".
> >
> > Hey, when you put it that way...
> >
> > Let's ignore this issue for now, and see if it shows up in some real
> > workload and people complain.
>
> Well, David Rientjes points out that it *does* cause real problems in
> real workloads, so it's not just this benchmark run that shows the
> issue.

The thing is that there is no universal win here. There are two
different types of workloads and we cannot satisfy both. This has been
discussed at length during the review process. David's workload makes
some assumptions about the MADV_HUGEPAGE numa placement. There are other
workloads like KVM setups which do not really require that and those are
the ones which regressed.

The prevalent consensus was that NUMA placement is not really implied
by MADV_HUGEPAGE because a) this has never been documented or intended
behavior and b) it is not a universal win (basically the same as
node/zone_reclaim which used to be on by default on some NUMA setups).

Reverting the patch would regress another class of workloads. As we
cannot satisfy both, I believe we should make the API clear and in favor
of the more relaxed workloads. Those with special requirements should
have a proper API to reflect that (this is our general NUMA policy
pattern already).
--
Michal Hocko
SUSE Labs