Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains v2

From: Peter Zijlstra
Date: Tue Jan 07 2020 - 07:28:29 EST


On Tue, Jan 07, 2020 at 12:22:55PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2020 at 09:56:55AM +0000, Mel Gorman wrote:

> > + unsigned int imbalance_adj;
> > +
> > + /*
> > + * Calculate an acceptable degree of imbalance based
> > + * on imbalance_adj. However, do not allow a greater
> > + * imbalance than the child domains weight to avoid
> > + * a case where the allowed imbalance spans multiple
> > + * LLCs.
> > + */
>
> That comment is a wee misleading, @child is not an LLC per se. This
> could be the NUMA distance 2 domain, in which case @child is the NUMA
> distance 1 group.
>
> That said, even then it probably makes sense to ensure you don't idle a
> whole smaller distance group.

Hmm, one more thing. On AMD EPYC, with its multiple LLCs, you'll have
the single NODE domain in between, and that is not marked with SD_NUMA
(iirc).

So specifically the case you want to handle is not in fact handled. The
first SD_NUMA (distance-1) will have all NODE children, which on EPYC
are not LLCs.
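
To make that concrete, here is a rough userspace sketch of the hierarchy
walk. It is not kernel code: the toy_* names, the flag values and the CPU
counts are all made up for illustration, and the SMT/MC/NODE/NUMA layout
is my assumption of what an EPYC-like box looks like versus a box with a
single LLC per node (where I assume the NODE level degenerates away).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; these are NOT the kernel's values. */
#define TOY_SD_SHARE_LLC	0x1u	/* stands in for SD_SHARE_PKG_RESOURCES */
#define TOY_SD_NUMA		0x2u	/* stands in for SD_NUMA */

/* Hypothetical, heavily reduced stand-in for struct sched_domain. */
struct toy_domain {
	const char *name;
	unsigned int flags;
	unsigned int weight;		/* number of CPUs spanned (made up) */
	struct toy_domain *parent;
	struct toy_domain *child;
};

/* Link the levels bottom-up; levels[0] is the lowest domain. */
void toy_link(struct toy_domain *levels, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		levels[i].child = i ? &levels[i - 1] : NULL;
		levels[i].parent = (i + 1 < n) ? &levels[i + 1] : NULL;
	}
}

/* Walk up from the lowest domain and return the first SD_NUMA level. */
struct toy_domain *toy_first_numa(struct toy_domain *sd)
{
	for (; sd; sd = sd->parent)
		if (sd->flags & TOY_SD_NUMA)
			return sd;
	return NULL;
}

/* True if the child of the first SD_NUMA domain shares an LLC. */
bool toy_numa_child_is_llc(struct toy_domain *lowest)
{
	struct toy_domain *numa = toy_first_numa(lowest);

	return numa && numa->child &&
	       (numa->child->flags & TOY_SD_SHARE_LLC);
}

/* Assumed EPYC-like layout: several LLCs under one NODE domain. */
struct toy_domain toy_epyc[] = {
	{ "SMT",  TOY_SD_SHARE_LLC,  2, NULL, NULL },
	{ "MC",   TOY_SD_SHARE_LLC,  8, NULL, NULL },
	{ "NODE", 0,                32, NULL, NULL },
	{ "NUMA", TOY_SD_NUMA,      64, NULL, NULL },
};

/* Assumed layout with one LLC per node: no separate NODE level. */
struct toy_domain toy_single_llc[] = {
	{ "SMT",  TOY_SD_SHARE_LLC,  2, NULL, NULL },
	{ "MC",   TOY_SD_SHARE_LLC, 16, NULL, NULL },
	{ "NUMA", TOY_SD_NUMA,      32, NULL, NULL },
};
```

After toy_link() on each array, toy_numa_child_is_llc() is false for
toy_epyc and true for toy_single_llc. In the EPYC-like case a clamp
against the child's weight would therefore use the whole node (32 in this
toy setup), not a single LLC (8), which is the gap described above.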