Re: [RFC PATCH 2/3] sched: change scheduler to give preference to soft affinity CPUs

From: Peter Zijlstra
Date: Thu Jul 18 2019 - 07:38:25 EST


On Wed, Jul 17, 2019 at 08:31:25AM +0530, Subhra Mazumdar wrote:
>
> On 7/2/19 10:58 PM, Peter Zijlstra wrote:
> > On Wed, Jun 26, 2019 at 03:47:17PM -0700, subhra mazumdar wrote:
> > > The soft affinity CPUs present in the cpumask cpus_preferred are used by
> > > the scheduler in two levels of search. The first is in determining wake
> > > affine, which chooses the LLC domain; the second is while searching for
> > > idle CPUs in the LLC domain. In the first level it uses cpus_preferred to
> > > prune the search space. In the second level it first searches
> > > cpus_preferred and then cpus_allowed. Using the affinity_unequal flag it
> > > breaks early to avoid any overhead in the scheduler fast path when soft
> > > affinity is not used. This only changes the wake up path of the
> > > scheduler; the idle balancing is unchanged. Together they achieve the
> > > "softness" of scheduling.
> > I really dislike this implementation.
> >
> > I thought the idea was to remain work conserving (in so far as that
> > we're that anyway), so changing select_idle_sibling() doesn't make sense
> > to me. If there is idle, we use it.
> >
> > Same for newidle; which you already retained.
> The scheduler is already not work conserving in many ways. Soft affinity is
> only for those who want to use it and has no side effects when not used.
> Also, given the way the scheduler is implemented, it may not be possible to
> do the first level of search in a work conserving way; I am open to ideas.

I really don't understand the premise of this soft affinity stuff then.

I understood it was to allow spreading if under-utilized, but group when
over-utilized, but you're arguing for the exact opposite, which doesn't
make sense.

> > And I also really don't want a second utilization tipping point; we
> > already have the overloaded thing.
> The numbers in the cover letter show that a static tipping point will not
> work for all workloads. What soft affinity is doing is essentially trading
> off cache coherence for more CPU. The optimum tradeoff point will vary
> from workload to workload and with system metrics such as coherence
> overhead. If we just use the domain overload, that becomes a static
> definition of the tipping point; we need something tunable that captures
> this tradeoff. The ratio of CPU util seemed to work well and capture that.

And then you run two workloads with different characteristics on the
same box.

Global knobs are buggered.