Re: [PATCH v2 00/10] sched: Flatten the pick

From: Peter Zijlstra

Date: Mon May 18 2026 - 03:16:44 EST


On Tue, May 12, 2026 at 08:45:21AM -1000, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, May 12, 2026 at 10:10:00AM +0200, Peter Zijlstra wrote:
> ...
> > Anyway, this is why I've been looking at these alternative weight
> > schemes, to get the nominal fraction near 1 and make these problems go
> > away. It is both the numerical issues and the disparity between levels
> > (with root being at level 0 being the most obvious).
>
> I see. I think what bothers me is that I'm unsure what the weight config
> would mean when the shares are scaled by the number of active cpus in that
> cgroup.

Relative weight per active cpu :-), but yes, that is a somewhat more
difficult concept I suppose.

> Here's a simple example:
>
> - There are 256 cpus.
> - /cgroup-A has weight 100 and 128 active threads. No pinning.
> - /cgroup-B has weight 100 and 256 active threads. No pinning.
>
> In the current code, assuming math holds up, cgroup-A and B would get about
> the same shares - ~128 CPUs each. However, if we scale the share by active
> CPUs in each cgroup, B's tasks would end up with the same weight as A's on
> CPUs that they end up competing on, which would lead to ~ 1:3 distribution.
> Is that the right reading of the code?

Indeed. So both A and B will get ~1024 weight per (active) CPU, such
that on the CPUs they contend they will get 1:1 and then B will get the
full CPU on the uncontested CPUs, resulting in a total of 1:3
distribution.

This can of course be compensated for by increasing the relative
weight of A, if that is so desired. But the alternative view is that on
the 128 CPUs where they overlap, A and B will get equal parts; it is
just that B consumes another 128 CPUs and will not have contention there.

So the current scheme will inflate the part of A to be double the weight
(of B), giving them 2 out of 3 parts on the contended CPUs, but then B
will still get complete / uncontested access to those extra 128 CPUs,
resulting in a 2:4 weight distribution.
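
A minimal sketch of the arithmetic above, not the scheduler code. It
assumes one runnable thread per CPU, that A's CPUs are a subset of B's,
and that a weight-100 group scales to 1024 as in the figures above. The
names cpu_time and per_cpu_scaled are made up for illustration.

```python
from fractions import Fraction

GROUP_WEIGHT = 1024  # scaled weight of a weight-100 cgroup, per the mail


def cpu_time(a_cpus, b_cpus, per_cpu_scaled):
    """Return (A, B) totals in units of CPUs.

    Assumes A's active CPUs are a subset of B's (a_cpus <= b_cpus) and
    that each group runs one thread per active CPU.
    """
    if per_cpu_scaled:
        # Proposed scheme: full group weight on every active CPU.
        wa = wb = Fraction(GROUP_WEIGHT)
    else:
        # Current scheme: group weight is divided over its active CPUs.
        wa = Fraction(GROUP_WEIGHT, a_cpus)
        wb = Fraction(GROUP_WEIGHT, b_cpus)

    contended = a_cpus
    a_total = contended * wa / (wa + wb)
    # B also owns the CPUs that A does not touch.
    b_total = contended * wb / (wa + wb) + (b_cpus - contended)
    return a_total, b_total


new_a, new_b = cpu_time(128, 256, per_cpu_scaled=True)
old_a, old_b = cpu_time(128, 256, per_cpu_scaled=False)
print(new_a, new_b)  # 64 192      -> 1:3
print(old_a, old_b)  # 256/3 512/3 -> 2:4
```

In both cases the totals add up to 256 CPUs; only the split on the 128
contended CPUs differs.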

Which also isn't as straightforward as one might think.

So perhaps 'weight on the CPUs you contest on' isn't as unintuitive as
it seems at first glance, it's just different.

And it has tremendous advantages as outlined before; it is naturally
normalized -- the disparity between nesting levels goes away, and the
edge case of a single CPU active will be sane.

E.g. consider your example except now A will have 1 active thread. Then
A will get the full group weight (1024) on its one CPU, while B will get
(1024/256=4) on each CPU.

So for the one contended CPU A gets 256 out of 257 parts, while B gets
the full CPU for the remaining 255 CPUs, for a:

  256    1        257
  --- : --- + 255*--- = 256:65536 = 1:256
  257   257       257

distribution. While with the new scheme it would be:

  1   1       2
  - : - + 255*- = 1:511
  2   2       2

Which, realistically, isn't all that different, except the old scheme
has this really large weight disparity to deal with.
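
To put numbers on that for the single-thread case, here is a small
sketch under the same simplifications as the example (one thread per
CPU, A's single CPU is also one of B's 256, group weight 1024). The
helper names are hypothetical; this only mirrors the arithmetic, not
the scheduler code.

```python
from fractions import Fraction

GROUP_WEIGHT = 1024  # scaled weight of a weight-100 cgroup, per the mail


def contended_weights(a_cpus, b_cpus, per_cpu_scaled):
    """Per-CPU weights of A and B on a CPU where both run."""
    if per_cpu_scaled:
        # Proposed scheme: full group weight on each active CPU.
        return Fraction(GROUP_WEIGHT), Fraction(GROUP_WEIGHT)
    # Current scheme: group weight divided over the group's active CPUs.
    return Fraction(GROUP_WEIGHT, a_cpus), Fraction(GROUP_WEIGHT, b_cpus)


def distribution(a_cpus, b_cpus, per_cpu_scaled):
    """Return (A, B) totals in CPUs, assuming A's CPUs are a subset of B's."""
    wa, wb = contended_weights(a_cpus, b_cpus, per_cpu_scaled)
    a_total = a_cpus * wa / (wa + wb)
    b_total = a_cpus * wb / (wa + wb) + (b_cpus - a_cpus)
    return a_total, b_total


for scaled in (False, True):
    wa, wb = contended_weights(1, 256, scaled)
    a, b = distribution(1, 256, scaled)
    # wa / wb is the weight disparity on the one contended CPU,
    # b / a is the overall B:A ratio of CPU time.
    print(scaled, wa / wb, b / a)
# False 256 256
# True 1 511
```

The overall ratio moves from 1:256 to 1:511, while the per-CPU weight
disparity on the contended CPU drops from 256:1 to 1:1.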

So from where I'm sitting, yes different, but it behaves better.