Re: [RESEND][RFC] sched: Introduce removed.load_sum for precise load propagation

From: hupu

Date: Wed Oct 15 2025 - 03:20:38 EST


Hi Vincent Guittot, Pierre Gondois, and fellow maintainers,

This patch has been pending for several days without any feedback, so
I am resending it as a gentle reminder.

Thanks,
hupu


On Sat, Oct 11, 2025 at 10:27 AM hupu <hupu.gm@xxxxxxxxx> wrote:
>
> Hi Pierre Gondois,
> Just wanted to mention a small detail that’s easy to miss.
>
> On Fri, Oct 10, 2025 at 7:37 PM hupu <hupu.gm@xxxxxxxxx> wrote:
> > > It is possible to compute load_sum value without the runnable_signal, cf.
> > > 40f5aa4c5eae ("sched/pelt: Fix attach_entity_load_avg() corner case")
> > > https://lore.kernel.org/all/20220414090229.342-1-kuyo.chang@xxxxxxxxxxxx/T/#u
> > >
> > > I.e.:
> > > +	se->avg.load_sum = se->avg.load_avg * divider;
> > > +	if (se_weight(se) < se->avg.load_sum)
> > > +		se->avg.load_sum = div_u64(se->avg.load_sum, se_weight(se));
> > > +	else
> > > +		se->avg.load_sum = 1;
> > >
> > > As a side note, as a counterpart of the above patch, the higher the nice value,
> > > the lower the weight (in sched_prio_to_weight[]) and the lower the task
> > > load signal.
> > > This means that the unweighted load_sum value loses granularity.
> > > E.g.:
> > > A task with weight=15 can have load_avg values in [0:15]. So all the values
> > > for load_sum in the range [X * (47742/15) : (X + 1) * (47742/15)]
> > > are floored to load_avg=X, but load_sum is not reset when computing
> > > load_avg.
> > > attach_entity_load_avg() however resets load_sum to X * (47742/15).
> > >
> >
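To make the round trip described above concrete, here is a minimal
user-space sketch, not kernel code. It assumes the divider is fixed at
47742 (the value used in this thread; in the kernel it varies with
period_contrib), and the helper names load_avg_from_sum() and
load_sum_from_avg() are hypothetical. The reverse helper mirrors the
quoted snippet from 40f5aa4c5eae.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed divider, the value quoted in this thread. */
#define PELT_DIVIDER 47742ULL

/* Forward direction: load_avg = weight * load_sum / divider, floored. */
static uint64_t load_avg_from_sum(uint64_t weight, uint64_t load_sum)
{
	return weight * load_sum / PELT_DIVIDER;
}

/* Reverse direction, following the logic of the quoted snippet. */
static uint64_t load_sum_from_avg(uint64_t weight, uint64_t load_avg)
{
	uint64_t sum = load_avg * PELT_DIVIDER;

	assert(weight != 0);
	/* Never return 0: fall back to 1 when the division would underflow. */
	return weight < sum ? sum / weight : 1;
}
```

With weight=15, a load_sum of 6364 gives load_avg=1, and rebuilding
load_sum from that load_avg gives 3182, roughly half the original value.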
> > From a mathematical perspective, deriving load_sum from load_avg is
> > indeed feasible.
> >
> > However, as you pointed out, integer arithmetic may introduce
> > significant quantization errors, particularly for tasks with low
> > weights.
> >
> > For instance, for a task with weight 15, load_sum values of 3183 and
> > 6364 both result in the same load_avg = 1 under this method, so the
> > reconstruction can be off by as much as 6364 - 3183 = 3181. This
> > error increases as the task’s weight decreases.
> >
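The size of this error can be estimated per weight. The sketch below is
plain user-space arithmetic under the same assumption of a fixed divider
of 47742. min_sum_for_avg() and bucket_width() are hypothetical helper
names. The weights in the note after the code (15, 1024, 88761) are the
sched_prio_to_weight[] entries for nice 19, 0 and -20.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed divider, the value quoted in this thread. */
#define PELT_DIVIDER 47742ULL

/*
 * Smallest load_sum whose floored load_avg is at least 'load_avg',
 * i.e. ceil(load_avg * divider / weight).
 */
static uint64_t min_sum_for_avg(uint64_t weight, uint64_t load_avg)
{
	assert(weight != 0);
	return (load_avg * PELT_DIVIDER + weight - 1) / weight;
}

/* Number of distinct load_sum values that collapse into one load_avg. */
static uint64_t bucket_width(uint64_t weight, uint64_t load_avg)
{
	return min_sum_for_avg(weight, load_avg + 1) -
	       min_sum_for_avg(weight, load_avg);
}
```

For load_avg=1 this gives a bucket of 3183 values at weight 15, 47 values
at weight 1024, and a single value at weight 88761, which matches the
observation that the error grows as the weight shrinks.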
> > Therefore, I believe that recomputing the propagated load_sum from
> > load_avg within update_cfs_rq_load_avg() is not an ideal approach.
> > Instead, my proposal is to record the load_sum of dequeued tasks
> > directly in cfs_rq->removed, rather than inferring it indirectly from
> > other signals such as runnable_sum or load_avg.
> >
>
> In addition, an entity’s weight is not fixed: it may change over time
> due to dynamic priority adjustments. Reconstructing load_sum from
> load_avg using the current se_weight(se) in update_cfs_rq_load_avg()
> may therefore be wrong, because load_avg may have been computed under
> a different weight.
>
> So, I believe directly recording each entity’s load_sum at dequeue
> time offers a more accurate and consistent approach.
>
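The weight-mismatch concern can be illustrated with a small user-space
sketch. It is only arithmetic under stated assumptions: the divider is
fixed at 47742, struct removed_sample and both helpers are hypothetical
names, and any rescaling the kernel performs when a weight changes is
deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed divider, the value quoted in this thread. */
#define PELT_DIVIDER 47742ULL

/* Hypothetical record of what a dequeued entity leaves behind. */
struct removed_sample {
	uint64_t load_avg;	/* computed with the weight at record time */
	uint64_t load_sum;	/* unweighted sum, recorded directly */
};

static struct removed_sample record_removed(uint64_t weight, uint64_t load_sum)
{
	struct removed_sample s = {
		.load_avg = weight * load_sum / PELT_DIVIDER,
		.load_sum = load_sum,
	};

	return s;
}

/* Rebuild load_sum from load_avg using whatever weight is current now. */
static uint64_t reconstruct_sum(uint64_t weight_now, uint64_t load_avg)
{
	uint64_t sum = load_avg * PELT_DIVIDER;

	assert(weight_now != 0);
	return weight_now < sum ? sum / weight_now : 1;
}
```

Recording load_sum=20000 at weight 1024 gives load_avg=428. Rebuilding
with the same weight yields 19954, but rebuilding with weight 335 yields
60995, which is larger than the divider itself. The directly recorded
load_sum stays at 20000 in both cases.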
> Thanks,
> hupu