Re: [PATCH] sched/fair: Refactor cpu_util_without()

From: Dietmar Eggemann
Date: Fri Mar 18 2022 - 12:29:42 EST


- Valentin Schneider <Valentin.Schneider@xxxxxxx>

On 02/03/2022 10:09, Vincent Guittot wrote:
> On Tue, 1 Mar 2022 at 18:17, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:

[...]

> I have only minor comment

Thanks for the review!

[...]

>> +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
>> +{

[...]

>> + if (sched_feat(UTIL_EST)) {
>> + util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
>> +
>> + /*
>> + * During wake-up, the task isn't enqueued yet and doesn't
>> + * appear in the cfs_rq->avg.util_est.enqueued of any rq,
>> + * so just add it (if needed) to "simulate" what will be
>> + * cpu_util after the task has been enqueued.
>> + */
>> + if (dst_cpu == cpu)
>> + util_est += _task_util_est(p);
>> +
>
> Could you add a comment that explains why the addition above will not
> be removed below by the lsub_positive below so it isn't worth trying
> to optimize such a case?

Yes. I reworked the comments in cpu_util_next() so they also apply when
called by cpu_util_without(). And I use an `if{}/else if{}` here too in v2.

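To illustrate why the addition can't be undone by the subtraction once the two
checks form an `if{}/else if{}`, here is a rough user-space sketch. It is not
the v2 code: util_est_next() and its parameters are hypothetical names, and
lsub_positive() is a simplified stand-in for the kernel macro of the same name.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's lsub_positive(): subtract, clamp at 0. */
static void lsub_positive(unsigned long *ptr, unsigned long val)
{
	*ptr -= (*ptr < val) ? *ptr : val;
}

/*
 * Hypothetical helper: estimated utilization of 'cpu' if the task ends up
 * on 'dst_cpu' (-1 meaning "nowhere", i.e. the cpu_util_without() case).
 *
 * enqueued:          stand-in for cfs_rq->avg.util_est.enqueued of 'cpu'
 * task_est:          stand-in for _task_util_est(p)
 * task_contributes:  stand-in for (task_on_rq_queued(p) || current == p)
 */
static unsigned long util_est_next(unsigned long enqueued,
				   unsigned long task_est,
				   int cpu, int dst_cpu,
				   bool task_contributes)
{
	unsigned long util_est = enqueued;

	/*
	 * The two branches are mutually exclusive, so a contribution
	 * added for dst_cpu == cpu is never subtracted again.
	 */
	if (dst_cpu == cpu)
		util_est += task_est;
	else if (task_contributes)
		lsub_positive(&util_est, task_est);

	return util_est;
}
```

With enqueued = 100 and task_est = 30, the sketch yields 130 for dst_cpu == cpu
and 70 for the "without" case; it clamps at 0 when task_est exceeds enqueued.
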
>> + /*
>> + * Despite the following checks we still have a small window
>> + * for a possible race, when an execl's select_task_rq_fair()
>> + * races with LB's detach_task():
>> + *
>> + * detach_task()
>> + * p->on_rq = TASK_ON_RQ_MIGRATING;
>> + * ---------------------------------- A
>> + * deactivate_task() \
>> + * dequeue_task() + RaceTime
>> + * util_est_dequeue() /
>> + * ---------------------------------- B
>> + *
>> + * The additional check on "current == p" it's required to
>> + * properly fix the execl regression and it helps in further
>> + * reducing the chances for the above race.
>> + */
>> + if (unlikely(task_on_rq_queued(p) || current == p))
>> + lsub_positive(&util_est, _task_util_est(p));

I did a lot of testing on mainline & v4.20 and there wasn't one
occurrence of `p->on_rq == TASK_ON_RQ_MIGRATING` here. Not for WF_EXEC
tasks (p->on_rq = TASK_ON_RQ_QUEUED) and in case of v4.20 not for
WF_EXEC and WF_TTWU tasks (p->on_rq = 0). So I assume it's not needed. I
left it in v2 though and mentioned it in the additional comment section
of the patch.
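
For reference, a small user-space sketch of which p->on_rq values make the
condition above true. The two constants are the values from
kernel/sched/sched.h; struct task_sketch and subtracts_task_est() are
hypothetical names used only for this illustration.

```c
#include <stdbool.h>

/* Values as defined in kernel/sched/sched.h. */
#define TASK_ON_RQ_QUEUED	1
#define TASK_ON_RQ_MIGRATING	2

/* Hypothetical stand-in for the one task_struct field used here. */
struct task_sketch {
	int on_rq;
};

/* Same test as the kernel helper: true only for TASK_ON_RQ_QUEUED. */
static bool task_on_rq_queued(const struct task_sketch *p)
{
	return p->on_rq == TASK_ON_RQ_QUEUED;
}

/*
 * Mirrors "task_on_rq_queued(p) || current == p", with 'curr' passed in
 * explicitly instead of using the kernel's 'current'.
 */
static bool subtracts_task_est(const struct task_sketch *p,
			       const struct task_sketch *curr)
{
	return task_on_rq_queued(p) || curr == p;
}
```

So a task with on_rq == TASK_ON_RQ_MIGRATING only hits the subtraction via the
`current == p` part, and a task with on_rq == 0 that isn't current never does.
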

[...]

>> static unsigned long cpu_util_without(int cpu, struct task_struct *p)
>> {

[...]

>> /*
>> * Covered cases:
>> *
>> @@ -6560,82 +6609,8 @@ static unsigned long cpu_util_without(int cpu, struct task_struct *p)
>> * estimation of the spare capacity on that CPU, by just
>> * considering the expected utilization of tasks already
>> * runnable on that CPU.
>
> The comment about the covered cases above should be moved in
> cpu_util_next() which is where the cases are covered now

Yes. I incorporated it into the comments in cpu_util_next() in v2.

[...]