Re: [PATCH v4 5/8] sched/fair: Take into account latency priority at wakeup

From: Dietmar Eggemann
Date: Mon Sep 19 2022 - 06:05:55 EST


On 16/09/2022 10:03, Vincent Guittot wrote:

[...]

> @@ -4512,7 +4519,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
> p->prio = current->normal_prio;
>
> /* Propagate the parent's latency requirements to the child as well */
> - p->latency_nice = current->latency_nice;
> + p->latency_prio = current->latency_prio;

Isn't here a `set_latency_offset(p)` missing here?

>
> uclamp_fork(p);
>

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e8c1b889dcbb..a20eadb0af97 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4574,6 +4574,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> update_idle_cfs_rq_clock_pelt(cfs_rq);
> }
>
> +static long wakeup_latency_gran(struct sched_entity *curr, struct sched_entity *se);

minor: `struct sched_entity *curr` ... doesn't have to be current
(cfs_rq->curr). Isn't this more like `struct sched_entity *sea, struct
sched_entity *seb`? Anyway, it's already the case for
`wakeup_preempt_entity`.

[...]

> @@ -5732,6 +5735,35 @@ static int sched_idle_cpu(int cpu)
> }
> #endif
>
> +static void set_next_buddy(struct sched_entity *se);
> +
> +static void check_preempt_from_others(struct cfs_rq *cfs, struct sched_entity *se)

minor: Why `struct cfs_rq *cfs` and not `struct cfs_rq *cfs_rq` ?

Using `cfs_rq` would make it more consistent when looking for things
like `cfs_rq->nr_running` for example.

> +{
> + struct sched_entity *next;
> +
> + if (se->latency_offset >= 0)
> + return;
> +
> + if (cfs->nr_running <= 1)
> + return;
> + /*
> + * When waking from idle, we don't need to check to preempt at wakeup

s/idle/others ?

> + * the idle thread and don't set next buddy as a candidate for being
> + * picked in priority.
> + * In case of simultaneous wakeup from idle, the latency sensitive tasks
> + * lost opportunity to preempt non sensitive tasks which woke up
> + * simultaneously.
> + */

The position of this comment block within this function is somehow
misleading since it describes the reason for the function rather then a
particular condition within this function. Wouldn't it be more readable
when it would be a function header comment instead?

[...]

> @@ -7148,6 +7181,22 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> }
> #endif /* CONFIG_SMP */
>
> +static long wakeup_latency_gran(struct sched_entity *curr, struct sched_entity *se)
> +{
> + long latency_offset = se->latency_offset;
> +
> + /*
> + * A negative latency weigth means that the sched_entity has latency

s/weigth/latency_offset ?


> + * requirement that needs to be evaluated versus other entity.
> + * Otherwise, use the latency weight to evaluate how much scheduling
> + * delay is acceptable by se.
> + */
> + if ((se->latency_offset < 0) || (curr->latency_offset < 0))
> + latency_offset -= curr->latency_offset;

I still don't get the rationale behind why when either one (se or curr)
of the latency_nice values is negative, we use the diff between them but
if not, we only care about se's value. Why don't you always use the diff
between se and curr? Since we have a range [-20 ... 19] why shouldn't we
use the difference between let's say se = 19 and curr = 5?
You discussed this with Tao Zhou on the v1 but I didn't understand it fully.

[...]