Re: [PATCH v4 6/7] sched/deadline: Deferrable dl server

From: Daniel Bristot de Oliveira
Date: Tue Sep 05 2023 - 12:33:35 EST


On 9/5/23 15:42, Peter Zijlstra wrote:
> On Thu, Aug 31, 2023 at 10:28:57PM +0200, Daniel Bristot de Oliveira wrote:
>> +void dl_server_start(struct sched_dl_entity *dl_se, int defer)
>> {
>> + if (dl_se->server_state != DL_SERVER_STOPPED) {
>> + WARN_ON_ONCE(!(on_dl_rq(dl_se) || dl_se->dl_throttled));
>> + return;
>> + }
>> +
>> + if (defer) {
>> + /*
>> + * Postpone the replenishment to the (next period - the execution time)
>> + *
>> + * With this in place, we have two cases:
>> + *
>> + * On the absence of DL tasks:
>> + * The server will start at the replenishment time, getting
>> + * its runtime before now + period. This is the expected
>> + * throttling behavior.
>> + *
>> + * In the presense of DL tasks:
>> + * The server will be replenished, and then it will be
>> + * schedule according to EDF, not breaking SCHED_DEADLINE.
>> + *
>> + * In the first cycle the server will be postponed at most
>> + * at period + period - runtime at most. But then the
>> + * server will receive its runtime/period.
>> + *
>> + * The server will, however, run on top of any RT task, which
>> + * is the expected throttling behavior.
>> + */
>> + dl_se->deadline = rq_clock(dl_se->rq) + dl_se->dl_period - dl_se->dl_runtime;
>
> I was confused by this, but this is an instance of
> replenish_dl_new_period(), where we explicitly do not preserve the
> period.

yeah, it is two operations in one (so not straight forward). It sets the period as the now - runtime, so..


>> + /* Zero the runtime */
>> + dl_se->runtime = 0;
>> + /* throttle the server */
>> + dl_se->dl_throttled = 1;

as the server is throttled, it will set the replenishing timer for now - runtime + period because
of the deadline.

> These comments are both obvious and placed so as to make everything
> unreadable :/

I can't disagree :-) I will remove.

>> +
>> + dl_se->server_state = DL_SERVER_DEFER;
>> + start_dl_timer(dl_se);
>> + return;
>> + }
>> +
>> if (!dl_server(dl_se)) {
>> dl_se->dl_server = 1;
>> setup_new_dl_entity(dl_se);
>> }
>> +
>> + dl_se->server_state = DL_SERVER_RUNNING;
>> enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
>> }
>
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 580e6764a68b..b9d0f08dc8ca 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6499,9 +6499,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>> */
>> util_est_enqueue(&rq->cfs, p);
>>
>> - if (!rq->cfs.h_nr_running)
>> - dl_server_start(&rq->fair_server);
>> -
>> /*
>> * If in_iowait is set, the code below may not trigger any cpufreq
>> * utilization updates, so do it here explicitly with the IOWAIT flag
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index e23cc67c9467..7595110a5a3e 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -1537,6 +1537,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>>
>> if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>> enqueue_pushable_task(rq, p);
>> +
>> + if (sched_fair_server_needed(rq))
>> + dl_server_start(&rq->fair_server, rq->fair_server_defer);
>> }
>>
>> static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>> @@ -1547,6 +1550,9 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>> dequeue_rt_entity(rt_se, flags);
>>
>> dequeue_pushable_task(rq, p);
>> +
>> + if (!sched_fair_server_needed(rq))
>> + dl_server_stop(&rq->fair_server);
>> }
>>
>> /*
>
> I'm thinking this is wrong -- ISTR there definite benefits to explicit
> slack time scheduling, meaning the server should also run while DL tasks
> are on.

I can add the check to enable it also in the presence of DL tasks, we just need
to have consider that the dl server is a dl task as well when disabling.

It is a benefit because it will fix the case in which a CPU full of DL tasks
(possible with global).

Additionally, running the server when there are only fair tasks
> is mostly harmless, right? So why this change?

Rhe problem is that we never know when a RT one will arrive :-/

E.g., only fair tasks, server enabled. All fine :-) Then an RT arrives, and gets
postponed by the server... RT people complaining (for those thinking on adding
a server for RT, we will have a similar problem as we have with throttling today +
DL has no fixed priority).

We will still face the same problem with defferable server, and the example is the same,
just with a shift in the RT arrival. e.g., only fair task for (server period - runtime),
then the server gets enabled, and a 1us after the RT arrives and gets postponed... again.

So the need to enable it only after the dispatch of a kind of RT (or DL to be added)
tasks that can cause starvation.

We could simplify it by enabling only on RT/DL arrival but... if there is nothing to
starve... it is not needed anyways so less overhead avoiding the enqueue...

>
> One of the benefits was -- IIRC, that we no longer need to subtract some
> random margin from the reservation limit, but there were more I think.
>

We can simplify things because we can start ignoring the rt throttling limit when
there is no need to throttle... like on grub (but, only after we remove the rt
throttling).

Thoughts?
-- Daniel