Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

From: Dietmar Eggemann
Date: Thu May 25 2023 - 07:57:01 EST


Hi Vineeth,

On 20/05/2023 04:15, Vineeth Remanan Pillai wrote:
> Hi Dietmar,
>
> On Fri, May 19, 2023 at 1:56 PM Dietmar Eggemann
> <dietmar.eggemann@xxxxxxx> wrote:
>
>>> TID[730]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.05
>>> TID[731]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 31.34
>>> TID[732]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 3.16
>>
>> What does this 'Util: X' value stand for? I assume it's the utilization
>> of the task? How do you obtain it?
>>
> Yes, it is the utilization of the task. I calculate it by dividing the
> cputime with elapsed time(using clock_gettime(2)).

Makes, sense, I guess what I missed here in the first place is the fact
that those DL tasks want to run 100%.

>> I see that e.g. TID[731] should run 1ms each 10ms w/o grub and with grub
>> the runtime could be potentially longer since 'scaled_delta_exec < delta'.
>>
> Yes correct. GRUB(Greedy Reclamation of Unused Bandwidth) algorithm
> is used here for deadline tasks that needs to run longer than their
> runtime when needed. sched_setattr allows a flag SCHED_FLAG_RECLAIM
> to indicate that the task would like to reclaim unused bandwidth of a
> cpu if available. For those tasks, 'runtime' is depreciated using the
> GRUB formula and it allows it to run for longer and reclaim the free
> bandwidth of the cpu. The GRUB implementation in linux allows a task
> to reclaim upto RT capacity(95%) and depends on the free bandwidth
> of the cpu. So TID[731] theoretically should run for 95ms as it is
> the only task in the cpu, but it doesn't get to run that long.

Correct.

>> I don't get this comment in update_curr_dl():
>>
>> 1325 /*
>> 1326 * For tasks that participate in GRUB, we implement GRUB-PA: the
>> 1327 * spare reclaimed bandwidth is used to clock down frequency.
>> 1328 *
>>
>> It looks like dl_se->runtime is affected and with 'scaled_delta_exec <
>> delta' the task runs longer than dl_se->dl_runtime?
>>
> Yes. As mentioned above, GRUB allows the task to run longer by slowing
> down the depreciation of "dl_se->dl_runtime". scaled_delta_exec is
> calculated by the GRUB formula explained in the paper [1] & [2].

What I didn't understand was this `GRUB-PA` and `the spare reclaimed
bandwidth is used to clock down frequency` in relation to GRUB task
runtime depreciation.

But now I think I get it. `GRUB-PA` means that in case we run with the
schedutil CPUfreq governor, the CPU frequency is influenced by Uact
(rq->dl.running_bw) via:

sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->

return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT

and on top of this we do GRUB reclaiming for those SCHED_FLAG_RECLAIM
tasks, i.e. task runtime depreciation.

>> I did the test discussed later in this thread with:
>>
>> 3 [3/100] tasks (dl_se->dl_bw = (3 << 20)/100 = 31457) on 3 CPUs
>>
>> factor = scaled_delta_exec/delta
>>
>> - existing grub
>>
>> rq->dl.bw_ratio = ( 100 << 8 ) / 95 = 269
>> rq->dl.extra_bw = ( 95 << 20 ) / 100 = 996147
>>
>> cpu=2 curr->[thread0-2 1715] delta=2140100 this_bw=31457
>> running_bw=31457 extra_bw=894788 u_inact=0 u_act_min=33054 u_act=153788
>> scaled_delta_exec=313874 factor=0.14
>>
>> - your solution patch [1-2]
>>
>> cpu=2 curr->[thread0-0 1676] delta=157020 running_bw=31457 max_bw=996147
>> res=4958 factor=0.03
>>
>> You say that GRUB calculation is inaccurate and that this inaccuracy
>> gets larger as the bandwidth of tasks becomes smaller.
>>
>> Could you explain this inaccuracy on this example?
>>
> According to GRUB, we should be able to reclaim the unused bandwidth
> for the running task upto RT limits(95%). In this example we have a
> task with 3ms runtime and 100ms runtime on a cpu. So it is supposed
> to run for 95ms before it is throttled.

Correct.

> Existing implementation's factor = 0.14 and 3ms is depreciated by
> this factor. So it gets to run for "3 / 0.14 ~= 22ms". This is the
> inaccuracy that the patch is trying to solve. With the patch, the
> factor is .03166 and runtime = "3 / 0.03166 ~= 95ms"

My tests were wrong since I was using DL task with dl_runtime=3ms and
dl_period = 100ms with an actual runtime=3ms whereas your tasks probably
want to run 100%.

> Hope this clarifies.

yes, it did, thanks!