Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

From: Vineeth Remanan Pillai
Date: Mon May 22 2023 - 15:23:16 EST


Hi Luca,

I am merging my replies to your last two mails here :-)

> So, we are wasting 181.3 - 95 = 86.3% of CPU time, which 590 cannot
> reclaim (because it cannot execute simultaneously on 2 CPUs).
>
Correct. Thanks for explaining it in detail. I traced the scheduler
and verified the pattern you described.

> Now that the problem is more clear to me, I am trying to understand a
> possible solution (as you mention, moving some extra bandwidth from the
> 590's CPU will fix this problem... But I am not sure if this dynamic
> extra bandwidth migration is feasible in practice without introducing
> too much overhead)
>
> I'll look better at your new proposal.
>
The idea that I mentioned tries to solve this problem on a best-effort
basis: if the global load is high, use the global values
"Uextra = rq->dl.extra_bw" and "Umax = rq->dl.this_bw + rq->dl.extra_bw".
Otherwise, use the local values "Umax = rq->dl.max_bw" and
"Uextra = rq->dl.max_bw - rq->dl.this_bw". This is still not perfect,
but it tries to reclaim very close to the maximum allowed limit in
almost all cases.
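
To make the selection rule concrete, here is a minimal Python sketch
of it. This is only a toy model, not the patch itself: DlRq and
pick_reclaim_params are hypothetical names, the bandwidths are plain
floats (the kernel keeps them as fixed-point integers), and the test
for "global load is high" is left as a boolean input because the
exact criterion is not spelled out here.

```python
from typing import NamedTuple, Tuple


class DlRq(NamedTuple):
    """Toy stand-in for the rq->dl fields named above (fractions of a CPU)."""
    this_bw: float   # models rq->dl.this_bw
    extra_bw: float  # models rq->dl.extra_bw
    max_bw: float    # models rq->dl.max_bw


def pick_reclaim_params(rq: DlRq, global_load_high: bool) -> Tuple[float, float]:
    """Return (Umax, Uextra) following the best-effort rule described above.

    How global_load_high is decided is an assumption left to the caller.
    """
    if global_load_high:
        # Global values: Uextra = extra_bw, Umax = this_bw + extra_bw
        u_extra = rq.extra_bw
        u_max = rq.this_bw + rq.extra_bw
    else:
        # Local values: Umax = max_bw, Uextra = max_bw - this_bw
        u_max = rq.max_bw
        u_extra = rq.max_bw - rq.this_bw
    return u_max, u_extra
```

For example, with the made-up values this_bw = 0.5, extra_bw = 0.25
and max_bw = 0.95, the global branch gives Umax = 0.75 and
Uextra = 0.25, while the local branch gives Umax = 0.95 and
Uextra = 0.45.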

Please have a look when you get a chance :-).

>
> I just tried to repeat this test on a VM with 3 CPUs, and I can
> reproduce the stall (100% of CPU time reclaimed by SCHED_DEADLINE
> tasks, with no possibility for the other tasks to execute) when I use
> dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt
>
> But when I use
> dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
> everything works as expected, the 4 tasks reclaim 95% of the CPU
> time and my shell is still active...
> (so, I cannot reproduce the starvation issue with this equation)
>
Sorry about the confusion. Yes, you are right: there is no stall with
this equation. The only issue is reduced reclaiming when the load is
low and tasks have different bandwidth requirements.
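
As a rough numeric check of the two forms of the equation quoted
above, here is a small Python sketch. It is a simplified model with
assumed inputs, not the test from this thread: a single runqueue, four
always-runnable tasks with u_i = 0.2 each, Umax = 0.95, Uinact = 0 and
Uextra = Umax - sum(u_i) = 0.15. The function names are hypothetical.
It also assumes that a task whose budget depletes at rate r = -dq/dt
ends up using u_i / r of the CPU.

```python
def dq_first(u_i, u_max, u_inact, u_extra, dt):
    """dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt"""
    return -max(u_i / u_max, u_max - u_inact - u_extra) * dt


def dq_second(u_i, u_max, u_inact, u_extra, dt):
    """dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt"""
    return -(max(u_i, u_max - u_inact - u_extra) / u_max) * dt


def total_cpu_share(dq_fn, utils, u_max, u_inact, u_extra):
    """Sum the CPU fraction used by always-runnable tasks (toy model).

    A task with utilization u_i whose budget is depleted at rate r per
    unit of execution time can run for u_i / r of each period.
    """
    total = 0.0
    for u_i in utils:
        rate = -dq_fn(u_i, u_max, u_inact, u_extra, dt=1.0)
        total += u_i / rate
    return total


# Assumed example values (not taken from the measurements in this thread).
UTILS = [0.2, 0.2, 0.2, 0.2]
U_MAX = 0.95
U_INACT = 0.0
U_EXTRA = U_MAX - sum(UTILS)

share_first = total_cpu_share(dq_first, UTILS, U_MAX, U_INACT, U_EXTRA)
share_second = total_cpu_share(dq_second, UTILS, U_MAX, U_INACT, U_EXTRA)
```

Under these assumptions, share_first comes out at about 1.0 (the
deadline tasks take the whole CPU), while share_second comes out at
about 0.95, which is in line with the behaviour described above.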

> So, I now think the second one is the correct equation to be used.
>
Thanks for confirming.

I think it probably makes sense to get the fix for the equation in as
a first step. We can then investigate the second issue (reduced
reclaiming under low load with differing bandwidths) and fix it
separately. What do you think? If it's okay with you, I shall send the
next iteration with only the fix for the equation.

Thanks,
Vineeth