Re: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP

From: luca abeni
Date: Tue May 16 2023 - 12:19:51 EST


Hi,

On Tue, 16 May 2023 11:08:18 -0400
Vineeth Remanan Pillai <vineeth@xxxxxxxxxxxxxxx> wrote:

> Hi Luca,
>
> On Tue, May 16, 2023 at 3:37 AM luca abeni
> <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> > > I have noticed this behaviour where the reclaimed time is not
> > > equally distributed when we have more tasks than available
> > > processors. But it depended on where the task was scheduled.
> > > Within the same cpu, the distribution seemed to be proportional.
> >
> > Yes, as far as I remember it is due to migrations. IIRC, the
> > problem is related to the fact that using "dq = -Uact / Umax * dt"
> > a task running on a core might end up trying to reclaim some idle
> > time from other cores (which is obviously not possible).
> > This is why m-GRUB used "1 - Uinact" instead of "Uact"
> >
> This is what I was a little confused about. In "-Uact / Umax", all
> the variables are per-cpu and it should only be reclaiming what is
> free on the cpu right? And when migration happens, Uact changes
> and the reclaiming adapts itself.

Sorry, I do not remember the details... But I think the problem is in
the transient when a task migrates from a core to a different one.
I am trying to search from my old notes to see if I find some more
details.


> I was thinking it should probably
> be okay for tasks to reclaim differently based on what free bw is
> left on the cpu it is running. For eg: if cpu 1 has two tasks of bw
> .3 each, each task can reclaim "(.95 - .6) / 2" and another cpu with
> only one task(.3 bandwidth) reclaims (.95 - .3). So both cpus
> utilization is .95 and tasks reclaim what is available on the cpu.

I suspect (but I am not sure) this only works if tasks do not migrate.


> With "1 - Uinact", where Uinact accounts for a portion of global free
> bandwidth, tasks reclaim proportionately to the global free bandwidth
> and this causes tasks with lesser bandwidth to reclaim lesser when
> compared to higher bandwidth tasks even if they don't share the cpu.
> This is what I was seeing in practice.

Just to be sure: is this with the "original" Uextra setting, or with
your new "Uextra = Umax - this_bw" setting?
(I am not sure, but I suspect that "1 - Uinact - Uextra" with your new
definition of Uextra should work well...)


[...]
> > I think I can now understand at least part of the problem. In my
> > understanding, the problem is due to using
> > dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
> >
> > It should really be
> > dq = -(max{u_i, (1 - Uinact - Uextra)} / Umax) * dt
> >
> > (since we divide by Umax, using "Umax - ..." will lead to
> > reclaiming up to "Umax / Umax" = 1)
> >
> > Did you try this equation?
> >
> I had tested this and it was reclaiming much less compared to the
> first one. I had 3 tasks with reservation (3,100) and 3 cpus.
>
> With dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt (1)
> TID[636]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.08
> TID[635]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.07
> TID[637]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.06
>
> With dq = -(max{u_i, (1 - Uinact - Uextra)} / Umax) * dt (2)
> TID[601]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> TID[600]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> TID[602]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65

Maybe I am missing something and I am misunderstanding the situation,
but my impression was that this is the effect of setting
Umax - \Sum(u_i / #cpus in the root domain)
I was hoping that with your new Umax setting this problem could be
fixed... I am going to double-check my reasoning.


Thanks,
Luca