Re: Sum of weights idea for CFS PI

From: Joel Fernandes
Date: Mon Oct 10 2022 - 11:11:47 EST


On Mon, Oct 10, 2022 at 10:46 AM Qais Yousef <qais.yousef@xxxxxxx> wrote:
>
> On 10/08/22 11:04, Joel Fernandes wrote:
> >
> >
> > > On Oct 6, 2022, at 3:40 PM, Youssef Esmat <youssefesmat@xxxxxxxxxx> wrote:
> > >
> > [..]
> > >>
> > >>> Anyway - just trying to explain how I see it and why C is unlikely to be
> > >>> taking too much time. I could be wrong. As Youssef said, I think there's
> > >>> no fundamental problem here.
> > >>
> > >> I know that on Android, where a smaller HZ is used, the large tick causes
> > >> lots of problems for large nice deltas. For example, if a highly niced task
> > >> was supposed to be preempted at 1ms but is instead preempted at 3ms, then it
> > >> is effectively not so nice any more (even less nice than it promised to be)
> > >> because of the 2ms boost it got at the expense of the less-niced task. This
> > >> can lead to sched_latency being thrown out of the window. Not adjusting the
> > >> weights properly can potentially make that problem much worse IMO.
> > >
> > > Once C releases the lock it should get adjusted, and A will also get
> > > adjusted, regardless of the tick. At the point where we adjust the weights
> > > we have a chance to check for preemption and cause a reschedule.
> >
> > Yes, but the lock can be held for a potentially long time (and it can even be
> > a user-space lock). I’m more comfortable with Peter’s PE patch, which seems a
> > more generic solution than sum of weights, if we can get it working. I’m
> > studying Connor’s patch set now…
>
> The 2 solutions are equivalent AFAICT.

Possibly. Maybe I am talking about a non-issue then, but I had to be
careful ;-). Maybe both have the issue I was referring to, or maybe
neither does. But in any case, PE seems the more organic solution.
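
Just to put rough numbers on the concern above, here is a back-of-the-envelope
calculation. The weight values are the usual sched_prio_to_weight entries for
nice 0 and nice 19; the 6ms latency target and 4ms tick are only illustrative,
not taken from any real trace:

#include <stdio.h>

int main(void)
{
        double w_nice0    = 1024.0; /* sched_prio_to_weight[20], nice 0  */
        double w_nice19   = 15.0;   /* sched_prio_to_weight[39], nice 19 */
        double latency_ms = 6.0;    /* illustrative sched_latency target */
        double tick_ms    = 4.0;    /* illustrative tick period (HZ=250) */

        /* Ideal CFS slice of the niced task while both are runnable:
         * latency * w_i / sum(w). */
        double slice_ms = latency_ms * w_nice19 / (w_nice0 + w_nice19);

        printf("ideal slice of the nice-19 task: %.3f ms\n", slice_ms);

        /* Absent a wakeup, preemption only happens at the next tick,
         * so the niced task overruns its slice by roughly: */
        printf("extra time taken from the nice-0 task: ~%.1f ms\n",
               tick_ms - slice_ms);
        return 0;
}

So with a large nice delta the niced task's fair slice is tens of
microseconds, far below any realistic tick, and every tick's worth of overrun
comes straight out of the nice-0 task's latency.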

> With summation:
>
> A        , B       , C       , D
> sleeping , running , running , running
> -        , 1/5     , 3/5     , 1/5
>
> Where we'll treat A as running but donate its bandwidth to C, the mutex owner.

> With PE:
>
> A       , B       , C       , D
> running , running , running , running
> 2/5     , 1/5     , 1/5     , 1/5
>
> Where A will donate its execution context to C, the mutex owner.

Yes. It would also be great if Peter could participate in this thread,
if he has time. Not to nitpick, but to be more precise in PE
terminology, you mean "scheduler context". The "execution context" is
not inherited [1]:

If p1 is selected to run while still blocked, the lock owner p2 can
run "on its behalf", inheriting p1's scheduler context. Execution
context is not inherited, meaning that e.g. the CPUs where p2 can run
are still determined by its own affinity and not p1's.

[1] https://lore.kernel.org/all/73859883-78c4-1080-7846-e8d644ad397a@xxxxxxxxxx/t/#mdf0146cdf78e48fc5cc515c1a34cdc1d596e0ed8
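
Roughly, the split I have in mind looks like the below. This is only a
conceptual sketch with made-up names (struct task, select_proxy(), ...), not
code from Connor's or Peter's series:

/* Conceptual sketch only -- the names below are made up, they are not
 * from the actual PE series. */

struct task {
        /* Scheduler context: what the pick is based on and what gets
         * charged for the runtime. */
        unsigned long           weight;
        unsigned long long      vruntime;

        /* Execution context: how/where the code actually runs. */
        unsigned long           cpus_allowed;   /* affinity mask */

        /* Set while the task is blocked on a mutex. */
        struct task             *lock_owner;
};

/*
 * p1 is whatever was picked based on its scheduler context.  If p1 is
 * blocked, the mutex owner runs "on its behalf": runtime is charged to
 * p1's weight/vruntime (scheduler context is inherited), but the owner
 * keeps its own cpus_allowed (execution context is not inherited).
 */
struct task *select_proxy(struct task *p1)
{
        return p1->lock_owner ? p1->lock_owner : p1;
}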

> In both cases we should end up with the same distribution as if neither A nor
> C ever goes to sleep because of holding the mutex.

Hopefully!
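
For my own sanity, here is a quick numeric check of the two tables above. It
assumes A carries twice the weight of B, C and D (that is what makes the
quoted 2/5 and 3/5 figures add up), and it only redoes the arithmetic, so
treat it as a sketch rather than anything from either patch set:

#include <stdio.h>

int main(void)
{
        double wA = 2.0, wB = 1.0, wC = 1.0, wD = 1.0; /* assumed 2:1:1:1 */
        double sum = wA + wB + wC + wD;                /* = 5 */

        /* Summation: A is treated as sleeping, its weight folded into C. */
        printf("summation: A= -   B=%.2f C=%.2f D=%.2f\n",
               wB / sum, (wC + wA) / sum, wD / sum);

        /* PE: A stays enqueued with its own weight, and C consumes A's
         * share while it owns the mutex, so C's actual CPU time is
         * again 2/5 + 1/5 = 3/5. */
        printf("PE:        A=%.2f B=%.2f C=%.2f D=%.2f\n",
               wA / sum, wB / sum, wC / sum, wD / sum);

        return 0;
}

Either way B and D keep their 1/5, which is the point.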

> I still can't see how B's and D's fairness would be impacted, as the solution
> to the problem is to never treat a waiter as sleeping and to let the owner run
> for longer, but only within the limit of what the waiter is allowed to run
> for. AFAICS, both solutions maintain this relationship.

True!

- Joel