Re: Sum of weights idea for CFS PI

From: Joel Fernandes
Date: Thu Oct 06 2022 - 09:54:27 EST

On Wed, Oct 5, 2022 at 6:04 AM Qais Yousef <qais.yousef@xxxxxxx> wrote:
> Hi Joel
> On 10/04/22 16:27, Joel Fernandes wrote:
> [...]
> > I am treating the following the same:
> >
> > a. when A is running, it would be as above.
> > b. but if A was sleeping, B, C, and D would get 1/3.
> >
> > similar to
> >
> > a. when A is running *and blocked on C for all its runtime*
> > ^^ -- in this case, B and D should not have their distributions
> > changed at all because they are not participating in the
> > lock acquire and release. So they should neither be hurt
> > any more, nor be boosted. They should simply stay same [1]
> >
> > b. but if A was sleeping, B, C, and D would get 1/3.
> >
> >
> > [1] Why? Consider 3 tasks in the all-RT case, A high, B medium and C low prio.
> >
> > If all are running 100% and A does not block on C, B is blocked by A
> > indefinitely. So the prio of A and B are inverted. We seek to rectify this, that
> > is we need make changes such that, B is returned back to the blocked state. We
> > do this by boosting C.
> >
> > In other words, the prio inheritance will cause B's distribution to not be
> > changed (it was supposed to be blocked before and it is now going to be blocked
> > state again).
> >
> > CFS should not behave any differently, B's distribution should not be changed
> > before/after the priority inhertiance of A by C. That's just my opinion - and
> > that's how I calculated to distribution. With that mind, could you go back to
> > seeing if my math was originally correct or did I mess something up?
> It's not about the math. But I think the before and after can't be the same for
> C..

C is acquiring/releasing the lock so I expect its distribution to
change. I was talking about the poor B who has nothing to do with the

> > > I don't think this is valid. If A is blocked on C for 50% of the time, and
> > > sleeping for 50% of the time, when did it get blocked/unblocked?
> > >
> > > This will have an impact on the average share for C and skew it, no?
> > >
> > > Unless I missed something, the average share of C being (3/5 + 1/3) is an
> > > impossible state. You need to consider the portion of time when C runs as 1/5,
> > > when A is actually not blocked on anything, too.
> > >
> > > Hmm actually I just re-read your statement below and you just say 3/5 (18/30)
> > > is too much. You didn't consider the average. I'll leave the above in hope to
> > > help me understand what am I missing and where I went wrong :-)
> > >
> > > Generally IMHO looking at the average will not help. I think if the share
> > > values make sense in each state individually (and I believe they are), that
> > > would be enough. AFAICS, B and D are still taking the right amount of time when
> > > C inherits the bandwidth. And C by definition will run longer when A is blocked
> > > on it for the whole duration of this blocked time.
> >
> > I was degenerating the case where A sleeps (say I/O) vs A blocks, to simplify
> > the math, and then taking average of that. I think that's reasonable?
> I'm not sure. This is skewing the results in my view.
> I think the comparison should just be:
> 1) A, B, C, and D are all running and nothing gets blocked at all. Then shares
> would be:
> 2/5, 1/5, 1/5, 1/5
> 2) A is blocked and C; B, C, D are running with no blocked time. Shares would
> be:
> - , 1/5, 3/5, 1/5
> By definition, we want to treat A in (2) as RUNNING because as soon as
> C unblocks A we should return to (1). From B and D perspective, their share is
> not impacted throughout this transition. Which is AFAIU is what we want to
> achieve.
> I think considering the sleeping time and averaging can lead to misleading
> results if care is not taken.

Yes, but that doesn't mean we can just ignore it. It is easy in my
view to skew the inherited weight to a very large number, only to find
that tasks unrelated to the lock acquire/release are "suffering"
though they had nothing to do with the lock or the PI. But it is
reasonable to try the simple approach first and see the impact.

I also never said the averaging approach or consideration of sleeping
time is perfect ;-)

> Anyway - just trying to explain how I see it and why C is unlikely to be taking
> too much time. I could be wrong. As Youssef said, I think there's no
> fundamental problem here.

I know on Android where they use smaller HZ, the large tick causes
lots of problems for large nice deltas. Example if a highly niced task
was to be preempted for 1ms, and preempts instead at 3ms, then the
less-niced task will not be so nice (even less nice than it promised
to be) any more because of the 2ms boost that the higher niced task
got. This can lead the the sched_latency thrown out of the window. Not
adjusting the weights properly can potentially make that problem much
worse IMO.