Re: [PATCH] sched/pi: Reweight fair_policy() tasks when inheriting prio

From: Qais Yousef
Date: Tue Apr 09 2024 - 02:19:24 EST


On 04/08/24 12:51, John Stultz wrote:
> On Mon, Apr 8, 2024 at 12:17 AM Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> >
> > On Sun, 7 Apr 2024 at 14:27, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > >
> > > On 04/05/24 18:16, Qais Yousef wrote:
> > >
> > > > >
> > > > > All that to say: I think the weight is deliberately not applied.
> > > > > This might work for your particular case, but there are more changes
> > > > > to be done if you want to apply prio inheritance between cfs tasks.
> > > > >
> > > > > As an example, what about the impact of cgroups on the actual weight
> > > > > and the inherited priority of a task? If the owner and the waiter
> > > > > don't belong to the same cgroup, their own prio is meaningless... a
> > > > > task at nice -20 in a group with a weight equal to nice 19 vs a task
> > > > > at nice 19 in a group with a weight equal to nice -20.
> > > >
> > > > That is on my mind actually, but I thought it was a separate problem. It
> > > > has to do with how we calculate the effective priority of the pi_task, and
> > > > probably the sorting order too, if we agree we need to revert the above. If
> > > > that is done
> > >
> > > Thinking more about it, the revert is not the right thing to do. We want
> > > fair tasks to stay ordered FIFO for better fairness and to avoid potential
> > > starvation issues. It's just that the logic for searching the top_waiter
> > > needs to be different. If the top_waiter is fair, then we need to traverse
> > > the tree to find the highest-priority (lowest) nice value. We can probably
> > > keep track of this while adding items to the tree to avoid the search.
> > >
> > > For cgroups: is it reasonable (loosely speaking) to keep track of pi_cfs_rq
> > > and detach_attach_task_cfs_rq() before the reweight? This seems the most
> > > straightforward solution and would contain the complexity of keeping track
> > > of the cfs_rq. But it'll have a similar issue to proxy execution, where a
> > > task that doesn't belong to the cgroup will consume its share...
> >
> > That's a good point. Would proxy execution be the simplest way to fix all this?

Is it? Over 4.5 years ago Unity reported a performance inversion problem to
me, and that's when the proxy execution work was revived as the simplest way
to fix all of this. But there is still no end in sight from what I can see.
I thought, and still think, that an interim solution in rt_mutex could help
a lot of use cases already without being too complex. It would not be as
elegant and comprehensive as proxy execution, but given that the impact on
userspace and the number of out-of-tree kernel hacks keep growing while we
wait for it to be ready, the cost of waiting is high IMHO.
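
To make that interim idea concrete - and the "keep track while adding items
to the tree" point quoted above - here is a toy user-space model of the
waiter bookkeeping. All names are made up, and locking, requeueing and the
non-fair cases are ignored; the point is only that a small per-lock histogram
turns finding the top fair waiter's nice into a bounded 40-bucket scan, so no
tree walk is needed when the owner must be reweighted:

#include <stdio.h>

#define NICE_MIN	-20
#define NICE_MAX	19
#define NICE_SLOTS	(NICE_MAX - NICE_MIN + 1)

struct pi_waiters {
	/* number of queued fair waiters per nice level */
	int nice_count[NICE_SLOTS];
};

static void fair_waiter_enqueue(struct pi_waiters *w, int nice)
{
	w->nice_count[nice - NICE_MIN]++;
}

static void fair_waiter_dequeue(struct pi_waiters *w, int nice)
{
	w->nice_count[nice - NICE_MIN]--;
}

/* Best (lowest) nice among queued fair waiters, or NICE_MAX + 1 if none. */
static int top_fair_waiter_nice(const struct pi_waiters *w)
{
	for (int i = 0; i < NICE_SLOTS; i++)
		if (w->nice_count[i])
			return i + NICE_MIN;
	return NICE_MAX + 1;
}

int main(void)
{
	struct pi_waiters w = { 0 };

	fair_waiter_enqueue(&w, 10);
	fair_waiter_enqueue(&w, -5);
	printf("reweight owner to nice %d\n", top_fair_waiter_nice(&w));
	fair_waiter_dequeue(&w, -5);
	printf("reweight owner to nice %d\n", top_fair_waiter_nice(&w));
	return 0;
}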

FWIW, I have already heard several reports that PTHREAD_PRIO_INHERIT does
nothing. I think this reweight issue is a more serious problem and is likely
why I heard that feedback. I could be underestimating the complexity of the
fix, though. So I'll trust your judgement and ditch further effort if you
think it's more effort than helping the proxy execution patchset - for the
list at least. I'm still likely to pursue something out of tree to get it
into as many Android LTS kernels as possible, and I will be happy to share
this work if there's a desire to pick something up for mainline to make the
problem less severe, at least.
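
On Vincent's cgroup example above, a quick worked illustration of why a
task's own nice is meaningless across groups. The weights below are the
kernel's sched_prio_to_weight[] values for nice -20 (88761) and nice 19
(15), and two sibling groups with one runnable task each are assumed:

	group A weight:    15 (~nice  19), contains a nice -20 task
	group B weight: 88761 (~nice -20), contains a nice  19 task

	task A's share = 15 / (15 + 88761)    ~=  0.02%
	task B's share = 88761 / (15 + 88761) ~= 99.98%

So the nice -20 task is almost starved relative to the nice 19 task, and a
reweight based on the waiter's raw nice alone cannot see this.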

Note that binder already has performance, latency (out of tree) and priority
inheritance; without those, performance is impacted in many corner cases, and
they are considered an indispensable part.

>
> So, at the moment, in part. It ought to resolve the issue for
> in-kernel mutexes (blocked tasks stay on the rq; if a blocked task is
> selected to run, we instead run the runnable lock owner - thus it
> works across scheduling classes), but it isn't tied into userland
> futexes the way rt_mutexes are at this point.
>
> Review and feedback on the series would be greatly appreciated!
> (Nudge! Nudge! :)

I am guilty of this, sorry.