Re: [PATCH] sched/pi: Reweight fair_policy() tasks when inheriting prio

From: Qais Yousef
Date: Wed Apr 10 2024 - 02:59:15 EST


On 04/09/24 14:35, Vincent Guittot wrote:
> On Tue, 9 Apr 2024 at 08:19, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 04/08/24 12:51, John Stultz wrote:
> > > On Mon, Apr 8, 2024 at 12:17 AM Vincent Guittot
> > > <vincent.guittot@xxxxxxxxxx> wrote:
> > > >
> > > > On Sun, 7 Apr 2024 at 14:27, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > > > >
> > > > > On 04/05/24 18:16, Qais Yousef wrote:
> > > > >
> > > > > > >
> > > > > > > All that to say: I think the weight is not applied on
> > > > > > > purpose. This might work for your particular case, but there
> > > > > > > are more changes to be done if you want to apply prio
> > > > > > > inheritance between cfs tasks.
> > > > > > >
> > > > > > > As an example, what about the impact of cgroups on the actual
> > > > > > > weight and the inherited priority of a task? If the owner and
> > > > > > > the waiter don't belong to the same cgroup, their own prio is
> > > > > > > meaningless... e.g. a task with nice -20 in a group whose
> > > > > > > weight equals nice 19, vs a task with nice 19 in a group
> > > > > > > whose weight equals nice -20.
> > > > > >
> > > > > > That is on my mind actually. But I thought it was a separate
> > > > > > problem: it has to do with how we calculate the effective
> > > > > > priority of the pi_task, and probably the sorting order too, if
> > > > > > we agree we need to revert the above. If that is done
> > > > >
> > > > > Thinking more about it, the revert is not the right thing to do.
> > > > > We want fair tasks to stay ordered FIFO for better fairness and
> > > > > to avoid potential starvation issues. It's just that the logic
> > > > > for finding the top_waiter needs to be different. If the
> > > > > top_waiter is fair, then we need to traverse the tree to find the
> > > > > highest nice value. We probably can keep track of this while
> > > > > adding items to the tree to avoid the search.
> > > > >
> > > > > For cgroups: is it reasonable (loosely speaking) to keep track
> > > > > of pi_cfs_rq and do detach_attach_task_cfs_rq() before the
> > > > > reweight? This seems the most straightforward solution and would
> > > > > confine the complexity to tracking the cfs_rq. But it'd have a
> > > > > similar issue to proxy execution, where a task that doesn't
> > > > > belong to the cgroup will consume its share..
> > > >
> > > > That's a good point. Would proxy execution be the simplest way to
> > > > fix all this?
> >
> > Is it? Over 4.5 years ago, Unity reported a performance inversion
> > problem to me, and that's when the proxy execution work was revived
> > as the simplest way to fix all of this. But there's still no end in
> > sight from what I can see. I was, and still am, of the opinion that
> > an interim solution in rt_mutex could help a lot of use cases
> > without being too complex. It's not as elegant and comprehensive as
> > proxy execution, but given that the userspace impact and the
> > out-of-tree kernel hacks keep growing while we wait for it to be
> > ready, the cost of waiting is high IMHO.
> >
> > FWIW, I have already heard several reports that PTHREAD_PRIO_INHERIT
> > does nothing. I think this reweight issue is a more serious problem
> > and is likely why I heard this feedback. I could be underestimating
> > the complexity of the fix though. So I'll
>
> Without cgroups the solution could be straightforward, but Android
> uses cgroups extensively AFAICT, and update_cfs_group() makes it
> impossible to track the top cfs waiter and its "prio".

:(

IIUC the issue is that we can't easily come up with a single
'effective prio' number for an N-level hierarchy and compare it
against another M-level hierarchy..
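
To make sure I understand the difficulty: the comparison we'd need is
something like the sketch below. This is illustrative pseudo-kernel-C
only (ignoring fixed-point scaling and rounding);
approx_effective_weight() is made up and I'm quoting field names from
memory, so please don't read it as proposed code:

  /*
   * Illustrative only: a task's effective share is roughly its own
   * weight scaled by its group's share of the load at every level
   * of the hierarchy (CONFIG_FAIR_GROUP_SCHED).
   */
  static unsigned long approx_effective_weight(struct task_struct *p)
  {
          struct sched_entity *se = &p->se;
          unsigned long w = se->load.weight;

          for (se = se->parent; se; se = se->parent) {
                  /*
                   * Scale by this group's share of its cfs_rq. Both
                   * terms are moving targets: update_cfs_group()
                   * retunes se->load.weight as the group's load
                   * changes, so any cached result goes stale almost
                   * immediately.
                   */
                  w = w * se->load.weight / cfs_rq_of(se)->load.weight;
          }
          return w;
  }

Comparing two waiters in different hierarchies then means comparing
two such products that are being recomputed continuously, which I
guess is your point.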

Does proxy execution fix this problem then? If we can't find the top
waiter, I can't see how proxy execution would work here either. To my
understanding it's more about how we apply inheritance (by donating
the execution context of the top waiter) instead of manually applying
inheritance like we're doing now.

But the logic of finding the top waiter should remain the same, no?
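
To spell out my mental model (pure pseudo-code, not taken from the
actual series; task_is_blocked() and ->blocked_on are placeholders):

  /*
   * Pseudo-code of how I picture proxy execution: selection is
   * unchanged, execution is redirected to the lock owner.
   */
  next = pick_next_task(rq);              /* may pick a blocked waiter */
  while (task_is_blocked(next))           /* placeholder helper */
          next = next->blocked_on->owner; /* follow the lock chain */
  /* run the owner, charging the time to the waiter's context */

If that picture is right, the selection step is untouched, hence my
question above.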

Need to extend my test case to cover this scenario.
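
Going back to the earlier idea of tracking the top fair waiter at
insertion time, below is roughly what I had in mind. Rough sketch
only: top_fair_waiter is a hypothetical new field on the lock, and
the dequeue path (re-scanning when the cached waiter leaves) is
deliberately left out:

  /*
   * Sketch, not a patch: cache the fair waiter with the lowest nice
   * as waiters are enqueued, so finding the top fair waiter doesn't
   * need a full tree traversal. 'top_fair_waiter' is hypothetical.
   */
  static void rt_mutex_track_fair_waiter(struct rt_mutex_base *lock,
                                         struct rt_mutex_waiter *waiter)
  {
          struct task_struct *task = waiter->task;

          if (!fair_policy(task->policy))
                  return;

          if (!lock->top_fair_waiter ||
              task_nice(task) < task_nice(lock->top_fair_waiter->task))
                  lock->top_fair_waiter = waiter;
  }

Of course, with cgroups the plain task_nice() comparison isn't
meaningful across hierarchies, which brings us back to the problem
above.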