Re: [PATCH] sched/pi: Reweight fair_policy() tasks when inheriting prio
From: Qais Yousef
Date: Thu Apr 04 2024 - 18:08:28 EST
On 04/03/24 15:11, Vincent Guittot wrote:
> On Wed, 3 Apr 2024 at 02:59, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > For fair tasks inheriting the priority (nice) without reweighting is
> > a NOP as the task's share won't change.
>
> AFAICT, there is no nice priority inheritance with rt_mutex; All nice
Hmm, from what I see there is.
> tasks are sorted with the same "default prio" in the rb waiter tree.
> This means that the rt top waiter is not the cfs with highest prio but
> the 1st cfs waiting for the mutex.
This is about the order in which tasks contending for the lock are sorted,
though, rather than the effective priority the task holding the lock should run
at, no?
>
> >
> > This is visible when running with PTHREAD_PRIO_INHERIT where fair tasks
> > with low priority values are susceptible to starvation leading to PI
> > like impact on lock contention.
> >
> > The logic in rt_mutex will reset these low priority fair tasks into nice
> > 0, but without the additional reweight operation to actually update the
> > weights, it doesn't have the desired impact of boosting them to allow
> > them to run sooner/longer to release the lock.
> >
> > Apply the reweight for fair_policy() tasks to achieve the desired boost
> > for those low nice values tasks. Note that boost here means resetting
> > their nice to 0; as this is what the current logic does for fair tasks.
>
> But you can at the opposite decrease the cfs prio of a task
> and even worse with the comment :
> /* XXX used to be waiter->prio, not waiter->task->prio */
>
> we use the prio of the top cfs waiter (ie the one waiting for the
> lock) not the default 0 so it can be anything in the range [-20:19]
>
> Then, a task with low prio (i.e. nice > 0) can get a prio boost even
> if this task and the waiter are low priority tasks
I don't see this effect. The only change I am making here
is that when we set the prio we are supposed to inherit, instead of simply
changing prio, I also ensure we reweight so that the task actually runs at the
inherited nice value. I am not changing how the waiter logic works.
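To put a number on the "NOP" part: CFS divides the CPU between runnable tasks
in proportion to their weights (sched_prio_to_weight[]), so changing p->prio
without touching the weight changes nothing. A standalone back-of-the-envelope
sketch, not kernel code:

/* Standalone sketch, not kernel code: relative CPU share of two CFS tasks
 * sharing a CPU is weight / (sum of weights), with the weights taken from
 * the kernel's sched_prio_to_weight[] table. */
#include <stdio.h>

int main(void)
{
	double w_nice0  = 1024.0;	/* sched_prio_to_weight[20], nice 0  */
	double w_nice10 = 110.0;	/* sched_prio_to_weight[30], nice 10 */

	/* lp task at nice 10 contending with an hp task at nice 0 */
	printf("lp share at nice 10:                 %.1f%%\n",
	       100.0 * w_nice10 / (w_nice10 + w_nice0));	/* ~9.7% */

	/* p->prio set to the inherited value but weight left untouched:
	 * the share is computed from the weight, so nothing changes */
	printf("lp share, prio changed, no reweight: %.1f%%\n",
	       100.0 * w_nice10 / (w_nice10 + w_nice0));	/* ~9.7% */

	/* after reweighting to the nice 0 weight the boost takes effect */
	printf("lp share after reweight to nice 0:   %.1f%%\n",
	       100.0 * w_nice0 / (w_nice0 + w_nice0));		/* 50.0% */

	return 0;
}

That jump from ~9.7% to 50% of the CPU is the boost the reweight is meant to
deliver.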
Here's my test app FWIW
https://github.com/qais-yousef/pi_test
When I run
pi_test --lp-nice 0 --hp-nice 10
the lp thread runs at 0 still
If I do
pi_test --lp-nice 10 --hp-nice 5
the low priority thread runs at 5
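In case it helps, this is roughly the scenario the test sets up (a simplified
sketch, not the actual pi_test source; the real thing is in the repo above and
takes the nice values and CPU affinity as arguments):

/* Simplified sketch of the pi_test scenario: a low priority (high nice)
 * thread holds a PTHREAD_PRIO_INHERIT mutex while a higher priority
 * thread blocks on it. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/resource.h>
#include <unistd.h>

static pthread_mutex_t lock;

static void *lp_fn(void *arg)
{
	(void)arg;
	/* On Linux, setpriority(PRIO_PROCESS, 0, ...) applies to the calling thread */
	setpriority(PRIO_PROCESS, 0, 10);	/* low priority: nice 10 */
	pthread_mutex_lock(&lock);
	for (;;)
		;				/* hold the lock and burn CPU */
	return NULL;
}

static void *hp_fn(void *arg)
{
	(void)arg;
	setpriority(PRIO_PROCESS, 0, 0);	/* higher priority: nice 0 */
	sleep(1);				/* let the lp thread take the lock first */
	pthread_mutex_lock(&lock);		/* blocks; lp thread should get boosted */
	return NULL;
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t lp, hp;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &attr);

	pthread_create(&lp, NULL, lp_fn, NULL);
	pthread_create(&hp, NULL, hp_fn, NULL);

	/* runs until killed, like in the script below */
	pthread_join(lp, NULL);
	pthread_join(hp, NULL);
	return 0;
}

Pinning both threads to one CPU (what --affine-cpu does in the script below)
makes the starvation, and the boost, easy to see in the traces.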
What combination are you worried about? I can give it a try. I use
sched-analyzer-pp [1] to see the division of runnable/running time, or you can
monitor the threads in top.
#!/bin/bash
set -eux
sudo sched-analyzer &
./pi_test --lp-nice ${1:-10} --hp-nice ${2:-0} --affine-cpu ${3:-0} &
sleep 10
pkill -SIGKILL pi_test
sudo pkill -SIGINT sched-analyzer
sched-analyzer-pp --sched-states pi_test sched-analyzer.perfetto-trace
Pictures of the output are attached for before and after
pi_test --lp-nice 10 --hp-nice 0
[1] https://github.com/qais-yousef/sched-analyzer
Attachment: pi_test_no_reweight.png (PNG image)
Attachment: pi_test_fixed.png (PNG image)