Re: [PATCH] sched/pi: Reweight fair_policy() tasks when inheriting prio

From: Vincent Guittot
Date: Wed Apr 03 2024 - 09:11:26 EST


On Wed, 3 Apr 2024 at 02:59, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> For fair tasks inheriting the priority (nice) without reweighting is
> a NOP as the task's share won't change.

AFAICT, there is no nice priority inheritance with rt_mutex; all nice
tasks are sorted with the same "default prio" in the rb waiter tree.
This means that the rt_mutex top waiter is not the cfs task with the
highest prio but the first cfs task that started waiting for the mutex.

>
> This is visible when running with PTHREAD_PRIO_INHERIT where fair tasks
> with low priority values are susceptible to starvation leading to PI
> like impact on lock contention.
>
> The logic in rt_mutex will reset these low priority fair tasks into nice
> 0, but without the additional reweight operation to actually update the
> weights, it doesn't have the desired impact of boosting them to allow
> them to run sooner/longer to release the lock.
>
> Apply the reweight for fair_policy() tasks to achieve the desired boost
> for those low nice values tasks. Note that boost here means resetting
> their nice to 0; as this is what the current logic does for fair tasks.

But conversely, you can also end up decreasing the cfs prio of a task,
and it gets even worse given this comment:
/* XXX used to be waiter->prio, not waiter->task->prio */

We use the prio of the top cfs waiter (i.e. the one waiting for the
lock), not the default nice 0, so it can be anything in the range [-20:19].

Then, a task with low prio (i.e. nice > 0) can get a prio boost even
if this task and the waiter are both low priority tasks.

>
> Handling of idle_policy() requires more code refactoring and is not
> handled yet. idle_policy() are treated specially and only run when the
> CPU is idle and get a hardcoded low weight value. Changing weights won't
> be enough without a promotion first to SCHED_OTHER.
>
> Tested with a test program that creates three threads.
>
> 1. main thread that spawns high prio and low prio task and busy
> loops
>
> 2. low priority thread that holds a pthread_mutex() with
> PTHREAD_PRIO_INHERIT protocol. Runs at nice +10. Busy loops
> after holding the lock.
>
> 3. high priority thread that holds a pthread_mutex() with
> PTHREAD_PRIO_INHERIT, but made to start after the low
> priority thread. Runs at nice 0. Should remain blocked by the
> low priority thread.
>
> All tasks are pinned to CPU0.
>
> Without the patch I can see the low priority thread running only for
> ~10% of the time, which is what is expected without it being boosted.
>
> With the patch the low priority thread runs for ~50%, which is what is
> expected if it gets boosted to nice 0.
>
> I modified the test program logic afterwards to ensure that after
> releasing the lock the low priority thread goes back to running for 10%
> of the time, and it does.
>
> Reported-by: Yabin Cui <yabinc@xxxxxxxxxx>
> Signed-off-by: Qais Yousef <qyousef@xxxxxxxxxxx>
> ---
> kernel/sched/core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0621e4ee31de..b90a541810da 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7242,8 +7242,10 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
>  	} else {
>  		if (dl_prio(oldprio))
>  			p->dl.pi_se = &p->dl;
> -		if (rt_prio(oldprio))
> +		else if (rt_prio(oldprio))
>  			p->rt.timeout = 0;
> +		else if (!task_has_idle_policy(p))
> +			reweight_task(p, prio - MAX_RT_PRIO);
>  	}
>
>  	__setscheduler_prio(p, prio);
> --
> 2.34.1
>